emnlp emnlp2012 emnlp2012-60 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aciel Eshky ; Ben Allison ; Mark Steedman
Abstract: User simulation is frequently used to train statistical dialog managers for task-oriented domains. At present, goal-driven simulators (those that have a persistent notion of what they wish to achieve in the dialog) require some task-specific engineering, making them impossible to evaluate intrinsically. Instead, they have been evaluated extrinsically by means of the dialog managers they are intended to train, leading to circularity of argument. In this paper, we propose the first fully generative goal-driven simulator that is fully induced from data, without hand-crafting or goal annotation. Our goals are latent, and take the form of topics in a topic model, clustering together semantically equivalent and phonetically confusable strings, implicitly modelling synonymy and speech recognition noise. We evaluate on two standard dialog resources, the Communicator and Let’s Go datasets, and demonstrate that our model has substantially better fit to held out data than competing approaches. We also show that features derived from our model allow significantly greater improvement over a baseline at distinguishing real from randomly permuted dialogs.
Reference: text
sentIndex sentText sentNum sentScore
1 User simulation is frequently used to train statistical dialog managers for task-oriented domains. [sent-5, score-0.647]
2 Instead, they have been evaluated extrinsically by means of the dialog managers they are intended to train, leading to circularity of argument. [sent-7, score-0.588]
3 In this paper, we propose the first fully generative goal-driven simulator that is fully induced from data, without hand-crafting or goal annotation. [sent-8, score-0.306]
4 Our goals are latent, and take the form of topics in a topic model, clustering together semantically equivalent and phonetically confusable strings, implicitly modelling synonymy and speech recognition noise. [sent-9, score-0.281]
5 We evaluate on two standard dialog resources, the Communicator and Let’s Go datasets, and demonstrate that our model has substantially better fit to held out data than competing approaches. [sent-10, score-0.417]
6 1 Introduction Automatically simulating user behaviour in human-machine dialogs has become vital for training statistical dialog managers in task-oriented domains. [sent-12, score-1.266]
7 , 2011), it has been argued that user simulation avoids the expensive, labour intensive, and error-prone experience of exposing real humans to fledgling dialog systems (Eckert et al. [sent-18, score-0.905]
8 Training effective dialog managers should benefit from exposure to properties exhibited by real users. [sent-20, score-0.567]
9 Table 1 shows an example dialog in a domain such as we consider, where the objective is to simulate at the semantic level. [sent-21, score-0.417]
10 In such task oriented domains, the user has a goal (in this case, to book a flight from New York to Osaka), and the machine is tasked with fulfilling it. [sent-22, score-0.403]
11 Notice that the user is consistent with this goal throughout the dialog, in that they do not provide contradictory information (although an ASR error is present), but that every mention of their destination city uses a different string. [sent-23, score-0.489]
12 As a result, it has only been possible to evaluate them extrinsically using dialog managers. [sent-28, score-0.446]
13 Furthermore, there is little reason to assume that because a simulator performs well with a certain dialog manager, it would perform similarly [page footer: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 71–81, Jeju Island, Korea, 12–14 July 2012.] [sent-31, score-0.622]
14 (1997) and Levin and Pieraccini (2000); secondly we compare to a probabilistic goal-based simulator where the goals are string literals, as envisaged by Scheffler and Young (2002) and Schatzmann et al. [sent-42, score-0.314]
15 We demonstrate substantial improvement over these models in terms of predicting held-out data on two standard dialog resources: DARPA Communicator (Levin et al. [sent-44, score-0.492]
16 (1997): their Bigram model conditions user utterances exclusively on the preceding machine utterance. [sent-49, score-0.471]
17 This was extended by Levin and Pieraccini (2000), who manually restrict the model to estimating “sensible” pairs of user and machine utterances by assigning all others probability zero. [sent-50, score-0.512]
18 Pietquin (2004), for example, explicitly models a user goal as a set of slot-value pairs randomly generated once per dialog. [sent-53, score-0.353]
19 (2009) use large amounts of dialog state annotations (e. [sent-56, score-0.417]
20 what information has been provided so far) to learn Conditional Random Fields over the user utterances, and assume that those features ensure user consistency. [sent-58, score-0.568]
21 Scheffler and Young (2002) simulate user behaviour by introducing rules for actions that depend on the user goal, and probabilistic modelling for actions that are not goal-dependent. [sent-61, score-0.711]
22 They then map out a decision network that determines user actions at every node prior to the start of the dialog. [sent-62, score-0.311]
23 Agenda-based user simulation, another approach from the literature, assumes a probability distribution over the user goal which is either induced from data (Schatzmann et al. [sent-63, score-0.747]
24 These models ensure consistency but restrict the variability in user behaviour that can be accommodated. [sent-69, score-0.499]
25 Furthermore, because these approaches do not define a complete probability distribution over user behaviour, they restrict possibilities for their evaluation, a point to which we now turn. [sent-70, score-0.362]
26 2 Related Work on Simulator Evaluation No standardised metric of evaluation has been established for user simulators largely because they have been so inextricably linked to dialog managers. [sent-72, score-0.833]
27 The most popular method of evaluation relies on generating synthetic dialogs through the interaction of the user simulator with some dialog manager. [sent-73, score-1.294]
28 (2005) hand-craft a simple deterministic dialog manager based on finite automata, and compute similarity measures between these synthetically produced dialogs and real dialogs. [sent-75, score-0.935]
29 (2006) use a scoring function to evaluate synthetic dialogs using accuracy, precision, recall, and perplexity, while Schatzmann et al. [sent-77, score-0.388]
30 Williams (2008) uses a Cramér–von Mises test, a hypothesis test to determine whether simulated and real dialogs are significantly different, while Janarthanam and Lemon (2009) use the Kullback–Leibler divergence between the empirical distributions over acts in real and simulated dialogs. [sent-79, score-0.702]
31 (2000) and Ai and Litman (2008) judge the consistency of human quality ranked synthetic dialogs generated by different simulators interacting with the IT-SPOKE dialog system. [sent-81, score-1.01]
32 (2007b) use a simulator to train a statistical dialog manager and then evaluate the learned policy. [sent-83, score-0.69]
33 There has been far less evaluation of simulators without a dialog manager. [sent-85, score-0.574]
34 The main approach is to compute precision and recall on an utterance basis, which is intended to measure the similarity between real user responses in the corpora and simulated user responses produced under similar circumstances (Schatzmann et al. [sent-86, score-0.905]
35 However, this is a harsh evaluation as it assumes a correct or “best” answer, and penalises valid variability in user behaviour. [sent-89, score-0.337]
36 3 Dialog as a Statistical Process We consider a dialog to be a series of turns, comprised of multiple utterances. [sent-90, score-0.417]
37 Dialogs proceed by the user and the machine alternating turns. [sent-92, score-0.284]
38 Because the dialogs are of mixed initiative, there is no restriction on the number of contiguous machine or user utterances. [sent-93, score-0.672]
39 Our aim is to model the user, and we are interested in the conditional distribution of the user utterances given the dialog up to that point. [sent-94, score-0.925]
40 $d_{i-1}$), where $d_n$ is either a machine utterance $m_n$ or a user utterance $u_n$. [sent-98, score-0.586]
41 1 Bigram Model The simplest model we define over dialogs is the bigram model of Eckert et al. [sent-101, score-0.468]
42 Since each utterance is generated independently of others in the dialog with the same context, there is no enforced consistency between utterances. [sent-105, score-0.641]
43 Each sub-model uses the maximum likelihood estimator (the relative frequency of the utterance), and unseen machine utterances place full weight on the unigram/smoothed model (ignoring the bigram probability since it has no meaning if $m_{i-1}$ is unobserved). [sent-108, score-0.339]
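The interpolation described in sentence 43 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `BigramUserModel`, the single interpolation weight `lam`, and the add-one smoothing of the unigram term are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class BigramUserModel:
    """Hypothetical sketch: interpolate p(u | m) with a smoothed unigram p(u)."""

    def __init__(self, lam=0.8):
        self.lam = lam                            # weight on the bigram estimate (assumed)
        self.pair_counts = defaultdict(Counter)   # machine utterance -> user utterance counts
        self.user_counts = Counter()              # user utterance counts

    def fit(self, pairs):
        # pairs: iterable of (machine_utterance, user_utterance)
        for machine, user in pairs:
            self.pair_counts[machine][user] += 1
            self.user_counts[user] += 1
        return self

    def unigram(self, user):
        # Add-one smoothing, with one extra slot reserved for unseen utterances.
        total = sum(self.user_counts.values())
        vocab = len(self.user_counts) + 1
        return (self.user_counts[user] + 1) / (total + vocab)

    def prob(self, user, machine):
        # Unseen machine utterance: all weight goes to the smoothed unigram.
        if machine not in self.pair_counts:
            return self.unigram(user)
        context = self.pair_counts[machine]
        bigram = context[user] / sum(context.values())  # relative frequency (MLE)
        return self.lam * bigram + (1 - self.lam) * self.unigram(user)
```

In the paper the interpolation weights are tuned on the development set (sentence 61); the fixed `lam=0.8` here is only a placeholder.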
44 2 Goal-Based Models One way to ensure consistency and more realistic behaviour is to have a goal for the user in the dialog, which corresponds to values for slots required in the problem. [sent-111, score-0.604]
45 3 An Upper-Bound on String-Goal Models The simplest variant of g has string values for each of the slots the user is required to provide in order for the dialog to succeed. [sent-115, score-0.839]
46 However, in these simulators, while the goal is probabilistic, there is no distribution over utterances given the goal because utterances are assembled deterministically from a series of rule applications. [sent-119, score-0.549]
47 The issue with a model of user goals as strings in this fashion is that users describe the same values in multiple ways (Osaka Japan, Osaka), and speech recognition errors corrupt consistent user input (Osaka mis-recognised as Salt Lake City). [sent-121, score-0.744]
48 If the value is unseen in the dialog, we use the full probability of the utterance from the bigram model as described above. [sent-126, score-0.272]
49 Such situations are common in the noisy dialog resources from which simulators are induced— however, any string-based goal will necessarily consider these different renderings to be different goals, and will require resampling or smoothing terms to deal with them. [sent-137, score-0.647]
50 Instead we use the simpler Mixture-of-Multinomials model, where the latent topic is sampled once per dialog instead of once per value uttered. [sent-141, score-0.524]
51 In this formulation, the latent goal for each slot, which was previously a string, now becomes an indicator for a topic in a topic model. [sent-143, score-0.283]
52 Each topic can in theory generate any string (so the model is inherently smoothed), but most strings in most topics will have only the smoothing weight and most probability mass will be on a small number of highly correlated strings. [sent-144, score-0.272]
53 We treat the slots as being independent of one another in the goal, and thus: $p(g) = \prod_s p(z_s)$ (5), where $z_s$ is the topic indicator for some slot $s$. [sent-145, score-0.439]
54 For utterances defined over such slots we use a standard bigram model as in (1), and for appropriate utterances we use a topic-goal model as in (7). [sent-148, score-0.543]
55 Under this model, each dialog has associated with it a latent variable $z_s$ for each slot $s$ in the goal, which indicates which topic is used to draw the values for that slot. [sent-154, score-0.767]
56 Let’s Go is a bus routing domain in Pittsburgh collected by having the general public interact with the CMU dialog system to find their way through the city. [sent-171, score-0.446]
57 The dialogs in both corpora are of mixed-initiative, having a free number of contiguous system and user responses. [sent-172, score-0.672]
58 We then divided the corpora into training, development and test sets as follows: Communicator contains 2285 dialogs in total, and Let’s Go contains 17992, and in each case we selected 80% of dialogs at random for training, 10% for development, and 10% for testing. [sent-175, score-0.776]
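The random 80/10/10 split in sentence 58 could be reproduced along these lines. This is a sketch under stated assumptions: the function name `split_dialogs`, the seed, and the rounding rule (integer division, with the remainder going to the test set) are illustrative and are not specified in the source.

```python
import random


def split_dialogs(dialogs, seed=0):
    """Shuffle dialogs and split them 80% / 10% / 10% (rounding rule assumed)."""
    shuffled = list(dialogs)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]   # whatever remains after train and dev
    return train, dev, test
```

Under this rounding rule, the 2285 Communicator dialogs would be divided into 1828 training, 228 development, and 229 test dialogs.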
59 Table 2: Example utterances from the two corpora. DC: PROVIDE INFO [orig city Boston]; LG: INFORM place [departure place CMU, arrival place airport]. [sent-176, score-0.268]
60 In addition, users tend to be more flexible with their bus routes than they are with their flight destinations, and so values are a lot more varied throughout the course of Let’s Go dialogs than Communicator ones. [sent-179, score-0.497]
61 Free model parameters are set by a simple search on the development set, where the objective is likelihood—for the bigram model the parameters are the interpolation weights, and for the topic model we search for the number of topics and smoothing constant for the topic distributions. [sent-184, score-0.333]
62 The slots over which the topic model is defined for Communicator are dest city and orig city (this takes into account PROVIDE and REPROVIDE acts). [sent-186, score-0.521]
63 For Let’s Go we derive the model over the three properties: single place, arrival place and departure place, as opposed to the less informative slot place. [sent-187, score-0.282]
64 This metric is more suitable than the precision and recall metrics which have been previously used, because it acknowledges that, rather than each user response being “correct” at the point which it is observed, there [table residue: columns Model, DC(A), DC(P), LG(A), LG(P); rows String, Topic; values not recoverable] [sent-189, score-0.311]
65 DC-A is all acts for Communicator, while DC-P is calculated on PROVIDE acts alone (the acts on which our model is designed to improve prediction). [sent-206, score-0.336]
66 Because the models we define are full probability models, we are able to compute this metric and do not need to use an arbitrarily selected dialog manager for evaluation. [sent-209, score-0.526]
67 We report the mean per-utterance log probability of unseen data, that is, the log probability of the whole held-out corpus divided by the number of user utterances. [sent-214, score-0.441]
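The metric in sentence 67 can be sketched as below. The representation of a dialog as a list of `(machine, user)` pairs and the `prob_fn(user, machine)` callback are assumptions made for the illustration; any model that defines a full probability over user utterances could be plugged in.

```python
import math


def mean_log_prob(dialogs, prob_fn):
    """Sum log p(user | machine) over the corpus, divided by the number of user utterances."""
    total, count = 0.0, 0
    for dialog in dialogs:
        for machine, user in dialog:
            total += math.log(prob_fn(user, machine))
            count += 1
    return total / count
```

Because the score needs only the model's own probabilities, no dialog manager is involved in computing it, which is the point made in sentence 66.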
68 Note that while the percentages match across resources, Let’s Go is much larger and thus the absolute numbers of dialogs are different, which explains the better performance on Let’s Go. [sent-219, score-0.388]
69 Right: top 6 utterances (plus fraction of samples in 10,000) generated in response to the machine utterance “REQUEST INFO dest city” and conditioned on the topic $z_{\text{dest city}}$. [sent-243, score-0.616]
70 If we sample a PROVIDE INFO act, we check whether we have sampled a topic for the corresponding slot thus far in the dialog. [sent-247, score-0.32]
71 Once the topic for the slot is set, we sample values as draws from the fixed multinomial and add these to the ACT, slot pair. [sent-249, score-0.489]
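The sampling procedure in sentences 70 and 71 (draw a topic for a slot once per dialog, then draw every value for that slot from the topic's multinomial) can be sketched as follows. The function name, the dictionary layout of `topic_prior` and `topic_values`, and the toy parameters in the usage note are hypothetical.

```python
import random


def sample_slot_value(slot, dialog_topics, topic_prior, topic_values, rng):
    """Draw a value for `slot`, sampling the slot's topic at most once per dialog."""
    if slot not in dialog_topics:
        # First PROVIDE act for this slot in the dialog: sample and fix the topic.
        topics = list(topic_prior[slot])
        weights = [topic_prior[slot][z] for z in topics]
        dialog_topics[slot] = rng.choices(topics, weights=weights, k=1)[0]
    z = dialog_topics[slot]
    # Later mentions reuse the same topic, so values vary but stay consistent.
    values = list(topic_values[slot][z])
    weights = [topic_values[slot][z][v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]
```

With a toy topic such as `{"osaka": 0.6, "osaka japan": 0.3, "salt lake city": 0.1}`, repeated calls within one dialog would yield different strings for the same underlying destination, mirroring the synonymy and ASR confusion the paper describes.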
72 For each row in the table (corresponding to a new dialog d), we sample a topic for the dest city and orig city as needed, and sample 10000 utterances given that topic. [sent-251, score-1.086]
73 The left hand side of the table shows the top five strings in the sampled topic, while the right hand side shows the top six utterances in response to REQUEST INFO dest city. [sent-252, score-0.353]
74 Note that the proportion of utterances on the right does not match the probability of the values on the left because of the presence of other user acts besides PROVIDE dest city. [sent-253, score-0.727]
75 In the face of value synonymy and ASR errors, we define inconsistent dialogs to be ones that are locally coherent but lack the structure of a real dialog from one turn to the next. [sent-255, score-0.892]
76 (The bigram model by definition provides no help here, since the units of which dialogs consist contain the entire window of context used for the bigram model). [sent-259, score-0.548]
77 We take our training and development data from the Communicator corpus in the previous section, and create a classification problem as follows: real dialogs form positive examples in the classification problem. [sent-260, score-0.45]
78 We keep a histogram over real dialog lengths, and sample a number of turns for our “fake” dialogs proportional to this histogram. [sent-262, score-0.491]
79 We then sample this many turns from the frequency distribution over turns in the real data, and create exactly as many dialogs in this fashion as real dialogs in the data. [sent-263, score-1.118]
80 The result is an equal number of dialogs comprised of real turns, of (expected) real length, but where the sequence of turns is highly unlikely to be coherent given the random sampling. [sent-264, score-0.59]
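The construction of the negative examples in sentences 78 to 80 can be sketched as below. The function name and the representation of a dialog as a list of turns are assumptions; drawing uniformly from the list of observed lengths (and from the pooled list of turns) is one simple way to sample in proportion to the empirical histogram.

```python
import random


def make_fake_dialogs(real_dialogs, seed=0):
    """Build one scrambled dialog per real dialog from real lengths and real turns."""
    rng = random.Random(seed)
    lengths = [len(d) for d in real_dialogs]               # empirical length histogram
    turn_pool = [turn for d in real_dialogs for turn in d]  # empirical turn frequencies
    fakes = []
    for _ in real_dialogs:
        n_turns = rng.choice(lengths)
        fakes.append([rng.choice(turn_pool) for _ in range(n_turns)])
    return fakes
```

Each fake dialog is built from real turns and has a realistic length, but the order of turns carries no dialog-level structure.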
81 However, our setting is different: we do not seek to tell real dialogs from fully simulated ones, but real dialogs from scrambled versions of real dialogs. [sent-271, score-1.001]
82 In addition to length-based features, we add binary presence indicators for several user and machine acts highly correlated with the completion of dialogs, as well as for acts which indicate the provision of information and the proportion of all acts occupied by these. [sent-272, score-0.62]
83 Finally, we use our topic-model simulator to derive consistency features. [sent-276, score-0.278]
84 Our topics are induced from the real training dialogs, and their posterior probabilities computed for all dialogs relative to this model. [sent-278, score-0.521]
85 We take posterior probabilities of the fifty most probable topics for each of the dest city and orig city slots as features, as well as the normalised log probability of the dialog (the log probability divided by the number of user utterances). [sent-279, score-1.236]
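The topic posteriors used as features in sentence 85 could be computed for one slot roughly as follows, assuming a mixture-of-multinomials model in which the posterior over a slot's topic is proportional to the topic prior times the product of the value probabilities. The function name, the dictionary layout, and the `floor` probability standing in for the smoothing weight are assumptions for the sketch.

```python
import math


def topic_posterior(values, prior, emission, floor=1e-6):
    """Return p(z | values) for one slot, computed in log space for stability."""
    log_scores = {}
    for z, p_z in prior.items():
        log_scores[z] = math.log(p_z) + sum(
            math.log(emission[z].get(v, floor)) for v in values  # floor: assumed smoothing mass
        )
    shift = max(log_scores.values())
    norm = sum(math.exp(s - shift) for s in log_scores.values())
    return {z: math.exp(s - shift) / norm for z, s in log_scores.items()}
```

A dialog whose destination mentions all fall under one topic gets a sharply peaked posterior, whereas a scrambled dialog mixing unrelated cities would not fit any single topic well; this is the intuition behind using these quantities as consistency features.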
86 Features defined over the latent topic goal space substantially improve performance in a difficult discrimination task, demonstrating that our model captures an important notion of how real dialogs appear that is not shared by the other models we consider. [sent-292, score-0.626]
87 8 Concluding Remarks and Future Work This paper presents a fully generative goal driven user simulator, the first to merge both consistency and variability within a fully probabilistic framework. [sent-293, score-0.479]
88 We evaluate our model on two task-based dialog domains, Let’s Go and Communicator, and find it to outperform both a simple bigram model and an upper bound on probability models where the goals are represented as strings, in terms of the probability the model assigns to held-out dialogs. [sent-294, score-0.719]
89 Another possible improvement is to explore the effects of introducing dependency between the slots in the user goal, which would enforce more plausible value pairings and would potentially improve the simulator’s performance. [sent-298, score-0.373]
90 The effects of a dependence assumption between the different utterances occurring in a single user turn under the act model can also be explored. [sent-299, score-0.559]
91 We would also like to use our simulator to train a POMDP-based dialog manager using a form of reinforcement learning. [sent-300, score-0.737]
92 Assessing dialog system user simulation evaluation measures using human judges. [sent-304, score-0.843]
93 Learning user simulations for information state update dialogue systems. [sent-342, score-0.386]
94 Automatic annotation of communicator dialogue data for learning dialogue strategies and user simulations. [sent-346, score-0.708]
95 A two-tier user simulation model for reinforcement learning of adaptive referring expression generation policies. [sent-360, score-0.473]
96 Data-driven user simulation for automated evaluation of spoken dialog systems. [sent-364, score-0.891]
97 A stochastic model of human-machine interaction for learning dialog strategies. [sent-372, score-0.417]
98 Quantitative evaluation of user simulation techniques for spoken dialogue systems. [sent-414, score-0.576]
99 Agenda-based user simulation for bootstrapping a POMDP dialogue system. [sent-418, score-0.528]
100 Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning. [sent-426, score-0.393]
wordName wordTfidf (topN-words)
[('dialog', 0.417), ('dialogs', 0.388), ('user', 0.284), ('communicator', 0.22), ('simulator', 0.205), ('utterances', 0.187), ('schatzmann', 0.164), ('slot', 0.163), ('georgila', 0.161), ('utterance', 0.151), ('simulation', 0.142), ('osaka', 0.132), ('simulators', 0.132), ('acts', 0.112), ('topic', 0.107), ('dest', 0.103), ('dialogue', 0.102), ('slots', 0.089), ('behaviour', 0.089), ('managers', 0.088), ('act', 0.088), ('city', 0.086), ('ui', 0.08), ('zs', 0.08), ('bigram', 0.08), ('turns', 0.078), ('heldout', 0.075), ('consistency', 0.073), ('go', 0.07), ('goal', 0.069), ('eckert', 0.068), ('manager', 0.068), ('sigdial', 0.068), ('levin', 0.062), ('real', 0.062), ('goals', 0.06), ('lake', 0.059), ('variability', 0.053), ('flight', 0.05), ('arrival', 0.05), ('destination', 0.05), ('kallirroi', 0.05), ('orig', 0.05), ('salt', 0.05), ('thomson', 0.05), ('string', 0.049), ('spoken', 0.048), ('reinforcement', 0.047), ('scheffler', 0.044), ('samples', 0.041), ('probability', 0.041), ('steve', 0.041), ('topics', 0.039), ('simulated', 0.039), ('departure', 0.038), ('sensible', 0.038), ('info', 0.038), ('jost', 0.038), ('pieraccini', 0.038), ('rz', 0.038), ('perplexity', 0.038), ('yp', 0.038), ('mixture', 0.037), ('mi', 0.037), ('distribution', 0.037), ('strings', 0.036), ('oliver', 0.034), ('blaise', 0.034), ('lg', 0.034), ('dirichlet', 0.032), ('let', 0.032), ('induced', 0.032), ('multinomial', 0.031), ('place', 0.031), ('users', 0.03), ('responses', 0.03), ('booking', 0.029), ('bus', 0.029), ('circularity', 0.029), ('extrinsically', 0.029), ('renderings', 0.029), ('upper', 0.029), ('ai', 0.028), ('response', 0.027), ('actions', 0.027), ('ga', 0.027), ('sample', 0.025), ('synonymy', 0.025), ('far', 0.025), ('recognition', 0.025), ('diane', 0.025), ('eskenazi', 0.025), ('esther', 0.025), ('jung', 0.025), ('keizer', 0.025), ('litman', 0.025), ('mises', 0.025), ('intended', 0.025), ('speech', 0.025), ('asr', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management
2 0.26362938 102 emnlp-2012-Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems
Author: Nina Dethlefs ; Helen Hastie ; Verena Rieser ; Oliver Lemon
Abstract: Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.
3 0.10731131 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
Author: Michael J. Paul
Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.
4 0.091720045 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
Author: Bo Pang ; Sujith Ravi
Abstract: The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand predictability of user generated English.
5 0.083853908 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
Author: Robert Lindsey ; William Headden ; Michael Stipicevic
Abstract: Topic models traditionally rely on the bag-of-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.
6 0.078075498 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
7 0.075435489 134 emnlp-2012-User Demographics and Language in an Implicit Social Network
8 0.075254291 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model
9 0.064359777 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
10 0.053935558 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model
11 0.04969231 19 emnlp-2012-An Entity-Topic Model for Entity Linking
12 0.042008769 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
13 0.040698592 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
14 0.039611179 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation
15 0.039575074 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
16 0.038717911 120 emnlp-2012-Streaming Analysis of Discourse Participants
17 0.037593409 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
18 0.037496898 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
19 0.036984522 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
20 0.035849851 41 emnlp-2012-Entity based QA Retrieval
topicId topicWeight
[(0, 0.152), (1, 0.043), (2, 0.04), (3, 0.119), (4, -0.2), (5, 0.041), (6, 0.005), (7, -0.102), (8, 0.023), (9, 0.049), (10, 0.132), (11, -0.067), (12, -0.313), (13, -0.085), (14, 0.043), (15, -0.065), (16, -0.015), (17, 0.048), (18, -0.041), (19, -0.128), (20, -0.059), (21, -0.147), (22, -0.339), (23, -0.051), (24, -0.004), (25, -0.131), (26, -0.058), (27, -0.136), (28, -0.015), (29, -0.13), (30, -0.041), (31, -0.172), (32, -0.067), (33, -0.046), (34, 0.052), (35, -0.026), (36, 0.044), (37, -0.003), (38, 0.009), (39, -0.018), (40, -0.002), (41, -0.072), (42, -0.036), (43, 0.06), (44, -0.014), (45, -0.013), (46, -0.033), (47, -0.131), (48, -0.041), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.9634245 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management
2 0.87652194 102 emnlp-2012-Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems
3 0.43653247 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
Author: Bo Pang ; Sujith Ravi
Abstract: The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand predictability of user-generated English.
4 0.39969832 134 emnlp-2012-User Demographics and Language in an Implicit Social Network
Author: Katja Filippova
Abstract: We consider the task of predicting the gender of YouTube users and contrast two information sources: the comments they leave and the social environment induced from the affiliation graph of users and videos. We propagate gender information through the videos and show that a user’s gender can be predicted from her social environment with accuracy above 90%. We also show that gender can be predicted from language alone (89%). A surprising result of our study is that the latter predictions correlate more strongly with the gender predominant in the user’s environment than with the sex of the person as reported in the profile. We also investigate how the two views (linguistic and social) can be combined and analyse how prediction accuracy changes over different age groups.
5 0.3429203 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
Author: Michael J. Paul
Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.
6 0.27842334 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model
7 0.27750522 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
8 0.27193853 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
9 0.25409082 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
10 0.21515834 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model
11 0.21109641 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
12 0.19448188 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
13 0.19388245 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
14 0.18967983 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts
15 0.1652507 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis
16 0.16314033 56 emnlp-2012-Framework of Automatic Text Summarization Using Reinforcement Learning
17 0.16213441 44 emnlp-2012-Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web
18 0.15456921 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
19 0.15154657 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
20 0.14664099 41 emnlp-2012-Entity based QA Retrieval
topicId topicWeight
[(2, 0.461), (16, 0.027), (25, 0.015), (34, 0.063), (60, 0.05), (63, 0.059), (64, 0.022), (65, 0.022), (70, 0.012), (74, 0.058), (76, 0.037), (80, 0.012), (81, 0.013), (86, 0.031), (95, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.86377102 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management
Author: Aciel Eshky ; Ben Allison ; Mark Steedman
Abstract: User simulation is frequently used to train statistical dialog managers for task-oriented domains. At present, goal-driven simulators (those that have a persistent notion of what they wish to achieve in the dialog) require some task-specific engineering, making them impossible to evaluate intrinsically. Instead, they have been evaluated extrinsically by means of the dialog managers they are intended to train, leading to circularity of argument. In this paper, we propose the first fully generative goal-driven simulator that is fully induced from data, without hand-crafting or goal annotation. Our goals are latent, and take the form of topics in a topic model, clustering together semantically equivalent and phonetically confusable strings, implicitly modelling synonymy and speech recognition noise. We evaluate on two standard dialog resources, the Communicator and Let’s Go datasets, and demonstrate that our model has substantially better fit to held out data than competing approaches. We also show that features derived from our model allow significantly greater improvement over a baseline at distinguishing real from randomly permuted dialogs.
2 0.8226555 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text
Author: Yohei Takaku ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda
Abstract: Because the real world evolves over time, numerous relations between entities written in presently available texts are already obsolete or will potentially evolve in the future. This study aims at resolving the intricacy in consistently compiling relations extracted from text, and presents a method for identifying constancy and uniqueness of the relations in the context of supervised learning. We exploit massive time-series web texts to induce features on the basis of time-series frequency and linguistic cues. Experimental results confirmed that the time-series frequency distributions contributed much to the recall of constancy identification and the precision of the uniqueness identification.
3 0.71893299 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
Author: David Mareček ; Zdeněk Žabokrtský
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
4 0.36443156 102 emnlp-2012-Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems
Author: Nina Dethlefs ; Helen Hastie ; Verena Rieser ; Oliver Lemon
Abstract: Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.
5 0.3562074 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
Author: David McClosky ; Christopher D. Manning
Abstract: We present a distantly supervised system for extracting the temporal bounds of fluents (relations which only hold during certain times, such as attends school). Unlike previous pipelined approaches, our model does not assume independence between each fluent or even between named entities with known connections (parent, spouse, employer, etc.). Instead, we model what makes timelines of fluents consistent by learning cross-fluent constraints, potentially spanning entities as well. For example, our model learns that someone is unlikely to start a job at age two or to marry someone who hasn’t been born yet. Our system achieves a 36% error reduction over a pipelined baseline.
6 0.35080555 120 emnlp-2012-Streaming Analysis of Discourse Participants
7 0.34567064 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
8 0.34526283 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
9 0.3443138 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
10 0.34240273 72 emnlp-2012-Joint Inference for Event Timeline Construction
11 0.34099546 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
12 0.32895231 81 emnlp-2012-Learning to Map into a Universal POS Tagset
13 0.32646152 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
14 0.32329735 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
15 0.32278654 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
16 0.32219911 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
17 0.32098126 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
18 0.31859067 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
19 0.31798431 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
20 0.31671494 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts