acl acl2013 acl2013-124 knowledge-graph by maker-knowledge-mining

124 acl-2013-Discriminative state tracking for spoken dialog systems

Source: pdf

Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams

Abstract: In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states. Current approaches based on generative or discriminative models have different but important shortcomings that limit their accuracy. In this paper we discuss these limitations and introduce a new approach for discriminative state tracking that overcomes them by leveraging the problem structure. An offline evaluation with dialog data collected from real users shows improvements in both state tracking accuracy and the quality of the posterior probabilities. Features that encode speech recognition error patterns are particularly helpful, and training requires relatively few dialogs.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Discriminative state tracking for spoken dialog systems Angeliki Metallinou, Dan Bohus, and Jason D. [sent-1, score-1.174]

2 In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states. [sent-3, score-2.143]

3 In this paper we discuss these limitations and introduce a new approach for discriminative state tracking that overcomes them by leveraging the problem structure. [sent-5, score-0.553]

4 An offline evaluation with dialog data collected from real users shows improvements in both state tracking accuracy and the quality of the posterior probabilities. [sent-6, score-1.166]

5 1 Introduction Spoken dialog systems interact with users via natural language to help them achieve a goal. [sent-8, score-0.612]

6 As the interaction progresses, the dialog manager maintains a representation of the state of the dialog in a process called dialog state tracking. [sent-9, score-2.298]

7 For example, in a bus schedule information system, the dialog state might indicate the user’s desired bus route, origin, and destination. [sent-10, score-1.068]

8 Dialog state tracking is difficult because automatic speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user’s needs. [sent-11, score-0.584]

9 At the same time, state tracking is crucial because the system relies on the estimated dialog state to choose actions, for example, which bus schedule [...]. [sent-12, score-1.458]

10 The dialog state tracking problem can be formalized as follows (Figure 1). [sent-16, score-1.099]

11 Each system turn in the dialog is one datapoint. [sent-17, score-0.673]

12 For each datapoint, the input consists of three items: a set of K features that describes the current dialog context, G dialog state hypotheses, and for each dialog state hypothesis, M features that describe that dialog state hypothesis. [sent-18, score-3.253]

13 The task is to assign a probability distribution over the G dialog state hypotheses, plus a meta-hypothesis which indicates that none of the G hypotheses is correct. [sent-19, score-1.08]

14 Note that G varies across turns (datapoints): for example, in the first turn of Figure 1, G = 3, and in the second and third turns G = 5. [sent-20, score-0.223]

15 It is a requirement that the G hypotheses are disjoint; with the special “everything else” meta-hypothesis, exactly one hypothesis is correct by construction. [sent-22, score-0.316]

16 After the dialog state tracker has output its distribution, this distribution is passed to a separate, downstream process that chooses what action to take next (e. [sent-23, score-0.974]

17 Dialog state tracking can be seen as analogous to assigning a probability distribution over items on an ASR N-best list given speech input and the recognition output, including the contents of the N-best list. [sent-26, score-0.608]

18 For dialog state tracking, most commercial systems use hand-crafted heuristics, selecting the SLU result with the highest confidence score, and discarding alternatives. [sent-36, score-0.908]

19 In contrast, statistical approaches compute a posterior distribution over many hypotheses for the dialog state. [sent-37, score-0.856]

20 The key insight is that dialog is a temporal process in which correlations between turns can be harnessed to overcome SLU errors. [sent-38, score-0.682]

21 Statistical state tracking has been shown to improve task completion in end-to-end spoken dialog systems (Bohus and Rudnicky (2006); Young et al. [sent-39, score-1.174]

22 Two types of statistical state tracking approaches have been proposed. [sent-41, score-0.487]

23 (2010); Thomson and Young (2010)) use generative models that capture how the SLU results are generated from hidden dialog states. [sent-43, score-0.679]

24 These models can be used to track an arbitrary number of state hypotheses, but cannot easily incorporate large sets of potentially informative features (e. [sent-44, score-0.274]

25 from ASR, SLU, dialog history), resulting in poor probability estimates. [sent-46, score-0.641]

26 In contrast, discriminative approaches use conditional models, trained in a discriminative fashion (Bohus and Rudnicky (2006)) to directly estimate the distribution over a set of state hypotheses based on a large set of informative features. [sent-48, score-0.571]

27 They generally produce more accurate distributions, but in their current form they can only track a handful of state hypotheses. [sent-49, score-0.257]

28 As a result, the correct hypothesis may be discarded: for instance, in Figure 1, a discriminative model might consider only the top 2 SLU results, and thus fail to consider the correct 61C hypothesis at all. [sent-50, score-0.328]

29 The main contribution of this paper is to develop a new discriminative model for dialog state tracking that can operate over an arbitrary number of hypotheses and still compute accurate probability estimates. [sent-51, score-1.379]

30 The two systems differed in acoustic models, confidence scoring model, state tracking method and parameters, number of supported routes (8 vs 40, for DS1 and DS2 respectively), presence of minor bugs, and user population. [sent-61, score-0.637]

31 In both systems, a dialog state hypothesis consists of a value of the user’s goal for a certain slot: for example, a state hypothesis for the origin slot might be “carnegie mellon university”. [sent-63, score-1.395]

32 1 Experimental setup To perform a comparative analysis of various state tracking algorithms, we test them offline, i. [sent-74, score-0.487]

33 , by re-running state tracking against the SLU results from deployment. [sent-76, score-0.487]

34 However, care must be taken: when the improved state-tracker is installed into a dialog system and used to drive action selection, the distribution of the resulting dialog data (which is an input for the state tracker) will change. [sent-77, score-1.512]

35 Figure 1: Overview of dialog state tracking. [sent-80, score-0.843]

36 In this example, the dialog state contains the user’s desired bus route. [sent-81, score-0.94]

37 The user’s spoken response is processed to extract a set of spoken language understanding (SLU) results, each with a local confidence score. [sent-83, score-0.194]

38 A set of G dialog state hypotheses is formed by considering all SLU results observed so far, including the current turn and all previous turns. [sent-84, score-1.134]

39 For each state hypothesis, a feature extractor produces a set of M hypothesis-specific features, plus a single set of K general features that describes the current dialog context. [sent-85, score-0.96]

40 The dialog state tracker uses these features to produce a distribution over the G state hypotheses, plus a meta-hypothesis rest which accounts for the possibility that none of the G hypotheses are correct. [sent-86, score-1.399]
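
The way the Figure 1 caption describes forming the G hypotheses (every SLU result observed so far, across the current and all previous turns) can be sketched as follows. This is a minimal illustration only: the `NBest` input format, the function name `form_hypotheses`, and the route labels are assumptions made for the example, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one N-best list per turn, as (slu_value, confidence) pairs.
NBest = List[Tuple[str, float]]


def form_hypotheses(turns: List[NBest]) -> List[str]:
    """Return the distinct SLU values seen in any turn so far, in first-seen order."""
    seen: Dict[str, None] = {}  # dict preserves insertion order
    for nbest in turns:
        for value, _confidence in nbest:
            seen.setdefault(value, None)
    return list(seen)


# Made-up SLU results for two turns of a bus-route dialog.
turns = [
    [("61A", 0.6), ("61B", 0.3), ("61D", 0.1)],
    [("61A", 0.5), ("61C", 0.3), ("16", 0.2)],
]
hyps_turn1 = form_hypotheses(turns[:1])  # 3 hypotheses after the first turn
hyps_turn2 = form_hypotheses(turns)      # 5 hypotheses after the second turn
```

Because repeated values collapse into one hypothesis, G grows only when a turn contributes a value not seen before, which is why G can differ from turn to turn.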

41 While the MISMATCH condition may not identically replicate the mismatch observed from deploying a new state tracker online (since online characteristics depend on user behavior), training on DS1 and testing on DS2 at least ensures the presence of some real-world mismatch. [sent-88, score-0.483]

42 Accuracy indicates whether the state hypothesis with the highest assigned probability is correct, where rest is correct iff none of the SLU results prior to the current turn include the user’s goal. [sent-90, score-0.478]
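
The accuracy definition above, including the rule for when rest counts as correct, might be computed along these lines. This is a sketch under stated assumptions: the dictionary representation of a distribution, the `REST` label, and the example values are all hypothetical.

```python
from typing import Dict, List, Tuple

REST = "rest"  # meta-hypothesis: none of the listed hypotheses is correct


def correct_label(dist: Dict[str, float], user_goal: str) -> str:
    """Return the user's goal if it is among the hypotheses, otherwise REST."""
    return user_goal if user_goal in dist and user_goal != REST else REST


def accuracy(turns: List[Tuple[Dict[str, float], str]]) -> float:
    """Fraction of turns whose highest-probability hypothesis is the correct one."""
    hits = 0
    for dist, user_goal in turns:
        top = max(dist, key=lambda h: dist[h])
        hits += top == correct_label(dist, user_goal)
    return hits / len(turns)


example_turns = [
    ({"61A": 0.5, "61C": 0.3, REST: 0.2}, "61C"),  # top is 61A, goal is 61C: wrong
    ({"61A": 0.3, REST: 0.7}, "28X"),              # goal never hypothesized: rest is right
]
score = accuracy(example_turns)
```

In the second example turn the user's goal never appeared in any SLU result, so the tracker is credited for putting most of its mass on rest.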

43 High accuracy is important as a dialog system must ultimately commit to a single interpretation of the user’s needs, e. [sent-91, score-0.63]

44 2 Hand-crafted baseline state tracker As a baseline, we construct a hand-crafted state tracking rule that follows a strategy common in commercial systems: it returns the SLU result with the maximum confidence score, ignoring all other hypotheses. [sent-98, score-0.877]
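
A rule of this kind is simple enough to sketch in a few lines. The function name and input format below are hypothetical, and the sketch leaves it to the caller to decide which SLU results are passed in (the current turn only, or everything observed so far), since the extracted text does not settle that detail.

```python
from typing import List, Optional, Tuple


def baseline_track(slu_results: List[Tuple[str, float]]) -> Optional[str]:
    """Return the SLU value with the highest confidence, ignoring all others.

    Returns None when no SLU result is available.
    """
    if not slu_results:
        return None
    value, _confidence = max(slu_results, key=lambda item: item[1])
    return value


# Made-up N-best list: the rule commits to "61A" and discards the rest.
choice = baseline_track([("61A", 0.55), ("61C", 0.40), ("16", 0.05)])
```

Note that the output is a single value rather than a distribution, so everything below the top of the N-best list is thrown away.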

45 However, this simple rule can’t make use of SLU results on the N-best list, or statistical priors; these limitations motivate the use of statistical state trackers, introduced next. [sent-103, score-0.231]

46 3 Generative state tracking Generative state tracking approaches leverage models that describe how SLU results are generated from a hidden dialog state, denoted g. [sent-104, score-1.606]

47 Generative approaches model the posterior over all possible dialog state hypotheses, including those not observed in the SLU N-best lists. [sent-107, score-0.898]

48 Another approach is to factor the components of a dialog state, make assumptions about conditional independence between the components, and apply approximate inference techniques such as loopy belief propagation (Thomson and Young (2010)). [sent-111, score-0.612]

49 In deployment, DS1 and DS2 used the AT&T Statistical Dialog Toolkit (ASDT) for dialog state tracking (Williams (2010); AT&T Statistical Dialog Toolkit). [sent-112, score-1.099]

50 Component models were learned from dialog data from a different dialog system. [sent-114, score-1.224]

51 A maximum of G˜ = 20 state hypotheses were tracked for each slot. [sent-115, score-0.436]

52 1 that observations at different turns are independent conditioned on the dialog state; in practice, confusions made by speech recognition are highly correlated (Williams (2012)). [sent-117, score-0.954]

53 For all datasets, we re-estimated the models on the train set and re-ran generative tracking with an unlimited number of partitions (i. [sent-118, score-0.358]

54 This can be partly attributed to the difficulty in estimating accurate initial priors b(g) for MISMATCH, where the bus route, origin, and destination slot values in train and test systems differed significantly. [sent-122, score-0.247]

55 We then describe the features used, and finally review existing discriminative approaches for state tracking which serve as a starting point for the new approach we introduce in Section 5. [sent-128, score-0.596]

56 2 Features Discriminative approaches for state tracking rely on informative features to predict the correct dialog state. [sent-136, score-1.172]

57 In this work we designed a set of hypothesis-specific features that convey information about the correctness of a particular state hypothesis, and a set of general features that convey information about the correctness of the rest metahypothesis. [sent-137, score-0.384]

58 Similar statistics were computed for prior probability of an SLU result appearing on an N-best list, and prior probability of SLU result appearance at specific rank positions of an N-best list, prior probability of confusion between pairs of SLU results, and others. [sent-149, score-0.221]

59 General features provide aggregate information about dialog history and SLU results, and are shared across different SLU results of an N-best list. [sent-150, score-0.688]

60 For a given dialog turn with G state hypotheses, there are a total of G ∗ M + K distinct features. [sent-155, score-0.904]

61 3 Fixed-length discriminative state tracking In past work, Bohus and Rudnicky (2006) introduced discriminative state tracking, casting the problem as standard multiclass classification. [sent-157, score-0.894]

62 Since in dialog state tracking the number of state hypotheses varies across turns, Bohus and Rudnicky (2006) chose a subset of G˜ state hypotheses to score. [sent-159, score-1.953]

63 The classification problem is over G˜ + 1 = G1 + G2 + G3 + 1 classes, where the correct class indicates which of these hypotheses (or rest) is correct. [sent-162, score-0.215]

64 However, the total number of feature functions (hence weights to learn) is (G˜ + 1) × (G˜M + K), which increases quadratically with the number of hypotheses considered. Although regularization can help avoid overfitting per se, it becomes a more challenging task with more features. [sent-174, score-0.248]

65 Also, although we know in advance that posteriors for a dialog state hypothesis are most dependent on the features corresponding to that hypothesis, in this approach the features from all hypotheses are pooled together and the model is left to discover these correspondences via learning. [sent-176, score-1.215]

66 G˜ = 6, we found that in 10% of turns, the correct state hypothesis was present but was being discarded by the model, which substantially reduces the upper-bound on tracker performance. [sent-181, score-0.436]

67 In the next section, we introduce a novel discriminative state tracking approach that addresses the above limitations, and enables jointly considering an arbitrary number of state hypotheses, by exploiting the structure inherent in the dialog state tracking problem. [sent-182, score-1.883]

68 5 Dynamic discriminative state tracking The key idea in the proposed approach is to use feature functions that link hypothesis-specific features to their corresponding dialog state hypothesis. [sent-183, score-1.484]

69 This approach makes it straightforward to model relationships such as “higher confidence for an SLU result increases the probability of its corresponding state hypothesis being correct”. [sent-184, score-0.426]

70 weights to learn) from the number of hypotheses considered, allowing an arbitrary number of dialog state hypotheses to be scored. [sent-187, score-1.02]

71 Recall each dialog state hypothesis has M hypothesis-specific features; for each hypothesis, we concatenate these M features with the K general features, which are identical for all hypotheses. [sent-191, score-1.008]

72 Figure 3: The DISCDYN model presented in this paper exploits the structure of the state tracking problem. [sent-194, score-0.487]

73 Table 2: Description of the various implemented state tracking algorithms. The model is based on M + K feature functions. [sent-201, score-0.514]

74 Specifically, for a turn with G hypotheses, we define φi(x, y = g) = xig, where y ranges over the set of G + 1 possible dialog states (and as above i ∈ 1. [sent-203, score-0.693]
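
To make the role of these shared feature functions concrete, here is a minimal sketch of scoring a variable number of hypotheses with one weight per feature. It assumes a log-linear (softmax) form and assumes the caller supplies a feature row for rest as well; neither detail is spelled out in the extracted text, and the function name, weights, and feature values are made up for illustration.

```python
import math
from typing import List, Sequence


def shared_weight_posterior(
    features: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Softmax over per-hypothesis scores sum_i weights[i] * features[g][i].

    features has one row per hypothesis (the last row standing in for rest);
    the same weight vector is applied to every row, so the number of rows may
    change from turn to turn without changing the number of parameters.
    """
    for row in features:
        if len(row) != len(weights):
            raise ValueError("each feature row must match the weight vector length")
    scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Column 0: a hypothesis-specific feature (e.g. confidence); column 1: a general
# feature that takes the same value for every hypothesis in the turn.
feats = [[0.9, 1.0], [0.4, 1.0], [0.1, 1.0], [0.0, 1.0]]
posterior = shared_weight_posterior(feats, [2.0, 0.5])
posterior_alt = shared_weight_posterior(feats, [2.0, -3.0])
```

In this sketch `posterior` and `posterior_alt` agree up to rounding even though the weight on the general feature differs: a feature that is identical across all hypotheses adds the same amount to every score and cancels in the normalization.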

75 [...] the number of dialog state hypotheses to score varies from turn to turn. [sent-213, score-1.111]

76 In practice, this formulation in which general features are duplicated across every dialog state hypothesis may require some additional feature engineering: for every hypothesis g and general feature i, the value of that general feature xig will be multiplied by the same weight λi. [sent-217, score-1.27]

77 Nonetheless, general features do contain useful information for state tracking; to make use of them, we add conjunctions (combinations) of general and hypothesis-specific features. [sent-219, score-0.355]

78 For example, if the 3-way hypothesis-specific indicator feature for rank described above were conjoined with a 4-way general indicator feature for dialog state, the result would be an indicator of dimension 3 × 4 = 12. [sent-227, score-0.855]
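
The conjunction in this example can be sketched as a flattened outer product of two one-hot indicators. The helper name `conjoin` and the particular indicator values are assumptions for illustration.

```python
from typing import List


def conjoin(a: List[int], b: List[int]) -> List[int]:
    """Flattened outer product (row-major) of two indicator vectors."""
    return [x * y for x in a for y in b]


rank_indicator = [0, 1, 0]      # 3-way hypothesis-specific indicator (middle bin active)
state_indicator = [0, 0, 1, 0]  # 4-way general indicator (third value active)
conjunction = conjoin(rank_indicator, state_indicator)
```

The result has 3 × 4 = 12 entries with exactly one set to 1, at position 1 × 4 + 2 = 6. Unlike the general indicator alone, it differs across hypotheses whenever their rank indicators differ.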

79 This model consists of 2 binary classifiers: the first one scores each hypothesis in isolation, using the M hypothesis-specific features for that hypothesis plus the K general features for that turn, and outputs a (single) probability that the hypothesis is correct. [sent-230, score-0.439]

80 [Results table: Accuracy (larger numbers better) and L2 (smaller numbers better) on the MATCH1, MATCH2, and MISMATCH datasets, each with feature sets b, bc, and bch.] [...] features are denoted as b, ASR/SLU confusion features as c and history features as h. [sent-236, score-1.03]

81 6 Results and discussion The implemented state tracking methods are summarized in Table 2, and our results are presented in Table 3. [sent-238, score-0.487]

82 First, discriminative approaches for state tracking broadly outperform generative methods. [sent-240, score-0.6]

83 This result suggests that discriminative methods have good promise when deployed into real systems, where mismatch between training and test distributions is expected. [sent-243, score-0.228]

84 This shows the benefit of a model which can score every dialog state hypothesis, rather than a fixed subset. [sent-245, score-0.843]

85 For example, some turns have 1 hypothesis and others have 100, but DISCIND training counts all hypotheses equally. [sent-250, score-0.356]

86 We believe this is because the SLU confusion features can be estimated well for slots with small cardinalities (there are 7 possible values for the day), and less well for slots with large cardinalities (there are 24 × 60 = 1440 possible time values). [sent-263, score-0.215]

87 7 Conclusion and Future Work Dialog state tracking is crucial to the successful operation of spoken dialog systems. [sent-352, score-1.174]

88 Recently developed statistical approaches are promising as they fully utilize the dialog history, and can incorporate priors from past usage data. [sent-353, score-0.652]

89 In this paper, we have introduced a new model for discriminative state tracking. [sent-355, score-0.297]

90 The key idea is to exploit the structure of the problem, in which each dialog state hypothesis has features drawn from the same set. [sent-356, score-0.987]

91 In contrast to past approaches to discriminative state tracking which required a number of parameters quadratic in the number of state hypotheses, our approach uses a constant number of parameters, invariant to the number of state hypotheses. [sent-357, score-0.97]
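
The quadratic-versus-constant contrast can be checked with a small back-of-the-envelope calculation. The sketch assumes the fixed-length model learns one weight per class for each of its pooled input features, giving (G˜ + 1) × (G˜M + K) weights, and that the proposed model uses M + K weights; the values chosen for M and K are hypothetical.

```python
def fixed_length_weights(g_tilde: int, m: int, k: int) -> int:
    """(G~ + 1) classes times (G~ * M + K) pooled input features."""
    return (g_tilde + 1) * (g_tilde * m + k)


def shared_weights(m: int, k: int) -> int:
    """One weight per hypothesis-specific or general feature."""
    return m + k


M, K = 10, 5  # hypothetical feature counts
counts = {g: fixed_length_weights(g, M, K) for g in (3, 6, 12)}
constant = shared_weights(M, K)
```

With these made-up values the fixed-length count grows from 140 to 455 to 1625 as G˜ goes from 3 to 6 to 12, while the shared-weight count stays at 15.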

92 We evaluated the proposed method and compared it to existing generative and discriminative approaches on a corpus of real-world human-computer dialogs chosen to include a mismatch between training and test, as this will be found in deployments. [sent-359, score-0.246]

93 Results show that the proposed model exceeds both the accuracy and probability quality of all baselines when using the richest feature set, which includes information about common ASR confusions and dialog history. [sent-360, score-0.687]

94 The next step is to incorporate this approach into a deployed dialog system, and use the estimated posterior over dialog states as input to the action selection process. [sent-364, score-1.34]

95 Effective handling of dialogue state in the hidden information state POMDP dialogue manager. [sent-392, score-0.58]

96 Incremental partition recombination for efficient tracking of multiple dialogue states. [sent-408, score-0.294]

97 Challenges and opportunities for state tracking in statistical spoken dialog systems: Results from two public deployments. [sent-413, score-1.174]

98 Partially observable Markov decision processes for spoken dialog systems. [sent-422, score-0.687]

99 Demonstration of AT&T Let’s Go: A production-grade statistical spoken dialog system. [sent-426, score-0.687]

100 The hidden information state model: a practical framework for POMDP-based spoken dialogue management. [sent-429, score-0.364]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dialog', 0.612), ('slu', 0.506), ('tracking', 0.256), ('state', 0.231), ('hypotheses', 0.185), ('mismatch', 0.115), ('hypothesis', 0.101), ('bus', 0.097), ('williams', 0.092), ('bch', 0.088), ('discdyn', 0.088), ('discind', 0.088), ('spoken', 0.075), ('tracker', 0.074), ('slot', 0.07), ('turns', 0.07), ('discriminative', 0.066), ('confusion', 0.064), ('bohus', 0.063), ('discfixed', 0.063), ('turn', 0.061), ('asr', 0.056), ('young', 0.056), ('route', 0.053), ('genoffline', 0.051), ('genonline', 0.051), ('origin', 0.049), ('generative', 0.047), ('bc', 0.046), ('thomson', 0.046), ('rudnicky', 0.045), ('confidence', 0.044), ('user', 0.044), ('features', 0.043), ('routes', 0.041), ('conjunctions', 0.039), ('dialogue', 0.038), ('hypothesisspecific', 0.038), ('xig', 0.038), ('entropy', 0.038), ('posterior', 0.036), ('action', 0.034), ('mismatched', 0.034), ('history', 0.033), ('offline', 0.031), ('schedule', 0.031), ('correct', 0.03), ('probability', 0.029), ('slots', 0.029), ('rank', 0.028), ('encoding', 0.028), ('feature', 0.027), ('indicator', 0.027), ('current', 0.026), ('deployed', 0.026), ('asdt', 0.025), ('calibrated', 0.025), ('cardinalities', 0.025), ('items', 0.024), ('ds', 0.024), ('jason', 0.024), ('multiclass', 0.023), ('correctness', 0.023), ('destination', 0.023), ('contents', 0.023), ('steve', 0.023), ('distribution', 0.023), ('milica', 0.022), ('pomdp', 0.022), ('zadrozny', 0.022), ('varies', 0.022), ('speech', 0.022), ('ga', 0.021), ('past', 0.021), ('result', 0.021), ('differed', 0.021), ('general', 0.021), ('date', 0.02), ('hidden', 0.02), ('dynamic', 0.02), ('states', 0.02), ('maximum', 0.02), ('partitions', 0.02), ('confusions', 0.019), ('unsurprising', 0.019), ('blaise', 0.019), ('horvitz', 0.019), ('slt', 0.019), ('observed', 0.019), ('priors', 0.019), ('realistic', 0.019), ('weights', 0.018), ('confirmation', 0.018), ('commit', 0.018), ('dialogs', 0.018), ('deployment', 0.018), ('unlimited', 0.018), ('functions', 0.018), ('train', 0.017), 
('subscript', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 124 acl-2013-Discriminative state tracking for spoken dialog systems

Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams

2 0.54109764 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

Author: Svitlana Volkova ; Pallavi Choudhury ; Chris Quirk ; Bill Dolan ; Luke Zettlemoyer

Abstract: Procedural dialog systems can help users achieve a wide range of goals. However, such systems are challenging to build, currently requiring manual engineering of substantial domain-specific task knowledge and dialog management strategies. In this paper, we demonstrate that it is possible to learn procedural dialog systems given only light supervision, of the type that can be provided by non-experts. We consider domains where the required task knowledge exists in textual form (e.g., instructional web pages) and where system builders have access to statements of user intent (e.g., search query logs or dialog interactions). To learn from such textual resources, we describe a novel approach that first automatically extracts task knowledge from instructions, then learns a dialog manager over this task knowledge to provide assistance. Evaluation in a Microsoft Office domain shows that the individual components are highly accurate and can be integrated into a dialog system that provides effective help to users.

3 0.22426118 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

Author: Kevin Reschke ; Adam Vogel ; Dan Jurafsky

Abstract: Recommendation dialog systems help users navigate e-commerce listings by asking questions about users’ preferences toward relevant domain attributes. We present a framework for generating and ranking fine-grained, highly relevant questions from user-generated reviews. We demonstrate our approach on a new dataset just released by Yelp, and release a new sentiment lexicon with 1329 adjectives for the restaurant domain.

4 0.069318354 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

Author: Kristina Toutanova ; Byung-Gyu Ahn

Abstract: In this paper we show how to automatically induce non-linear features for machine translation. The new features are selected to approximately maximize a BLEU-related objective and decompose on the level of local phrases, which guarantees that the asymptotic complexity of machine translation decoding does not increase. We achieve this by applying gradient boosting machines (Friedman, 2000) to learn new weak learners (features) in the form of regression trees, using a differentiable loss function related to BLEU. Our results indicate that small gains in performance can be achieved using this method but we do not see the dramatic gains observed using feature induction for other important machine learning tasks.

5 0.061365895 325 acl-2013-Smoothed marginal distribution constraints for language modeling

Author: Brian Roark ; Cyril Allauzen ; Michael Riley

Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.

6 0.059778526 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers

7 0.058385834 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

8 0.056543984 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

9 0.050706074 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

10 0.050376918 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD

11 0.05007612 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

12 0.048916847 176 acl-2013-Grounded Unsupervised Semantic Parsing

13 0.047864914 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation

14 0.047603525 175 acl-2013-Grounded Language Learning from Video Described with Sentences

15 0.046803258 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

16 0.044716191 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic

17 0.043533914 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

18 0.042166963 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

19 0.041667122 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

20 0.040171232 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.12), (1, 0.016), (2, -0.017), (3, -0.014), (4, -0.013), (5, -0.006), (6, 0.114), (7, -0.078), (8, -0.008), (9, 0.044), (10, -0.028), (11, 0.046), (12, -0.035), (13, -0.03), (14, 0.058), (15, -0.076), (16, -0.097), (17, 0.08), (18, 0.057), (19, -0.119), (20, -0.157), (21, -0.076), (22, 0.167), (23, 0.091), (24, 0.26), (25, -0.158), (26, 0.102), (27, 0.391), (28, -0.052), (29, -0.174), (30, -0.077), (31, 0.085), (32, 0.212), (33, 0.029), (34, -0.163), (35, -0.007), (36, 0.017), (37, 0.023), (38, 0.133), (39, -0.083), (40, -0.005), (41, 0.007), (42, -0.057), (43, -0.028), (44, -0.045), (45, -0.079), (46, -0.075), (47, 0.052), (48, 0.042), (49, 0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94755065 124 acl-2013-Discriminative state tracking for spoken dialog systems

Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams

2 0.91861176 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

Author: Svitlana Volkova ; Pallavi Choudhury ; Chris Quirk ; Bill Dolan ; Luke Zettlemoyer

3 0.68178201 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

Author: Kevin Reschke ; Adam Vogel ; Dan Jurafsky

4 0.4408021 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation

Author: Srinivasan Janarthanam ; Oliver Lemon ; Phil Bartie ; Tiphaine Dalmas ; Anna Dickinson ; Xingkun Liu ; William Mackaness ; Bonnie Webber

Abstract: We present a city navigation and tourist information mobile dialogue app with integrated question-answering (QA) and geographic information system (GIS) modules that helps pedestrian users to navigate in and learn about urban environments. In contrast to existing mobile apps which treat these problems independently, our Android app addresses the problem of navigation and touristic question-answering in an integrated fashion using a shared dialogue context. We evaluated our system in comparison with Samsung S-Voice (which interfaces to Google navigation and Google search) with 17 users and found that users judged our system to be significantly more interesting to interact with and learn from. They also rated our system above Google search (with the Samsung S-Voice interface) for tourist information tasks.

5 0.3532722 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

Author: Nina Dethlefs ; Helen Hastie ; Heriberto Cuayahuitl ; Oliver Lemon

Abstract: Surface realisers in spoken dialogue systems need to be more responsive than conventional surface realisers. They need to be sensitive to the utterance context as well as robust to partial or changing generator inputs. We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields (CRFs) with semantic trees. Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers. This leads to more natural and less repetitive surface realisation. It also allows generation from partial and modified inputs and is therefore applicable to incremental surface realisation. Results from a human rating study confirm that users are sensitive to this extended notion of context and assign ratings that are significantly higher (up to 14%) than those for taking only local context into account.

6 0.33902124 176 acl-2013-Grounded Unsupervised Semantic Parsing

7 0.28985107 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

8 0.27028739 325 acl-2013-Smoothed marginal distribution constraints for language modeling

9 0.26337266 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning

10 0.25170472 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

11 0.24135245 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions

12 0.22944848 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

13 0.2247801 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic

14 0.22329578 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers

15 0.21993139 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks

16 0.2136713 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars

17 0.20965509 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

18 0.19644997 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

19 0.19632791 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections

20 0.18285026 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.051), (6, 0.053), (11, 0.059), (24, 0.039), (26, 0.051), (28, 0.304), (35, 0.076), (42, 0.037), (48, 0.028), (70, 0.053), (88, 0.036), (90, 0.042), (95, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95240802 349 acl-2013-The mathematics of language learning

Author: Andras Kornai ; Gerald Penn ; James Rogers ; Anssi Yli-Jyra

Abstract: unknown-abstract

same-paper 2 0.79753649 124 acl-2013-Discriminative state tracking for spoken dialog systems

Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams

3 0.76783431 107 acl-2013-Deceptive Answer Prediction with User Preference Graph

Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai

Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.

4 0.76185703 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen

Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.

5 0.74455905 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments

Author: Tony Veale ; Guofu Li

Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet.
Author affiliation: Guofu Li, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland.
1 Seeing is Believing (and Creating)
Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity (e.g. see Pedersen et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities.
This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we perceive the world by re-shaping the way we make similarity judgments.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 660–670, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics
So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. 
Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a WordNet-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-and-vertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6.
2 Related Work and Ideas
WordNet’s taxonomic organization of noun-senses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instance-level ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pedersen et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them.
To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield more rounded similarity measures. Resnik (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of information shared by those concepts, and thus a good estimate of their similarity.
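The two depth-based measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy `PARENT` is a hypothetical toy hierarchy rather than WordNet, depth is counted in nodes with the root at depth 1, and the function names are invented for this sketch.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "food": "entity",
    "cereal": "food",
    "oatmeal": "cereal",
    "cornflakes": "cereal",
    "tool": "entity",
    "scalpel": "tool",
}

def path_to_root(concept):
    """Return the nodes from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(path_to_root(concept))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts share no ancestor")

def wu_palmer(a, b):
    """Twice the depth of the LCS divided by the sum of the two depths."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def leacock_chodorow(a, b):
    """Negated log of the path length (in nodes) over twice the maximum depth."""
    shared = depth(lcs(a, b))
    path_nodes = (depth(a) - shared) + (depth(b) - shared) + 1
    max_depth = max(depth(c) for c in PARENT)
    return -math.log(path_nodes / (2 * max_depth))
```

On this toy hierarchy, `wu_palmer("oatmeal", "cornflakes")` is 2·3/(4+4) = 0.75, while the pair oatmeal and scalpel meets only at the root and scores lower on both measures.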
Lin (1998) combines the intuitions behind Resnik’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) present a minor innovation, showing how Resnik’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnik’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate information content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. This intrinsic view of information content is convenient to use, since it needs no recourse to an external corpus; Seco et al. also show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures.
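The information-content measures above can be sketched in the same way. Everything concrete here is an assumption made for illustration: the toy taxonomy, the mention counts in `COUNTS`, and the function names are hypothetical. The intrinsic variant subtracts the log ratio from 1, which is how this estimate is usually stated so that more general concepts score lower; the passage itself mentions only the ratio.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and direct-mention counts.
PARENT = {
    "entity": None, "food": "entity", "cereal": "food",
    "oatmeal": "cereal", "cornflakes": "cereal",
    "tool": "entity", "scalpel": "tool",
}
COUNTS = {"entity": 0, "food": 2, "cereal": 2, "oatmeal": 3,
          "cornflakes": 3, "tool": 4, "scalpel": 6}

def path_to_root(concept):
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def lcs(a, b):
    """Deepest shared ancestor; in a tree this is also the most informative one."""
    ancestors_of_b = set(path_to_root(b))
    return next(n for n in path_to_root(a) if n in ancestors_of_b)

def information_content(concept):
    """Corpus-based IC: -log P(concept), counting mentions of all its descendants."""
    total = sum(COUNTS.values())
    mentions = sum(n for c, n in COUNTS.items() if concept in path_to_root(c))
    return -math.log(mentions / total)

def shared_information(a, b):
    """Information content of the LCS of a and b."""
    return information_content(lcs(a, b))

def lin_similarity(a, b):
    """Twice the IC of the LCS over the sum of the individual ICs."""
    return 2 * shared_information(a, b) / (
        information_content(a) + information_content(b))

def jiang_conrath_distance(a, b):
    """Sum of the individual ICs minus twice the IC of the LCS."""
    return (information_content(a) + information_content(b)
            - 2 * shared_information(a, b))

def intrinsic_ic(concept):
    """Corpus-free IC from descendant counts (assumed form: 1 - log ratio)."""
    descendants = sum(1 for c in PARENT
                      if c != concept and concept in path_to_root(c))
    return 1 - math.log(descendants + 1) / math.log(len(PARENT))
```

A concept compared with itself has a Lin similarity of 1 and a Jiang and Conrath distance of 0, while two concepts that meet only at the root share no information, because the root has probability 1.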
One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a high-dimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition.
In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspective as it can on these latent relationships.
2.1 Clever Comparisons Create Similarity
Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects […] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (e.g. as a familiar substance). Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise.
Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1st” + letter = 1st-letter (for Alpha, Aleph, etc.).
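The category-building step just described can be sketched roughly as follows. This is a simplified illustration and not the published procedure: the glosses, depths and stopword list are hypothetical, a "property" is approximated as any non-stopword shared by both glosses, and `fine_grained_categories` is a name invented for this sketch.

```python
# Hypothetical glosses and taxonomy depths, made up for illustration only.
GLOSSES = {
    "Zeus": "the supreme deity of ancient Greece",
    "Odin": "the supreme deity of Norse myth",
    "Thor": "Norse deity of thunder",
}
DEPTH = {"Zeus": 3, "Odin": 3, "Thor": 3}
STOPWORDS = {"the", "of", "a", "an"}

def fine_grained_categories(a, b, lcs_name):
    """Combine each gloss word shared by a and b with their LCS name."""
    if DEPTH[a] != DEPTH[b]:  # only concepts of the same depth are united
        return []
    shared = set(GLOSSES[a].lower().split()) & set(GLOSSES[b].lower().split())
    # Drop function words and the LCS itself, which adds no new distinction.
    properties = sorted(shared - STOPWORDS - {lcs_name.lower()})
    return [f"{prop.capitalize()}-{lcs_name}" for prop in properties]
```

With these toy glosses, Zeus and Odin yield the single new parent category `Supreme-deity`, whereas Zeus and Thor share no property beyond the LCS and yield none.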
Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form.
3 Divergent (Re)Categorization
To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts.
a lonesome cowboy 432
a mounted cowboy 122
a grizzled cowboy 74
a swaggering cowboy 68
To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as

6 0.72189796 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

7 0.61731678 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

8 0.52164352 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

9 0.51582515 250 acl-2013-Models of Translation Competitions

10 0.50714999 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

11 0.50497133 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

12 0.50307041 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

13 0.50051945 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

14 0.49943534 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers

15 0.49873161 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory

16 0.49818301 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

17 0.49627057 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

18 0.49621546 267 acl-2013-PARMA: A Predicate Argument Aligner

19 0.49620342 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

20 0.4958759 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation