emnlp emnlp2010 emnlp2010-26 knowledge-graph by maker-knowledge-mining

26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats


Source: pdf

Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin

Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. [sent-4, score-1.2]

2 While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. [sent-6, score-0.795]

3 We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data. [sent-7, score-0.38]

4 1 Introduction Recently, live chats have received attention due to the growing popularity of chat services and the increasing body of applications. [sent-8, score-0.551]

5 For example, large organizations are increasingly providing support or information services through live chat. [sent-9, score-0.223]

6 One advantage of chat-based customer service over conventional telephone-based customer service is that it becomes possible to semi-automate aspects of the interaction (e.g. [sent-10, score-0.236]

7 conventional openings or canned responses to standard questions) without the customer being aware of it taking place, something that is not possible with speech-based dialogue systems (as synthesised speech is still easily distinguishable from natural speech). [sent-12, score-0.801]

8 Potentially huge savings can be made by organisations providing customer help services if we can increase the degree of automation of live chat. [sent-13, score-0.313]

9 Given the increasing impact of live chat services, there is surprisingly little published computational linguistic research on the topic. [sent-14, score-0.327]

10 There has been substantially more work done on dialogue and dialogue corpora, mostly in spoken dialogue (e.g. [sent-15, score-2.121]

11 (2000)) but also multimodal dialogue systems in application areas such as telephone support service (Bangalore et al. [sent-18, score-0.736]

12 Spoken dialogue analysis introduces many complications related to the error inherent in current speech recognition technologies. [sent-20, score-0.711]

13 As an instance of written dialogue, live chat has the advantage that recognition errors are not an issue, although chat language is typically ill-formed and turn-taking is complicated by the semi-asynchronous nature of the interaction (e.g. [sent-21, score-0.516]

14 In this paper, we investigate the task of automatic classification of dialogue acts in 1-on-1 live chats, focusing on “information delivery” chats since these are proving increasingly popular as part of enterprise customer-service solutions. [sent-24, score-1.285]

15 Our main challenge is to develop effective features and classifiers for classifying aspects of 1-on-1 live chat. [sent-25, score-0.239]

16 Much of the work on analysing dialogue acts in spoken dialogues has relied on non-lexical features, such as prosody and acoustic features (Stolcke et al. [sent-26, score-1.084]

17 Previous work on dialogue-act detection for chat systems has used bags-of-words (hereafter, BoW) as features; this simple approach has shown some promise (e.g. [sent-29, score-0.139]

18 , 2005) have also been used for dialogue act classification. [sent-35, score-0.881]

19 (Page footer: Methods in Natural Language Processing, pages 862–871, © 2010 Association for Computational Linguistics.) In this paper, we first re-examine BoW features for dialogue act classification. [sent-38, score-0.881]

20 As a baseline, we use the work of Ivanovic (2008), which explored 1-grams and 2-grams with Boolean values in 1-on-1 live chats in the MSN Online Shopping domain (this dataset is described in Section 5). [sent-39, score-0.377]

21 We extend this work by using ideas from related research such as text categorization (Debole and Sebastiani, 2003), and explore variants of BoW based on analysis of live chats, along with feature weighting. [sent-41, score-0.188]

22 Finally, and most importantly, we explore new features based on dialogue structure and on dependencies between utterances¹ that can complement BoW features for dialogue act classification. [sent-42, score-1.591]

23 Our hypothesis is that, for task-oriented 1-on-1 live chats, the structure and interactions among utterances are useful in predicting future dialogue acts: for example, conversations typically start with a greeting, and questions and answers typically appear as adjacency pairs in a conversation. [sent-43, score-1.111]

24 Therefore, we propose new features based on structural and dependency information derived from utterances (Sections 4. [sent-44, score-0.283]

25 2 Related Work While there has been significant work on classifying dialogue acts, the bulk of this has been for spoken dialogue. [sent-47, score-0.8]

26 Most such work has considered: (1) defining taxonomies of dialogue acts; (2) discovering useful features for the classification task; and (3) experimenting with different machine learning techniques. [sent-48, score-0.717]

27 For classifying dialogue acts in spoken dialogue, various features such as dialogue cues, speech characteristics, and n-grams have been proposed. [sent-50, score-1.702]

28 (1998) utilized the characteristics of spoken dialogues and examined speaker direction, punctuation marks, cue phrases and n-grams for classifying spoken dialogues. [sent-52, score-0.294]

29 (1998) used prosodic, lexical and syntactic features for spoken dialogue classification. [sent-54, score-0.749]

30 More recently, Julia and Iftekharuddin (2008) and Sridhar et al. … (Footnote 1: An utterance is the smallest unit that delivers a message (or messages) within a turn.) [sent-55, score-0.211]

31 (2006) used n-grams from the previous 1–3 utterances in order to classify dialogue acts for the target utterance. [sent-61, score-1.073]

32 There has been substantially less effort on classifying dialogue acts in written dialogue: Wu et al. [sent-62, score-0.928]

33 (2002) and Forsyth (2007) have used keyword-based approaches for classifying online chats; Ivanovic (2008) tested the use of n-gram features for 1-on-1 live chats with MSN Online Shopping assistants. [sent-63, score-0.471]

34 Various machine learning techniques have been investigated for the dialogue classification task. [sent-64, score-0.717]

35 (Bui, 2003)) have also all been applied to automatic dialogue act classification. [sent-82, score-0.881]

36 3 Dialogue Acts A number of dialogue act taxonomies have been proposed, designed mainly for spoken dialogue. [sent-83, score-0.258]

37 , 1992) defines 42 types of dialogue acts from human-to-human telephone conversations. [sent-87, score-0.899]

38 , 1991) defines a set of 128 dialogue acts to model task-based spoken conversations. [sent-89, score-0.94]

39 (2002) define 15 dialogue act tags based on previously defined dialogue act sets (Samuel et al. [sent-91, score-1.762]

40 Forsyth (2007) defines 15 dialogue acts for casual online conversations, based on 16 conversations with 10,567 utterances. [sent-96, score-0.961]

41 Ivanovic (2008) proposes 12 dialogue acts based on DAMSL for 1-on-1 online customer service chats. [sent-97, score-1.038]

42 Ivanovic’s set of dialogue acts for chat dialogues has significant overlap with the dialogue act sets of Wu et al. [sent-98, score-2.014]

43 In our work, we re-use the set of dialogue acts proposed in Ivanovic (2008), since we target the same task of 1-on-1 IM chats and indeed experiment over the same dataset. [sent-102, score-0.877]

44 The definitions of the dialogue acts are provided in Table 1, along with examples. [sent-103, score-0.877]

45 1 Bag-of-Words n-gram-based BoW features are simple yet effective for identifying similarities between two utterances, and have been used widely in previous work on dialogue act classification for online chat dialogues (Louwerse and Crossley, 2006; Ivanovic, 2008). [sent-106, score-1.211]

46 However, chats containing large amounts of noise such as typos and emoticons pose a greater challenge for simple BoW approaches. [sent-107, score-0.189]

47 In this work, we chose to start with a BoW approach based on our observation that commercial live chat services contain relatively little noise; in particular, the commercial agent tends to use well-formed, formulaic prose. [sent-109, score-0.426]

48 Previously, Ivanovic (2008) explored Boolean 1-gram and 2-gram features to classify MSN Online Shopping live chats, where a user requests assistance in purchasing an item, in response to which the commercial agent asks the customer questions and makes suggestions. [sent-110, score-0.348]
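
The Boolean 1-gram and 2-gram features described above can be sketched as follows. This is a minimal illustration, not Ivanovic's code: it assumes a simple lowercase regex tokenizer, and `tokenize` and `boolean_ngrams` are hypothetical helper names; the original preprocessing may differ.

```python
from __future__ import annotations

import re


def tokenize(utterance: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", utterance.lower())


def boolean_ngrams(utterance: str, max_n: int = 2) -> set[str]:
    """Return every 1..max_n-gram that occurs in the utterance.

    The features are Boolean: an n-gram is either present (in the set)
    or absent; how often it occurs is ignored.
    """
    tokens = tokenize(utterance)
    features: set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))
    return features


if __name__ == "__main__":
    # Four features: hello, customer, "hello hello", "hello customer" (set order varies).
    print(boolean_ngrams("Hello, hello Customer!"))
```

Because the repeated *hello* contributes a single feature, the representation does not change with word frequency, which is what "Boolean values" implies; a weighted variant would replace the set with counts or weights.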

49 While 1-grams performed well (as live chat utterances are generally shorter than, e.g. [sent-112, score-0.523]

50 2 Structural Information Our motivation for using structural information as a feature is that the location of an utterance can be a strong predictor of the dialogue act. [sent-124, score-0.945]

51 Based on the nature of live chats, we observed that the utterance position in the chat, as well as in a turn, plays an important role when identifying its dialogue act. [sent-132, score-0.946]

52 For example, an utterance such as Hello will occur at the beginning of a chat while an utterance such as Have a nice day will typically appear at the end. [sent-133, score-0.586]

53 The position of utterances in a turn can also help identify the dialogue act; i.e. [sent-134, score-0.968]

54 when there are several utterances in a turn, they are related to each other, so examining the previous utterances in the same turn can help predict the dialogue act of the target utterance. [sent-136, score-0.625]

55 You are welcome, my pleasure EXPRESSIVE: An acknowledgement of a previous utterance or an indication of the speaker’s mood. [sent-146, score-0.234]

56 Used to confirm that the previous utterance was received/accepted. [sent-163, score-0.211]

57 Table 1: The set of dialogue acts used in this research, taken from Ivanovic (2008). … the same turn. [sent-178, score-0.877]

58 We also noticed that identifying the utterance author can help classify the dialogue act (previously used in Ivanovic (2008)). [sent-179, score-1.171]

59 Based on these observations, we tested the following four structural features: • Author information, • Relative position in the chat, • Author + Relative position, • Author + Turn-relative position among utterances in a given turn. [sent-180, score-0.342]

60 We illustrate our structural features in Table 2, which shows an example of a 1-on-1 live chat. [sent-181, score-0.236]

61 The participants are the agent (A) and customer (C); Uxx indicates an utterance (U) with ID number xx. [sent-182, score-0.342]

62 The relative position is calculated by dividing the utterance number by the total number of utterances in the dialogue; the turn-relative position is calculated by dividing the utterance's position within its turn by the number of utterances in that turn. [sent-184, score-0.961]

63 For example, for utterance 4 (U4), the relative position is 4/42, while its turn-relative position is 2/3, since U4 is the second of the three utterances (U3, U4 and U5) that the customer makes in a single turn. [sent-185, score-0.512]
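
The relative and turn-relative positions can be computed directly from the sequence of authors. The sketch below assumes that a turn is a maximal run of consecutive utterances by the same author (consistent with the U3 to U5 example, but an assumption rather than a definition quoted from the paper) and uses exact fractions so that 4/42 and 2/3 come out exactly; the function name, feature keys and the toy 42-utterance chat are hypothetical.

```python
from fractions import Fraction
from itertools import groupby


def structural_features(authors: list) -> list:
    """Compute the four structural features for each utterance.

    authors: speaker of each utterance in dialogue order, e.g. ["A", "C", ...].
    """
    total = len(authors)

    # Turn-relative position = position within the turn / utterances in the turn.
    # Assumption: a turn is a maximal run of consecutive same-author utterances.
    turn_positions = []
    for _, run in groupby(authors):
        run_length = len(list(run))
        turn_positions.extend(Fraction(k, run_length) for k in range(1, run_length + 1))

    features = []
    for index, (author, turn_pos) in enumerate(zip(authors, turn_positions), start=1):
        relative = Fraction(index, total)  # utterance number / utterances in dialogue
        features.append({
            "author": author,
            "pos": relative,
            "author+pos": (author, relative),
            "author+posturn": (author, turn_pos),
        })
    return features


# Hypothetical 42-utterance chat in which the customer's turn is U3 to U5.
toy_authors = ["A", "A", "C", "C", "C"] + ["A", "C"] * 18 + ["A"]
u4 = structural_features(toy_authors)[3]
print(u4["pos"], u4["author+posturn"])  # 2/21 ('C', Fraction(2, 3))
```

`Fraction` prints in lowest terms, so the relative position 4/42 appears as 2/21. Whether the positions are then used as raw numbers or discretised into bins is not stated in this summary.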

64 They used relative position, author information and automatically predicted labels from previous post(s) as dependency features for assigning a semantic label to the current target post. [sent-190, score-0.139]

65 Similarly, by examining our chat corpus, we observed significant dependencies between utterances. [sent-191, score-0.163]

66 agent-to-user) dialogues often contain dependencies between adjacent utterances by different authors. [sent-194, score-0.337]

67 Another example is that when the agent makes a greeting, such as Have a nice day, then the customer will typically respond with a greeting or closing remark, and not a Yes or No. [sent-197, score-0.201]

68 Second, the flow of dialogues is in general cohesive, unless the topic of utterances changes dramatically (e.g. [sent-198, score-0.313]

69 Third, we observed that between utterances made by the same author (either agent or user), the target utterance relies on previous utterances made by the same author, especially when the agent and user repeatedly question and answer. (Table 2 excerpt: A:U1 Hello Customer, welcome to MSN Shopping.) [sent-202, score-0.75]

70 A:U39 If you have any additional questions or you need additional information, please log in again to chat with us. [sent-216, score-0.139]

71 Table 2: An example of a 1-on-1 live chat, with turn and utterance structure. [sent-222, score-0.477]

72 With these observations, we checked the likelihood of dialogue act pairings between two adjacent utterances, as well as between two adjacent utterances made by the same author. [sent-223, score-1.077]

73 Overall, we found strong co-occurrence (as measured by number of occurrences of labels across adjacency pairs) between certain pairs of dialogue acts (e.g. [sent-224, score-0.918]

74 STATEMENT, on the other hand, co-occurs with most other dialogue acts. [sent-227, score-0.686]
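
The co-occurrence check described above amounts to counting label pairs. The sketch below counts (previous act, current act) pairs for adjacent utterances regardless of author, and separately pairs each utterance with the same author's previous utterance; reading "adjacent utterances made by the same author" this way is an assumption, and the toy dialogue and its labels are purely illustrative.

```python
from collections import Counter


def adjacency_counts(utterances):
    """utterances: list of (author, dialogue_act) pairs in dialogue order.

    Returns two Counters keyed by (previous_act, current_act): one over
    adjacent utterances regardless of author, and one pairing each utterance
    with the previous utterance by the same author.
    """
    any_author = Counter()
    same_author = Counter()
    last_act_by_author = {}
    previous_act = None
    for author, act in utterances:
        if previous_act is not None:
            any_author[(previous_act, act)] += 1
        if author in last_act_by_author:
            same_author[(last_act_by_author[author], act)] += 1
        previous_act = act
        last_act_by_author[author] = act
    return any_author, same_author


# Toy dialogue: A = agent, C = customer.
toy = [("A", "STATEMENT"), ("C", "REQUEST"), ("A", "STATEMENT"),
       ("A", "REQUEST"), ("C", "RESPONSEACK")]
any_author, same_author = adjacency_counts(toy)
print(any_author.most_common(1))  # [(('STATEMENT', 'REQUEST'), 2)]
```

Dividing each count by the total for its previous act would turn these tallies into conditional estimates of how likely one act is to follow another.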

75 Based on this, we designed the following five utterance dependency features; by combining these, we obtain 31 feature sets (every non-empty combination of the five). [sent-228, score-0.25]

76 1. Dependency of utterances regardless of author: (a) Dialogue act of previous utterance; (b) Accumulated dialogue act(s) of previous utterances; (c) Accumulated dialogue acts of previous utterances in a given turn. 2. [sent-230, score-2.477]
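
The three author-independent dependency features listed above, and the count of 31 feature sets, can be illustrated as follows. This sketch assumes that accumulated acts are kept as an ordered tuple and that the first utterance sees a `<START>` placeholder; the two author-dependent features of group 2 are truncated in this summary, so they appear only as hypothetical placeholder names (`2a`, `2b`) when enumerating combinations.

```python
from itertools import combinations


def dependency_features(acts, turn_ids):
    """Author-independent dependency features (a)-(c) for each utterance.

    acts: dialogue act of each utterance (gold-standard or predicted), in order.
    turn_ids: turn identifier of each utterance, aligned with acts.
    """
    features = []
    for i, turn in enumerate(turn_ids):
        features.append({
            # (a) dialogue act of the previous utterance
            "prev_act": acts[i - 1] if i > 0 else "<START>",
            # (b) accumulated dialogue acts of all previous utterances
            "history": tuple(acts[:i]),
            # (c) accumulated dialogue acts of previous utterances in the same turn
            "turn_history": tuple(a for a, t in zip(acts[:i], turn_ids[:i]) if t == turn),
        })
    return features


# Five features in total; "2a" and "2b" are placeholders for the truncated group 2.
FEATURES = ["1a", "1b", "1c", "2a", "2b"]
# Every non-empty combination: 2**5 - 1 = 31 feature sets.
FEATURE_SETS = [combo for r in range(1, len(FEATURES) + 1)
                for combo in combinations(FEATURES, r)]
print(len(FEATURE_SETS))  # 31
```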

77 In contrast, instead of using utterances which indirectly encode dialogue acts, we directly use the dialogue act classifications, as done in Stolcke et al. [sent-233, score-1.763]

78 The motivation is that, because simple BoW features already classify dialogue acts with high accuracy, using the (possibly noisy) dialogue act labels directly should capture the dependency better than indirect information from utterances. [sent-235, score-0.916]

79 We do not build a probabilistic model of dialogue transitions the way Stolcke et al. [sent-236, score-0.686]

80 (2010) in using the dialogue act labels predicted in previous step(s) as features. [sent-238, score-0.707]
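
Using labels predicted in previous steps means decoding a chat left to right and feeding each prediction back in as the "previous act" feature for the next utterance. The sketch below shows that loop with a hypothetical `classify(utterance, prev_act)` interface; `toy_classify` is a rule-based stand-in for a trained model, and neither function comes from the paper.

```python
from typing import Callable, List


def greedy_decode(utterances: List[str],
                  classify: Callable[[str, str], str]) -> List[str]:
    """Label utterances in order, using each prediction as the next 'previous act'.

    At test time gold labels are unavailable, so a mistake on utterance i
    is passed on as a (wrong) feature to utterance i + 1.
    """
    predicted = []
    previous_act = "<START>"
    for utterance in utterances:
        act = classify(utterance, previous_act)
        predicted.append(act)
        previous_act = act
    return predicted


def toy_classify(utterance: str, previous_act: str) -> str:
    # Toy stand-in for a trained classifier (illustrative labels and rules).
    if utterance.endswith("?"):
        return "YES-NO-QUESTION"
    if previous_act == "YES-NO-QUESTION" and utterance.lower().startswith("yes"):
        return "YES-ANSWER"
    return "STATEMENT"


print(greedy_decode(["Do you ship to Canada?", "Yes, we do.", "Yes, and to Mexico too."],
                    toy_classify))
# ['YES-NO-QUESTION', 'YES-ANSWER', 'STATEMENT']
```

The two utterances starting with "Yes" receive different labels only because of the previous-act feature, which is the kind of dependency these features are meant to capture.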

81 5 Experiment Setup As stated earlier, we use the data set from Ivanovic (2008) for our experiments; it contains 1-on-1 live chats from an information delivery task. [sent-239, score-0.397]

82 This dataset contains 8 live chats, including 542 manually segmented utterances. [sent-240, score-0.188]

83 The maximum and minimum number of utterances in a dialogue are 84 and 42, respectively; the maximum number of utterances in a turn is 14. [sent-241, score-1.115]

84 The live chats were manually tagged with the 12 dialogue acts described in Section 3. [sent-242, score-1.254]

85 The utterance distribution over the dialogue acts is described in Table 3. [sent-243, score-1.088]

86 We then built a dialogue act classifier using three different machine learners: SVM-HMM (Joachims, 1998),² naive Bayes … (Footnote 2: http://www.) [sent-246, score-0.881]
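
For reference, a naive Bayes classifier over Boolean features fits in a few lines. This is a generic textbook sketch (multinomial naive Bayes over binarised features with add-one smoothing), not the toolkit the authors used; the class name and the toy training data and labels are illustrative.

```python
import math
from collections import Counter, defaultdict


class BooleanNaiveBayes:
    """Multinomial naive Bayes over sets of Boolean features, add-one smoothing."""

    def fit(self, feature_sets, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocabulary = set()
        for features, label in zip(feature_sets, labels):
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)
        self.feature_totals = {label: sum(self.feature_counts[label].values())
                               for label in self.class_counts}
        self.n_examples = len(labels)
        return self

    def predict(self, features):
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_examples)  # log prior
            for feature in features:
                if feature not in self.vocabulary:
                    continue  # features unseen in training are ignored
                numerator = self.feature_counts[label][feature] + 1
                denominator = self.feature_totals[label] + vocab_size
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = BooleanNaiveBayes().fit(
    [{"hello"}, {"thanks"}, {"thank", "you"}],
    ["OPENING", "THANKS", "THANKS"])
print(model.predict({"hello", "there"}))  # OPENING
```

The feature sets could come from a Boolean n-gram extractor plus structural or dependency features; naive Bayes treats each utterance independently, which is one reason sequence models such as CRFs can do better on this task.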

87 Table 6 shows the results: Pos indicates the relative position of an utterance in the whole dialogue, Author means author information, and Posturn indicates the relative position of the utterance in a turn. [sent-285, score-0.599]

88 Since our utterance dependency features use dialogue acts directly, we first experimented with gold-standard dialogue act labels. [sent-303, score-1.969]

89 We also tested using dialogue acts automatically predicted in previous steps. [sent-304, score-0.877]

90 Table 7 shows performance using both the gold-standard and learned dialogue acts. [sent-305, score-0.686]

91 However, List decreased performance: because the flow of a dialogue can change, including a longer history of dialogue acts tends to introduce noise. [sent-314, score-0.994]

92 Comparing the use of gold-standard and learned dialogue acts, the reduction in accuracy was not statistically significant, indicating that we can achieve high performance on dialogue act classification even with interactively-learned dialogue acts. [sent-315, score-0.686]

93 Table 8: Accuracy with Structural and Dependency Information (columns: Feature, CRF, SVM, NB; feature rows include C + Label, Prev, List and Author). C means lemmatized Unigram+Position+Author. [sent-318, score-1.598]

94 Rows indicate the correct dialogue acts and columns indicate the dialogue acts they were misclassified as. [sent-330, score-1.563]

95 In particular, a large number of REQUEST and RESPONSEACK utterances were tagged as STATEMENT. [sent-332, score-0.196]
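
A confusion matrix with correct acts as rows and predicted acts as columns, as described above, can be tallied with a counter over (gold, predicted) pairs. The helper names and the four toy labels below are illustrative, not the paper's data.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Counter keyed by (correct_act, predicted_act): rows = correct, columns = predicted."""
    return Counter(zip(gold, predicted))


def accuracy(matrix):
    correct = sum(n for (g, p), n in matrix.items() if g == p)
    return correct / sum(matrix.values())


def top_confusions(matrix, k=5):
    """The k most frequent off-diagonal (misclassification) cells."""
    errors = Counter({cell: n for cell, n in matrix.items() if cell[0] != cell[1]})
    return errors.most_common(k)


gold = ["REQUEST", "REQUEST", "RESPONSEACK", "STATEMENT"]
predicted = ["STATEMENT", "REQUEST", "STATEMENT", "STATEMENT"]
matrix = confusion_matrix(gold, predicted)
print(accuracy(matrix))  # 0.5
```

Sorting the off-diagonal cells is a quick way to surface patterns such as REQUEST or RESPONSEACK being absorbed by the frequent STATEMENT class.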

96 In future work, we plan to investigate methods for automatically cleansing the data to remove typos, and for taking account of temporal gaps that can sometimes arise in online chats (e.g. [sent-334, score-0.232]

97 7 Conclusion We have explored an automated approach for classifying dialogue acts in 1-on-1 live chats in the shopping domain, using bag-of-words (BoW), structural information and utterance dependency features. [sent-337, score-1.684]

98 Of the learners we experimented with, CRFs performed best, due to their ability to natively capture sequential dialogue act dependencies. [sent-340, score-0.95]
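
A linear-chain model captures these dependencies at decoding time: it scores a label sequence as a sum of per-utterance (emission) scores and label-to-label (transition) scores, and Viterbi search finds the best-scoring sequence. The sketch below shows only that decoding step with hand-set scores; it does not train a CRF, is not the CRF implementation used in the paper, and the labels and scores are invented for illustration.

```python
def viterbi(emission, transition, labels, start="<START>"):
    """Highest-scoring label sequence under a linear-chain scoring model.

    emission[i][y]: score of label y for utterance i.
    transition[(y_prev, y)]: score of y following y_prev (missing pairs score 0).
    """
    if not emission:
        return []

    def trans(prev, cur):
        return transition.get((prev, cur), 0.0)

    # best[i][y]: best total score of any labelling of utterances 0..i ending in y.
    best = [{y: trans(start, y) + emission[0][y] for y in labels}]
    backpointer = [{}]
    for i in range(1, len(emission)):
        best.append({})
        backpointer.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + trans(p, y))
            best[i][y] = best[i - 1][prev] + trans(prev, y) + emission[i][y]
            backpointer[i][y] = prev

    # Trace back from the best final label.
    y = max(labels, key=lambda label: best[-1][label])
    path = [y]
    for i in range(len(emission) - 1, 0, -1):
        y = backpointer[i][y]
        path.append(y)
    return path[::-1]


labels = ["QUESTION", "ANSWER"]
emission = [{"QUESTION": 1.0, "ANSWER": 0.0}, {"QUESTION": 0.6, "ANSWER": 0.5}]
transition = {("QUESTION", "ANSWER"): 1.0, ("QUESTION", "QUESTION"): -1.0}
print(viterbi(emission, transition, labels))  # ['QUESTION', 'ANSWER']
```

Without the transition scores the second utterance would be labelled QUESTION, because its own emission score is slightly higher; the question-answer transition flips it, which is the sequential effect that per-utterance learners cannot model.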

99 Automatic instant messaging dialogue using statistical models and dialogue acts. [sent-472, score-1.372]

100 Combining lexical, syntactic and prosodic cues for improved online dialog act tagging. [sent-554, score-0.365]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dialogue', 0.686), ('bow', 0.229), ('utterance', 0.211), ('utterances', 0.196), ('act', 0.195), ('acts', 0.191), ('chats', 0.189), ('live', 0.188), ('ivanovic', 0.186), ('chat', 0.139), ('dialogues', 0.117), ('customer', 0.09), ('forsyth', 0.081), ('shopping', 0.081), ('author', 0.079), ('greeting', 0.07), ('msn', 0.07), ('spoken', 0.063), ('crf', 0.06), ('stolcke', 0.058), ('yesanswer', 0.058), ('dialog', 0.058), ('bangalore', 0.058), ('classifying', 0.051), ('position', 0.049), ('yes', 0.049), ('structural', 0.048), ('ack', 0.046), ('conventionalopening', 0.046), ('damsl', 0.046), ('downplayer', 0.046), ('louwerse', 0.046), ('openquestion', 0.046), ('responseack', 0.046), ('yesnoquestion', 0.046), ('learners', 0.046), ('prosodic', 0.045), ('boolean', 0.044), ('online', 0.043), ('statement', 0.042), ('agent', 0.041), ('dependency', 0.039), ('turn', 0.037), ('lemmas', 0.037), ('request', 0.036), ('crfs', 0.035), ('brb', 0.035), ('bye', 0.035), ('conventionalclosing', 0.035), ('crossley', 0.035), ('labelauthor', 0.035), ('noanswer', 0.035), ('sridhar', 0.035), ('services', 0.035), ('shriberg', 0.033), ('samuel', 0.033), ('tf', 0.033), ('classification', 0.031), ('thanks', 0.03), ('response', 0.029), ('service', 0.028), ('discourse', 0.028), ('welcome', 0.027), ('acoustic', 0.027), ('anderson', 0.027), ('hello', 0.027), ('rq', 0.027), ('day', 0.025), ('speech', 0.025), ('dependencies', 0.024), ('cues', 0.024), ('jurafsky', 0.023), ('acknowledgement', 0.023), ('affirmative', 0.023), ('coccaro', 0.023), ('debole', 0.023), ('formulaic', 0.023), ('grau', 0.023), ('hcrc', 0.023), ('iftekharuddin', 0.023), ('misclassification', 0.023), ('natively', 0.023), ('ries', 0.023), ('witten', 0.023), ('expressive', 0.023), ('kim', 0.022), ('rochester', 0.022), ('ig', 0.022), ('telephone', 0.022), ('trains', 0.022), ('wu', 0.021), ('op', 0.021), ('accumulated', 0.021), ('conversations', 0.021), ('qu', 0.021), ('labels', 0.021), ('delivery', 0.02), ('bates', 0.02), ('adjacency', 0.02), ('casual', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

2 0.12833893 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

Author: Zahar Prasov ; Joyce Y. Chai

Abstract: In situated dialogue humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents partly due to their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. In addition to incorporating eye gaze with the best recognized spoken hypothesis, we developed an algorithm to also handle multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms the results obtained from processing recognition hypotheses alone. Incorporating eye gaze with word confusion networks further improves performance.

3 0.10293803 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

4 0.10162832 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

Author: Dave Golland ; Percy Liang ; Dan Klein

Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.

5 0.10090657 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

Author: Wei Lu ; Hwee Tou Ng

Abstract: This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence type prediction, and punctuation prediction on speech utterances. We performed evaluations on a transcribed conversational speech domain consisting of both English and Chinese texts. Empirical results show that our method outperforms an approach based on linear-chain conditional random fields and other previous approaches.

6 0.063600332 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

7 0.058953762 84 emnlp-2010-NLP on Spoken Documents Without ASR

8 0.044474002 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

9 0.042094819 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

10 0.03815328 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

11 0.033503965 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

12 0.027813068 61 emnlp-2010-Improving Gender Classification of Blog Authors

13 0.026718268 51 emnlp-2010-Function-Based Question Classification for General QA

14 0.026128829 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

15 0.026115244 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

16 0.026091222 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text

17 0.025955282 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

18 0.025101067 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

19 0.024086697 122 emnlp-2010-WikiWars: A New Corpus for Research on Temporal Expressions

20 0.023768457 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.096), (1, 0.071), (2, -0.01), (3, 0.022), (4, -0.087), (5, -0.006), (6, 0.053), (7, -0.063), (8, -0.038), (9, 0.061), (10, -0.238), (11, -0.24), (12, 0.009), (13, 0.21), (14, 0.059), (15, -0.103), (16, -0.128), (17, 0.007), (18, -0.099), (19, 0.086), (20, -0.169), (21, 0.004), (22, -0.093), (23, 0.044), (24, -0.022), (25, 0.036), (26, 0.116), (27, -0.068), (28, -0.098), (29, 0.204), (30, -0.075), (31, 0.018), (32, 0.058), (33, 0.065), (34, 0.097), (35, 0.122), (36, -0.052), (37, -0.152), (38, -0.047), (39, -0.083), (40, 0.073), (41, -0.1), (42, 0.098), (43, -0.094), (44, -0.004), (45, -0.044), (46, 0.026), (47, -0.019), (48, 0.101), (49, 0.107)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97614545 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

2 0.80386734 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

Author: Zahar Prasov ; Joyce Y. Chai

Abstract: In situated dialogue humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents partly due to their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. In addition to incorporating eye gaze with the best recognized spoken hypothesis, we developed an algorithm to also handle multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms the results obtained from processing recognition hypotheses alone. Incorporating eye gaze with word confusion networks further improves performance.

3 0.65900809 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

Author: Dave Golland ; Percy Liang ; Dan Klein

Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.

4 0.31662521 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

5 0.30915481 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

Author: Wei Lu ; Hwee Tou Ng

Abstract: This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence type prediction, and punctuation prediction on speech utterances. We performed evaluations on a transcribed conversational speech domain consisting of both English and Chinese texts. Empirical results show that our method outperforms an approach based on linear-chain conditional random fields and other previous approaches.

6 0.21341723 84 emnlp-2010-NLP on Spoken Documents Without ASR

7 0.21024315 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

8 0.19341455 61 emnlp-2010-Improving Gender Classification of Blog Authors

9 0.17313914 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

10 0.15995008 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

11 0.15927464 122 emnlp-2010-WikiWars: A New Corpus for Research on Temporal Expressions

12 0.14687388 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

13 0.13449022 92 emnlp-2010-Predicting the Semantic Compositionality of Prefix Verbs

14 0.1313065 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

15 0.11983378 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

16 0.10743449 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

17 0.10195457 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

18 0.099405862 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

19 0.094829001 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

20 0.09364295 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.023), (10, 0.012), (11, 0.021), (12, 0.026), (25, 0.118), (29, 0.053), (30, 0.012), (32, 0.018), (52, 0.019), (56, 0.058), (62, 0.018), (66, 0.08), (72, 0.047), (76, 0.316), (79, 0.013), (82, 0.012), (87, 0.019), (89, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92306298 16 emnlp-2010-An Approach of Generating Personalized Views from Normalized Electronic Dictionaries : A Practical Experiment on Arabic Language

Author: Aida Khemakhem ; Bilel Gargouri ; Abdelmajid Ben Hamadou

Abstract: Electronic dictionaries covering all natural language levels are highly relevant both for human use and for automatic processing, particularly those constructed according to international standards. Such dictionaries are characterized by a complex structure and long access times when queried through a querying system. However, a user generally needs only the part of such a dictionary that matches his domain and expertise level, which corresponds to a specialized dictionary. Given the importance of managing a unified dictionary and considering the personalized needs of users, we propose an approach for generating personalized views from a dictionary normalized according to the Lexical Markup Framework (LMF-ISO 24613) norm. This approach enables the re-use of already defined views for a community of users by managing their profile information and promoting the materialization of the generated views. It is composed of four main steps: (i) the projection of data categories controlled by a set of constraints (related to the user's profiles), (ii) the selection of values with consistency checking, (iii) the automatic generation of the query's model and, finally, (iv) the refinement of the view. The proposed approach was consolidated by carrying out an experiment on an LMF-normalized Arabic dictionary.

2 0.89977705 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

same-paper 3 0.78321648 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin

Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.

4 0.74085164 40 emnlp-2010-Effects of Empty Categories on Machine Translation

Author: Tagyoung Chung ; Daniel Gildea

Abstract: We examine effects that empty categories have on machine translation. Empty categories are elements in parse trees that lack corresponding overt surface forms (words) such as dropped pronouns and markers for control constructions. We start by training machine translation systems with manually inserted empty elements. We find that inclusion of some empty categories in training data improves the translation result. We expand the experiment by automatically inserting these elements into a larger data set using various methods and training on the modified corpus. We show that even when automatic prediction of null elements is not highly accurate, it nevertheless improves the end translation result.

5 0.48898268 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a method for the automatic discovery of MANNER relations from text. An extended definition of MANNER is proposed, including restrictions on the sorts of concepts that can be part of its domain and range. The connections with other relations and the lexico-syntactic patterns that encode MANNER are analyzed. A new feature set specialized for MANNER detection is described and justified. Experimental results show improvement over previous attempts to extract MANNER. Combinations of MANNER with other semantic relations are also discussed.

6 0.48286068 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

7 0.47284037 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

8 0.46246973 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

9 0.45907408 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

10 0.45749995 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

11 0.45745605 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

12 0.45653507 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

13 0.45458475 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

14 0.4470174 114 emnlp-2010-Unsupervised Parse Selection for HPSG

15 0.44662601 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

16 0.44423217 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

17 0.44251227 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

18 0.43536288 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

19 0.43243024 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

20 0.43131366 51 emnlp-2010-Function-Based Question Classification for General QA