emnlp emnlp2013 emnlp2013-4 knowledge-graph by maker-knowledge-mining

4 emnlp-2013-A Dataset for Research on Short-Text Conversations


Source: pdf

Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen

Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on real-world instances from Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. This dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing of conversation models. The dataset consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. [sent-12, score-0.786]

2 For this purpose we introduce a dataset of short-text conversation based on real-world instances from Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. [sent-13, score-0.232]

3 This dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing of conversation models. [sent-14, score-0.6]

4 This dataset consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. [sent-15, score-0.151]

5 Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset. [sent-16, score-0.141]

6 Introduction: Natural language conversation is one of the holy grails of artificial intelligence, and has been taken as the original form of the celebrated Turing test. [sent-17, score-0.159]

7 Previous efforts in this direction have largely focused on analyzing the text and modeling the state of the conversation through dialogue models. (∗ The work was done while the first author was an intern at Noah’s Ark Lab, Huawei Technologies.) [sent-18, score-0.212]

8 In this paper we take one step back and focus on a much easier task of finding the response for a given short text. [sent-19, score-0.597]

9 …as proposed in the Turing test), and probably not enough for a real conversation scenario, which often requires several rounds of interaction (e.g. …). [sent-23, score-0.158]

10 Research in this direction will not only immediately help applications of short-session dialogue, such as automatic message replying on mobile phones and the chatbots employed in voice assistants like Siri, but will also eventually benefit the modeling of dialogues in a more general setting. [sent-28, score-0.202]

11 As a result, it has collected a conversation history of previously unthinkable volume, which brings an opportunity to attack the conversation problem from a whole new angle. [sent-43, score-0.24]

12 More specifically, instead of generating a response to an utterance, we pick a suitable one from the massive candidate set. [sent-44, score-0.673]

13 The hope is that, with a reasonable retrieval model and a large enough candidate set, the system can produce fairly natural and appropriate responses. [sent-45, score-0.159]

14 Our model needs only a relatively small labeled dataset for training the retrieval model, but requires a rather large unlabeled set (e.g. …). [sent-47, score-0.158]

15 To further promote research in this direction, we create a dataset for training and testing the retrieval model, with a candidate response set of reasonable size. [sent-50, score-0.521]

16 Like almost all microblog services, Sina Weibo allows users to comment on a published post, which forms a natural one-round conversation. [sent-52, score-0.143]

17 The Dialogues on Sina Weibo: Sina Weibo is a Twitter-like microblog service, on which a user can publish short messages (referred to as posts in the remainder of the paper) visible to the public or to a group specified by the user. [sent-58, score-0.624]

18 Those comments will be referred to as responses in the remainder of the paper. [sent-61, score-0.401]

19 Figure 1: An example of a Sina Weibo post and the comments it received. [sent-62, score-0.54]

20 The comments to a post can be of rather flexible forms and diverse topics, as illustrated in the example in Table 1. [sent-64, score-0.57]

21 With a post stating the user’s status (traveling to Hawaii), the comments can be of quite different styles and contents, but apparently all appropriate. [sent-65, score-0.566]

22 In many cases, the (post, response) pair is selfcontained, which means one does not need any background and contextual information to get the main point of the conversation (Examples of that include the responses from B, D, G and H). [sent-66, score-0.46]

23 For example, the response from user E will be fairly elusive if taken out of the context that A’s Hawaii trip is for an international conference and he is going to give a talk there. [sent-68, score-0.677]

24 Part 3 collects all the responses, including but not limited to the responses in Parts 1 and 2. [sent-75, score-0.34]

25 # labeled pairs: 12,427 (Table 2: some statistics of the dataset). Original (Post, Response) Pairs: This part of the dataset gives (post, response) pairs as naturally presented in the microblog service. [sent-78, score-0.21]

26 In other words, we create a (post, response) pair whenever the response was actually given to the post on Sina Weibo. [sent-79, score-1.078]

27 This part of the data is noisy, since the responses given to a Weibo post could still be inappropriate for different reasons; for example, they could be spam or could target some responses given earlier. [sent-80, score-1.181]

28 Note that 1) the labeling is done only on a small subset of posts, and 2) for each selected post, the labeled responses were not originally given to it. [sent-83, score-0.391]

29 This part of the data can be directly used for training and testing of retrieval-based response models. [sent-85, score-0.613]

30 We have labeled 422 posts, with about 30 candidate responses for each. [sent-86, score-0.27]

31 These extra responses are mainly filtered out by our data cleaning strategy (see Section 4.2) for the original (post, response) pairs. [sent-88, score-0.403]

32 …including those from filtered-out Weibo posts and those addressing other responses. [sent-89, score-0.201]

33 Nevertheless, those responses are still valid candidate responses. [sent-90, score-0.397]

34 Using the Dataset for Retrieval-based Response Models: Our data can be used for training and testing of retrieval-based response models, or simply as a bank of responses. [sent-94, score-0.613]

35 Training Low-level Matching Features: The abundant original (post, response) pairs provide a rich supervision signal for learning different matching patterns between a post and a response. [sent-96, score-0.73]

36 For example, one may discover from the data that when the word “Hawaii” occurs in the post, the response is more likely to contain words like “trip”, “flight”, or “Honolulu”. [sent-99, score-0.577]
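
As a rough illustration of how such lexical matching patterns could be mined from the (post, response) pairs, here is a PMI-style association scorer; the statistic and helper names are assumptions for illustration, not the authors' stated method.

```python
from collections import Counter
from itertools import product
import math

def word_association_scores(pairs, min_count=5):
    """PMI-style association between post words and response words,
    estimated from a corpus of (post_tokens, response_tokens) pairs."""
    post_df, resp_df, joint = Counter(), Counter(), Counter()
    n = 0
    for post, resp in pairs:
        n += 1
        post_words, resp_words = set(post), set(resp)
        post_df.update(post_words)          # document frequency on the post side
        resp_df.update(resp_words)          # document frequency on the response side
        joint.update(product(post_words, resp_words))
    # PMI(wp, wr) = log( P(wp, wr) / (P(wp) P(wr)) ) = log( c * n / (df_p * df_r) )
    return {
        (wp, wr): math.log(c * n / (post_df[wp] * resp_df[wr]))
        for (wp, wr), c in joint.items()
        if c >= min_count
    }

# With enough data, word_association_scores(pairs)[("Hawaii", "trip")]
# would come out high, as in the example above.
```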

37 For example, the response to a post asking “how to” is statistically longer than the average response. [sent-102, score-1.078]

38 Please note that with more sophisticated natural language processing, we can go beyond bag-of-words to capture more complicated correspondences between post and response. [sent-106, score-0.53]

39 Training Automatic Response Models: Although the original (post, response) pairs are rather abundant, they are not enough for discriminative training and testing of retrieval models, for the following reasons. [sent-107, score-0.177]

40 This supervision will naturally tune the model parameters to distinguish the really good responses from the seemingly good ones. [sent-109, score-0.34]

41 Please note that without the labeled negative pairs, we would need to generate negative pairs from randomly chosen responses, which in most cases are too easy for the ranking model to differentiate and therefore cannot fully tune the model parameters. [sent-110, score-0.179]
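
A minimal sketch of such triple construction, assuming we mix the hard labeled negatives with a few random ones; the mixing scheme and names are illustrative, not the paper's exact recipe.

```python
import random

def build_ranking_triples(post, labeled_pos, labeled_neg, response_bank, n_random=5):
    """Build (post, y_plus, y_minus) triples for pairwise ranking.

    Labeled negatives are 'hard' (top-ranked by a retrieval model but judged
    unsuitable); a few random responses are mixed in as easy negatives."""
    triples = []
    for y_plus in labeled_pos:
        for y_minus in labeled_neg:                          # hard negatives
            triples.append((post, y_plus, y_minus))
        for y_minus in random.sample(response_bank, n_random):  # easy negatives
            triples.append((post, y_plus, y_minus))
    return triples
```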

42 Testing Automatic Response Models: In testing a retrieval-based system, although we can simply use the original responses associated with the query post as positive and treat all others as negative, this strategy suffers from the problem of spurious negative examples. [sent-112, score-0.975]

43 In other words, with a reasonably good model, the retrieved responses are often good even if they are not the original ones, which brings significant bias to the evaluation. [sent-113, score-0.379]

44 For example, to determine the sentiment of a response, one needs to consider both the original post and the observed interaction between the two. [sent-117, score-0.565]

45 In Figure 3, if we want to understand the user’s sentiment towards the “invited talk” mentioned in the post, the two responses should be taken as positive, although the sentiment in the responses alone is either negative or neutral. [sent-118, score-0.763]

46 4 Creation of the Dataset The (post, comment) pairs are sampled from the Sina Weibo posts published by users in a loosely connected community and the comments they received (may not be from this community). [sent-119, score-0.352]

47 The creation process of the dataset, as illustrated in Figure 4, consists of three consecutive steps: 1) crawling the community of users, 2) crawling their Weibo posts and the responses they received, and 3) cleaning the data, with more details described in the remainder of this section. [sent-122, score-0.373]

48 This is done by crawling the followees of ten manually selected seed users who are NLP researchers active on Sina Weibo (with no less than 2 posts per day on average) and popular enough (with no less than 100 followers). [sent-126, score-0.282]

49 We crawl the posts and the responses they received (not necessarily from the crawled community) for two months (from April 5th, 2013, to June 5th, 2013). [sent-127, score-0.502]

50 …and 2) some of the posts or responses are too general to be interesting for other cases (e.g. …). [sent-135, score-0.502]

51 In the remaining posts, we only keep the first 100 responses in the original (post, response) pairs, since we observe that after the first 100 responses there will be a non-negligible proportion of responses addressing things other than the original Weibo post (e.g. …). [sent-140, score-1.62]

52 We will, however, still keep those responses in the bank of responses. [sent-143, score-0.34]

53 We also filter out the long responses that have been posted more than twice on different posts, and scrub them out of both the original (post, response) pairs and the response repository. [sent-145, score-1.156]

54 For the remaining posts and responses, we remove the punctuation marks and emoticons, and use ICTCLAS (Zhang et al. …) for Chinese word segmentation. [sent-146, score-0.183]
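
A sketch of these cleaning steps; the length threshold for what counts as a "long" response is an assumed parameter, since the text does not state one.

```python
from collections import Counter

def clean_pairs(posts_with_responses, keep_first=100, min_len=10, max_dup=2):
    """Keep only the first `keep_first` responses per post, and drop long
    responses posted more than `max_dup` times under different posts
    (likely templates or spam). `min_len` is an assumed length threshold."""
    dup_counts = Counter(
        r for _, responses in posts_with_responses
        for r in responses if len(r) > min_len
    )
    cleaned = []
    for post, responses in posts_with_responses:
        kept = [
            r for r in responses[:keep_first]
            if len(r) <= min_len or dup_counts[r] <= max_dup
        ]
        cleaned.append((post, kept))
    return cleaned
```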

55 More specifically, for a given post, we use three baseline retrieval models to each select 10 responses (see Section 5 for the description of the baselines), and merge them to form a much reduced candidate set with size ≤ 30. [sent-151, score-0.442]
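
The reduced candidate set for labeling can then be formed by merging the top k responses from each baseline, as in this sketch; `baselines` here is any list of retrieval functions, an assumption about the interface.

```python
def reduced_candidate_set(post, baselines, k=10):
    """Merge the top-k responses from each baseline retrieval model into one
    de-duplicated candidate set (size <= len(baselines) * k) for labeling."""
    merged = []
    for retrieve in baselines:
        for response in retrieve(post, k):
            if response not in merged:   # keep the candidate set de-duplicated
                merged.append(response)
    return merged
```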

56 Basically we consider a response suitable for a given post if we cannot tell whether it is an original response. [sent-154, score-1.117]

57 More specifically, the suitability of a response is judged based on the following criteria. Semantic Relevance: This requires the content of the response to be semantically relevant to the post. [sent-155, score-1.237]

58 As shown in the example right below, the post P is about soccer, and so is response R1 (hence semantically relevant), whereas response R2 is about food (hence semantically irrelevant). [sent-156, score-1.697]

59 Entity Association: This requires the entities in the response to be correctly aligned with those in the post. [sent-161, score-0.596]

60 In other words, if the post is about entity… (Footnote 5: Note that although our criteria in general favor short and general answers like “Well said!” …) [sent-162, score-0.545]

61 …A, while the response is about entity B, they are very likely to be mismatched. [sent-165, score-0.601]

62 As shown in the following example, the original post is about Paris, while the response R2 talks about London. [sent-166, score-1.117]

63 …a response containing a different entity could still be sound, as demonstrated by the following two responses to the post above. R1: Enjoy your time in France. [sent-169, score-0.865]

64 Logic Consistency: This requires the content of the response to be logically consistent with the post. [sent-171, score-0.645]

65 For example, in the table right below, post P states that the Huawei mobile phone “Honor” is already on the market in mainland China. [sent-172, score-0.578]

66 Response R1 talks about a personal preference for the same phone model (hence logically consistent), whereas R2 asks a question whose answer is already clear from P (hence logically inconsistent). [sent-173, score-0.157]

67 Speech Act Alignment: Another important factor in determining the suitability of a response is the speech act. [sent-176, score-0.62]

68 In the example below, post P asks a special question about location. [sent-180, score-0.523]

69 Retrieval-based Response Model: In a retrieval-based response model, for a given post x we pick from the candidate set the response with the highest ranking score, where the score is an ensemble of several individual matching features: score(x, y) = Σ_i w_i Φ_i(x, y). [sent-184, score-1.853]
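
As a minimal sketch of this scoring rule, where the feature functions and weights are placeholders rather than the paper's exact features:

```python
def score(x, y, features, weights):
    """score(x, y) = sum_i w_i * Phi_i(x, y): a weighted sum of matching features."""
    return sum(w * phi(x, y) for w, phi in zip(weights, features))

def best_response(x, candidates, features, weights):
    """Pick the candidate response with the highest ensemble score."""
    return max(candidates, key=lambda y: score(x, y, features, weights))
```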

70 We perform a two-stage retrieval to handle the scalability issues associated with the massive candidate set, as illustrated in Figure 5. [sent-186, score-0.171]

71 Our response model then decides whether to respond and which candidate response to choose. [sent-189, score-1.261]

72 The matching score between a post and a response can be measured as the inner product between their images in the low-dimensional space: x^T L_X L_Y^T y. [sent-197, score-1.195]

73 This is to capture the semantic matching between a Weibo post and a response, which may not be well captured by a word-by-word matching. [sent-199, score-0.641]

74 For example, the image of the word “Italy” in the post matches well, in the latent space, the words “Sicily”, “Mediterranean sea”, and “travel”. [sent-206, score-0.501]

75 Once the mappings L_X and L_Y are learned, the semantic matching score x^T L_X L_Y^T y will be treated as a feature for modeling the overall suitability of y as a response to post x. [sent-207, score-1.261]
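
A minimal NumPy sketch of this feature, assuming x and y are bag-of-words vectors over the post and response vocabularies and L_X, L_Y are the learned projection matrices:

```python
import numpy as np

def semantic_match(x, y, L_X, L_Y):
    """Semantic matching feature x^T L_X L_Y^T y: project the post vector x
    and the response vector y into a shared low-dimensional latent space,
    then take the inner product there."""
    return float((x @ L_X) @ (L_Y.T @ y))
```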

76 POST-RESPONSE SIMILARITY: Here we use a simple vector-space model for measuring the similarity between a post and a response: sim_PR(x, y) = x^T y / (‖x‖ ‖y‖). [sent-208, score-1.101]

77 Although it is not necessarily true that a good response shares many words with the post, this measurement is often helpful in finding relevant responses. [sent-209, score-0.577]

78 For example, when the post and response … (Figure 5: Diagram of the retrieval-based automatic response system.) [sent-210, score-1.655]

79 Unlike the semantic matching feature, this simple similarity requires no learning and works on infrequent words. [sent-212, score-0.182]

80 Our empirical results show that it can often capture Post-Response relations that the semantic matching feature fails to capture. [sent-213, score-0.14]
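
This feature is plain cosine similarity; a minimal sketch:

```python
import numpy as np

def sim_pr(x, y):
    """Post-response cosine similarity: x^T y / (||x|| ||y||)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y) / denom if denom else 0.0
```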

81 POST-POST SIMILARITY: The basic idea here is to find posts similar to x and use their responses as the candidates. [sent-214, score-0.502]

82 The intuition here is that if a post x′ is similar to x, its responses might be appropriate for x. [sent-216, score-0.841]

83 It does, however, often fail, especially when a response to x′ addresses parts of x′ not contained in x; fortunately this can be alleviated when combined with other measures. [sent-217, score-0.577]
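
A sketch of this feature used as candidate generation (the corpus layout here is an assumption): find the k posts most similar to x and propose their responses.

```python
def post_post_candidates(x, corpus, sim, k=10):
    """corpus: list of (post_vector, responses) pairs observed on Weibo.
    Return the responses of the k posts most similar to x as candidates."""
    ranked = sorted(corpus, key=lambda item: sim(x, item[0]), reverse=True)
    return [r for _, responses in ranked[:k] for r in responses]
```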

84 Learning to Rank with Labeled Data: With all the matching features, we can learn a ranking model with the labeled (post, response) pairs (e.g. …). [sent-219, score-0.192]

85 Apparently, y+ can be selected from the labeled positive responses of x, while y− can be sampled either from the labeled negatives or from randomly selected ones. [sent-223, score-0.745]

86 Since the manually labeled negative instances are top-ranked candidates according to some individual retrieval model (see Section 5.…). [sent-224, score-0.15]

87 In addition to the matching features, we also have simple features describing the responses only, such as their length. [sent-227, score-0.457]
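
The paper does not spell out its learner here, so the following is only a plausible RankSVM-style sketch: a pairwise hinge loss pushing w . phi(x, y+) above w . phi(x, y−) by a margin.

```python
import numpy as np

def train_ranker(triples, phi, dim, epochs=10, lr=0.1, margin=1.0):
    """Pairwise hinge-loss training of the feature weights w, so that
    w . phi(x, y_plus) exceeds w . phi(x, y_minus) by at least `margin`."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_plus, y_minus in triples:
            diff = phi(x, y_plus) - phi(x, y_minus)
            if w @ diff < margin:   # margin violated: take a subgradient step
                w += lr * diff
    return w
```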

88 P@1: This simply measures the precision of the top-one response in the ranked list: P@1 = (# good top-1 responses) / (# posts). We perform a 5-fold cross-validation on the 422 labeled posts, with the results reported in Table 1. [sent-232, score-0.628]
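
A minimal sketch of the metric, with `is_good` standing in for the human suitability labels:

```python
def precision_at_1(ranked_lists, is_good):
    """P@1 = (# posts whose top-ranked response is judged good) / (# posts).

    ranked_lists: iterable of (post, ranked_responses) pairs."""
    hits = sum(1 for post, ranked in ranked_lists if is_good(post, ranked[0]))
    return hits / len(ranked_lists)
```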

89 To mimic a more realistic scenario for an automatic response model on Sina Weibo, we allow the system to choose which posts to respond to. [sent-241, score-1.128]

90 Here we simply set the response algorithm to respond only when the highest score of the candidate response passes a certain threshold. [sent-242, score-1.261]
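
A sketch of that thresholded decision rule; the threshold itself would be tuned on held-out data.

```python
def maybe_respond(x, candidates, score_fn, threshold):
    """Respond only when the best candidate's score passes the threshold;
    otherwise return None (i.e., stay silent)."""
    best = max(candidates, key=lambda y: score_fn(x, y))
    return best if score_fn(x, best) >= threshold else None
```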

91 Figure 6: An actual instance (the original Chinese text and its English translation) of a response returned by our retrieval-based system. [sent-247, score-0.616]

92 …1) helps find matched responses that do not share any words with the post (Section 6.…). [sent-254, score-0.841]

93 EXAMPLE 2: However, our retrieval model also makes bad choices, especially when either the query post or the response is long, as shown in Example 3. [sent-264, score-1.123]

94 Here the response is picked because 1) the word “IT” in the post corresponds to the words “mobile phone” in the candidate, and 2) the Chinese word for “lay off” in the post is the same as the word for “outdated” in the response. [sent-265, score-2.156]

95 On Logic Consistency: Our current model does not explicitly maintain logic consistency between the response and the post, since logic consistency requires a deeper analysis of the text and is therefore hard to capture with just a vector-space model. [sent-273, score-0.696]

96 The Effect of Semantic Matching: The experiments also show that we may find interesting and appropriate responses that share no words with the post, as shown in the example below. [sent-283, score-0.34]

97 P: England players stand in the penalty … R1: What a classic match! R2: Haha, it is still 0:0, no goal so far. Summary: In this paper we propose a retrieval-based response model for short-text conversation, to leverage the massive instances collected from social media. [sent-286, score-0.678]

98 For research in similar directions, we create a dataset based on the posts and comments from Sina Weibo. [sent-287, score-0.244]

99 Our preliminary experiments show that our retrieval-based response model, when combined with a large candidate set, can achieve fairly good performance. [sent-288, score-0.672]

100 This dataset will be valuable for both training and testing automatic response models for short texts. [sent-289, score-0.676]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('response', 0.577), ('post', 0.501), ('responses', 0.34), ('weibo', 0.244), ('posts', 0.162), ('sina', 0.153), ('conversation', 0.12), ('matching', 0.117), ('dialogue', 0.068), ('users', 0.058), ('candidate', 0.057), ('labeled', 0.051), ('respond', 0.05), ('huawei', 0.05), ('lxly', 0.05), ('xreduced', 0.05), ('zhengdong', 0.05), ('logically', 0.049), ('microblog', 0.048), ('retrieval', 0.045), ('crawling', 0.043), ('leuski', 0.043), ('suitability', 0.043), ('dataset', 0.043), ('mobile', 0.04), ('pr', 0.04), ('litman', 0.039), ('comments', 0.039), ('original', 0.039), ('massive', 0.039), ('fairly', 0.038), ('pairs', 0.038), ('comment', 0.037), ('phone', 0.037), ('cleaning', 0.037), ('dialogues', 0.037), ('community', 0.036), ('logic', 0.036), ('testing', 0.036), ('abundant', 0.035), ('chinese', 0.034), ('negative', 0.033), ('user', 0.033), ('huawe', 0.033), ('jafarpour', 0.033), ('kxxk', 0.033), ('misu', 0.033), ('schatzmann', 0.033), ('semmatch', 0.033), ('simpp', 0.033), ('etc', 0.033), ('consistency', 0.032), ('illustrated', 0.03), ('complicated', 0.029), ('hawaii', 0.029), ('ly', 0.029), ('reinforcement', 0.029), ('museum', 0.029), ('traveling', 0.029), ('trip', 0.029), ('turing', 0.029), ('stage', 0.028), ('stands', 0.028), ('conversations', 0.028), ('apparently', 0.026), ('act', 0.026), ('strategy', 0.026), ('vast', 0.026), ('relevance', 0.026), ('sentiment', 0.025), ('anton', 0.024), ('lx', 0.024), ('nx', 0.024), ('service', 0.024), ('effort', 0.024), ('ranking', 0.024), ('entity', 0.024), ('similarity', 0.023), ('ark', 0.023), ('ritter', 0.023), ('virtual', 0.023), ('semantic', 0.023), ('remainder', 0.022), ('asks', 0.022), ('diagram', 0.022), ('china', 0.022), ('social', 0.022), ('instances', 0.021), ('semantically', 0.021), ('remained', 0.021), ('lu', 0.021), ('short', 0.02), ('london', 0.02), ('williams', 0.02), ('nice', 0.019), ('posed', 0.019), ('enough', 0.019), ('penalty', 0.019), ('requires', 0.019), ('loosely', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 4 emnlp-2013-A Dataset for Research on Short-Text Conversations

Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen

Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on real-world instances from Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. This dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing of conversation models. The dataset consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.

2 0.32848537 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

Author: Yuhang Guo ; Bing Qin ; Ting Liu ; Sheng Li

Abstract: Linking name mentions in microblog posts to a knowledge base, namely microblog entity linking, is useful for text mining tasks on microblogs. Entity linking in long text has been well studied in previous work. However, little work has focused on short text such as microblog posts. Microblog posts are short and noisy, so previous methods can extract only a few features from the post context. In this paper we propose to use extra posts for the microblog entity linking task. Experimental results show that our proposed method significantly improves the linking accuracy over traditional methods, by 8.3% and 7.5% respectively.

3 0.14069016 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

Author: Dong Nguyen ; A. Seza Dogruoz

Abstract: Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.

4 0.093398906 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

Author: Wang Ling ; Chris Dyer ; Alan W Black ; Isabel Trancoso

Abstract: Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization (replacing orthographically or lexically idiosyncratic forms with more standard variants) can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation services as well as a phrase-based translation system trained on parallel microblog data.

5 0.0896383 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation

Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif

Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottom-up token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns both phrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality of the responses generated by our model to those generated by an out-of-the-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages.

6 0.075497098 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes

7 0.07341513 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

8 0.062471706 24 emnlp-2013-Application of Localized Similarity for Web Documents

9 0.061386306 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

10 0.060941078 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

11 0.060666323 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

12 0.057439096 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

13 0.056559682 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

14 0.054098297 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication

15 0.053120099 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game

16 0.049955867 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

17 0.046996724 152 emnlp-2013-Predicting the Presence of Discourse Connectives

18 0.046606395 23 emnlp-2013-Animacy Detection with Voting Models

19 0.044858858 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

20 0.043790754 197 emnlp-2013-Using Paraphrases and Lexical Semantics to Improve the Accuracy and the Robustness of Supervised Models in Situated Dialogue Systems


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.153), (1, 0.059), (2, -0.057), (3, -0.064), (4, 0.039), (5, 0.003), (6, 0.063), (7, 0.194), (8, 0.115), (9, -0.068), (10, -0.073), (11, 0.184), (12, 0.076), (13, 0.108), (14, -0.203), (15, 0.025), (16, 0.05), (17, 0.043), (18, -0.305), (19, 0.072), (20, 0.338), (21, -0.002), (22, -0.037), (23, -0.154), (24, -0.039), (25, 0.041), (26, -0.046), (27, 0.021), (28, -0.008), (29, -0.113), (30, 0.003), (31, -0.039), (32, -0.08), (33, 0.043), (34, 0.014), (35, 0.019), (36, -0.077), (37, 0.022), (38, -0.043), (39, 0.028), (40, -0.091), (41, -0.024), (42, -0.035), (43, 0.043), (44, 0.027), (45, -0.03), (46, -0.046), (47, -0.063), (48, 0.119), (49, -0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97685039 4 emnlp-2013-A Dataset for Research on Short-Text Conversations

Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen

Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on real-world instances from Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. This dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing of conversation models. The dataset consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.

2 0.83898026 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

Author: Yuhang Guo ; Bing Qin ; Ting Liu ; Sheng Li

Abstract: Linking name mentions in microblog posts to a knowledge base, namely microblog entity linking, is useful for text mining tasks on microblogs. Entity linking in long text has been well studied in previous work. However, little work has focused on short text such as microblog posts. Microblog posts are short and noisy, so previous methods can extract only a few features from the post context. In this paper we propose to use extra posts for the microblog entity linking task. Experimental results show that our proposed method significantly improves the linking accuracy over traditional methods, by 8.3% and 7.5% respectively.

3 0.65595096 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

Author: Dong Nguyen ; A. Seza Dogruoz

Abstract: Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.

4 0.55852151 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes

Author: Ruihong Huang ; Ellen Riloff

Abstract: The goal of our research is to distinguish veterinary message board posts that describe a case involving a specific patient from posts that ask a general question. We create a text classifier that incorporates automatically generated attribute lists for veterinary patients to tackle this problem. Using a small amount of annotated data, we train an information extraction (IE) system to identify veterinary patient attributes. We then apply the IE system to a large collection of unannotated texts to produce a lexicon of veterinary patient attribute terms. Our experimental results show that using the learned attribute lists to encode patient information in the text classifier yields improved performance on this task.

5 0.35621971 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation

Author: Dekai Wu ; Karteek Addanki ; Markus Saers ; Meriem Beloucif

Abstract: We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottom-up token based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns both phrasing and rhyming associations. In this attack on the woefully under-explored natural language genre of music lyrics, we exploit a strictly unsupervised transduction grammar induction approach. Our task is particularly ambitious in that no use of any a priori linguistic or phonetic information is allowed, even though the domain of hip hop lyrics is particularly noisy and unstructured. We evaluate the performance of the learned model against a model learned only using the more conventional bottom-up token based rule induction, and demonstrate the superiority of our combined token based and rule segmentation induction method toward generating higher quality improvised responses, measured on fluency and rhyming criteria as judged by human evaluators. To highlight some of the inherent challenges in adapting other algorithms to this novel task, we also compare the quality of the responses generated by our model to those generated by an out-of-the-box phrase based SMT system. We tackle the challenge of selecting appropriate training data for our task via a dedicated rhyme scheme detection module, which is also acquired via unsupervised learning and report improved quality of the generated responses. Finally, we report results with Maghrebi French hip hop lyrics indicating that our model performs surprisingly well with no special adaptation to other languages.

6 0.28860137 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

7 0.28307015 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

8 0.27575603 23 emnlp-2013-Animacy Detection with Voting Models

9 0.25231814 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

10 0.24808387 24 emnlp-2013-Application of Localized Similarity for Web Documents

11 0.23795246 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

12 0.23130484 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

13 0.23086661 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior

14 0.22698556 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game

15 0.20895243 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

16 0.20391445 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

17 0.19368921 197 emnlp-2013-Using Paraphrases and Lexical Semantics to Improve the Accuracy and the Robustness of Supervised Models in Situated Dialogue Systems

18 0.1914811 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

19 0.18213712 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

20 0.1812754 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.034), (18, 0.034), (22, 0.042), (30, 0.486), (50, 0.017), (51, 0.134), (66, 0.039), (71, 0.038), (75, 0.021), (77, 0.01), (90, 0.01), (96, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98957002 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures

Author: Makoto Yasuhara ; Toru Tanaka ; Jun-ya Norimatsu ; Mikio Yamamoto

Abstract: N-gram language models tend to increase in size as the corpus grows, and consume considerable resources. In this paper, we propose an efficient method for implementing n-gram models based on double-array structures. First, we propose a method for representing backwards suffix trees using double-array structures and demonstrate its efficiency. Next, we propose two optimization methods for improving the efficiency of data representation in the double-array structures. Embedding probabilities into unused spaces in double-array structures reduces the model size. Moreover, tuning the word IDs in the language model makes the model smaller and faster. We also show that our method can be used for building large language models using the division method. Lastly, we show that our method outperforms methods based on recent related work in terms of model size and query speed when both optimization methods are used.

2 0.98335701 92 emnlp-2013-Growing Multi-Domain Glossaries from a Few Seeds using Probabilistic Topic Models

Author: Stefano Faralli ; Roberto Navigli

Abstract: In this paper we present a minimally-supervised approach to the multi-domain acquisition of wide-coverage glossaries. We start from a small number of hypernymy relation seeds and bootstrap glossaries from the Web for dozens of domains using Probabilistic Topic Models. Our experiments show that we are able to extract high-precision glossaries comprising thousands of terms and definitions.

3 0.97746497 176 emnlp-2013-Structured Penalties for Log-Linear Language Models

Author: Anil Kumar Nelakanti ; Cedric Archambeau ; Julien Mairal ; Francis Bach ; Guillaume Bouchard

Abstract: Language models can be formalized as log-linear regression models where the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms for learning the parameters by maximum likelihood scales linearly in nd, where n is the length of the training corpus and d is the number of observed features. We present a model that grows logarithmically in d, making it possible to efficiently leverage longer contexts. We account for the sequential structure of natural language using tree-structured penalized objectives to avoid overfitting and achieve better generalization.

4 0.97456348 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which has only a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional two-step pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than the previous max-likelihood estimation method.

same-paper 5 0.93490064 4 emnlp-2013-A Dataset for Research on Short-Text Conversations

Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen

Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on real-world instances from Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. This dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing of conversation models. The dataset consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.

6 0.92448974 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

7 0.76881421 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

8 0.73792976 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

9 0.72669137 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

10 0.72453827 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

11 0.71950787 2 emnlp-2013-A Convex Alternative to IBM Model 2

12 0.71789926 156 emnlp-2013-Recurrent Continuous Translation Models

13 0.71523184 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

14 0.71003169 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

15 0.70975792 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization

16 0.69742179 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

17 0.69213575 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

18 0.69086641 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

19 0.69054204 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

20 0.68987 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging