acl acl2013 acl2013-107 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
Reference: text
sentIndex sentText sentNum sentScore
1 In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. [sent-4, score-1.349]
2 It is important to identify and filter out these deceptive answers. [sent-5, score-0.671]
3 We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. [sent-8, score-1.99]
4 To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. [sent-9, score-1.381]
5 The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. [sent-10, score-0.676]
6 The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction. [sent-11, score-1.591]
7 As the answers can guide the user’s behavior, some malicious users are motivated to give deceptive answers to promote their products or services. [sent-29, score-1.53]
8 For example, if someone asks for recommendations about restaurants in the Community QA site, the malicious user may post a deceptive answer to promote the target restaurant. [sent-30, score-1.567]
9 Indeed, because of lucrative financial rewards, in several Community QA sites, some business owners provide incentives for users to post deceptive answers for product promotion. [sent-31, score-1.118]
10 There are at least two major problems that the deceptive answers cause. [sent-32, score-0.921]
11 On the user side, the deceptive answers are misleading to users. [sent-33, score-1.173]
12 If the users rely on the deceptive answers, they will make the wrong decisions. [sent-34, score-0.81]
13 On the Community QA side, the deceptive answers will hurt the health of the Community QA sites. [sent-36, score-0.921]
14 A Community QA site without control of deceptive answers could only benefit spammers but could not help askers at all. [sent-37, score-1.03]
15 Therefore, it is a fundamental task to predict and filter out the deceptive answers. [sent-39, score-0.693]
16 In this paper, we propose to predict deceptive answers. (p. 1723, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9, 2013.) [sent-40, score-0.693]
17 In the first step, we consider the deceptive answer prediction as a general binary-classification task. [sent-43, score-1.101]
18 We further investigate the user relationship for deceptive answer prediction. [sent-45, score-1.353]
19 To measure the user relationship, we propose a new user preference graph, which is constructed based on the answer evaluation expressed by users, such as “helpful” voting and “best answer” selection. [sent-49, score-1.096]
20 The user preference graph is incorporated into the traditional supervised learning framework with graph regularization, which makes answers from users with the same preference tend to have the same category (deceptive or authentic). [sent-50, score-0.799]
21 The experiment results demonstrate that the user preference graph can further help improve the performance for deceptive answer prediction. [sent-51, score-1.591]
22 Various studies have been conducted on Community QA sites, including retrieving accumulated question-answer pairs to find related answers for new questions, finding experts in a specific domain, and summarizing single or multiple answers to provide a concise result (Jeon et al. [sent-53, score-0.761]
23 However, an important issue which has been neglected so far is the detection of deceptive answers. [sent-59, score-0.671]
24 If the acquired question-answer corpus contains many deceptive answers, it would be meaningless to perform further knowledge mining tasks. [sent-60, score-0.671]
25 Therefore, as the first step, we need to predict and filter out the deceptive answers. [sent-61, score-0.693]
26 , 2010) is most related to the deceptive answer prediction task. [sent-65, score-1.101]
27 Deceptive answer prediction, in contrast, aims to predict whether the main purpose of the provided answer is only to answer the specific question, or whether it includes the user's self-interest in promoting something. [sent-68, score-1.96]
28 However, a deceptive answer may be selected as a high-quality answer by the spammer, or because general users are misled. [sent-73, score-1.576]
29 Our experiments also show that answer quality prediction is much different from deceptive answer prediction. [sent-75, score-1.504]
30 Previous QA studies also analyze the user graph to investigate the user relationship (Jurczyk and Agichtein, 2007; Liu et al. [sent-76, score-0.659]
31 However, we don't care which user is more knowledgeable; we are more interested in whether two users are both spammers or both authentic users. [sent-80, score-0.623]
32 In this paper, we propose a novel user preference graph based on their preference towards the target answers. [sent-81, score-0.731]
33 We assume that the spammers may collaboratively promote the target deceptive answers, while the authentic users may generally promote the authentic answers and demote the deceptive answers. [sent-82, score-2.313]
34 The user preference graph is constructed based on their answer evaluation, such as “helpful” voting or “best answer” selection. [sent-83, score-0.952]
35 3 Proposed Features We first view the deceptive answer prediction as a binary-classification problem. [sent-84, score-1.119]
36 1 Textual Features We first aim to predict the deceptive answer by analyzing the answer content. [sent-86, score-1.459]
37 The top ten unigrams related to deceptive answers are shown in Table 1. [sent-91, score-0.941]
38 We find that URLs are a good indicator of deceptive answers. [sent-96, score-0.671]
39 3 Phone Numbers and Emails: There is a lot of contact information mentioned on Community QA sites, such as phone numbers and email addresses, and answers containing it are very likely to be deceptive; good answers are found to be less likely to refer to phone numbers or email addresses than malicious ones. [sent-108, score-0.543]
40 This can be explained by the fact that deceptive answers may be well prepared to promote the target. [sent-114, score-1.011]
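The textual cues discussed above (promotional unigrams, URLs, phone numbers, email addresses, answer length) can be sketched as a simple feature extractor. This is a minimal illustration, not the paper's implementation; the regexes, function name, and promotional word list are assumptions.

```python
import re

# Hedged sketch: illustrative patterns, not the paper's exact feature set.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
# The paper's Table 1 reports promotional unigrams such as these.
PROMO_WORDS = {"recommend", "address", "professional"}

def textual_features(answer: str) -> dict:
    """Extract simple textual signals for deceptive-answer classification."""
    tokens = answer.lower().split()
    return {
        "num_tokens": len(tokens),  # long answers tend to be deceptive in this task
        "has_url": bool(URL_RE.search(answer)),
        "has_phone": bool(PHONE_RE.search(answer)),
        "has_email": bool(EMAIL_RE.search(answer)),
        "num_promo_words": sum(t in PROMO_WORDS for t in tokens),
    }
```

These feature dictionaries would then be fed to any standard binary classifier.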
41 1 Question Answer Relevance: The main characteristic of an answer on a Community QA site is that it is provided in response to a corresponding question. [sent-119, score-1.197]
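One simple way to score question-answer relevance is token-overlap cosine similarity. This is only a sketch; the paper's actual relevance features may use different similarity measures.

```python
import math
from collections import Counter

def relevance(question: str, answer: str) -> float:
    """Cosine similarity over word counts, a rough question-answer relevance score."""
    q = Counter(question.lower().split())
    a = Counter(answer.lower().split())
    dot = sum(q[w] * a[w] for w in set(q) & set(a))
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in a.values())))
    return dot / norm if norm else 0.0
```

An irrelevant answer (no shared vocabulary with the question) scores 0.0, which in the paper's setting is a signal of possible deception.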
42 To compute the authority score, we first construct a directed user graph with the user interactions in the community. [sent-146, score-0.695]
43 Specifically, on a Q&A site, an edge from A to B is established when user B answers a question asked by A, which suggests that user B is more likely to be an expert than A. [sent-149, score-0.603]
44 u1, . . . , uN are the users in the collection, N is the total number of users, M(ui) is the set of users whose answers are provided by user ui, L(ui) is the number of users who answer ui's questions, and d is a damping factor, which is set as 0. [sent-154, score-1.302]
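The authority computation described above is standard PageRank power iteration over the asker→answerer graph. The sketch below assumes the conventional damping factor of 0.85 and a fixed iteration count; the paper's exact settings are not shown here.

```python
def authority_scores(edges, num_iter=50, d=0.85):
    """PageRank over a directed user graph.

    edges: list of (asker, answerer) pairs; an edge points from the asker
    to the user who answered, so answering questions raises authority.
    d: damping factor (0.85 is the conventional choice, assumed here).
    """
    nodes = {u for e in edges for u in e}
    out = {u: [] for u in nodes}
    for asker, answerer in edges:
        out[asker].append(answerer)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(num_iter):
        new = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = d * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:  # dangling node: spread its mass uniformly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank
```

A user who answers many other users' questions accumulates a higher authority score.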
45 In addition, other users can label each answer as “helpful” or “not helpful”. [sent-181, score-0.522]
46 If two answers are exactly the same but the questions are different, the answer is potentially deceptive. [sent-187, score-1.356]
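Duplicate detection across different questions can be sketched by indexing normalized answer text. This is a simplification; the paper may use fuzzier matching (the feature list elsewhere mentions BLEU-style scores), and the function name here is illustrative.

```python
from collections import defaultdict

def find_duplicated_answers(records):
    """records: list of (question_id, answer_text) pairs.

    Returns answer texts that reappear verbatim (after whitespace/case
    normalization) under different questions, a cheap signal for
    copy-pasted promotional answers.
    """
    seen = defaultdict(set)  # normalized answer text -> question ids
    for qid, text in records:
        seen[" ".join(text.lower().split())].add(qid)
    return {ans for ans, qids in seen.items() if len(qids) > 1}
```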
47 4 Deceptive Answer Prediction with User Preference Graph Besides the textual and contextual features, we also investigate the user relationship for deceptive answer prediction. [sent-191, score-1.436]
48 We assume that similar users tend to perform similar behaviors (posting deceptive answers or posting authentic answers). [sent-192, score-1.344]
49 In this section, we first show how to compute the user similarity (user preference graph construction), and then introduce how to employ the user relationship for deceptive answer prediction. [sent-193, score-1.912]
50 Then several answers to this question will be provided by other users, for example answerers u2 and u3. [sent-200, score-0.685]
51 After the answers are provided, users can also vote on each answer as “helpful” or “not helpful” to show their evaluation of it. [sent-201, score-1.155]
52 For example, users u4, u5 vote the first answer as “not helpful”, and user u6 votes the second answer as “helpful”. [sent-202, score-1.172]
53 Finally, the asker will select one answer as the best answer among all answers. [sent-203, score-0.846]
54 However, we don’t care which user is more knowledgeable, but are more interested in whether two users are both malicious users or authentic users. [sent-211, score-0.852]
55 Here, we propose a new user graph based on the user preference. [sent-212, score-0.612]
56 The preference is defined based on the answer evaluation. [sent-213, score-0.56]
57 If two users give the same “helpful” or “not helpful” vote on the target answer, we view these two users as having the same user preference. [sent-216, score-0.565]
58 For example, users u4 and u5 both give a “not helpful” evaluation of the first answer, so we can say that they have the same user preference. [sent-217, score-0.756]
59 Then, if user u6 gives a “helpful” evaluation of the second answer, we will view user u6 as having the same preference as user u3, the author of the second answer. [sent-219, score-0.722]
60 After extracting all user preference relationships, we can construct the user preference graph as shown in Figure 1 (c). [sent-224, score-0.966]
61 If two users have the user preference relationship, there will be an edge between them. [sent-226, score-0.592]
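The construction rules above (same vote on the same answer, a “helpful” vote or “best answer” selection agreeing with the answer's author) can be sketched as follows. The function signature and data layout are assumptions for illustration.

```python
from collections import defaultdict

def preference_graph(votes, authors, best_answers):
    """Build an undirected user preference graph.

    votes: list of (voter, answer_id, label), label in {"helpful", "not helpful"}
    authors: dict answer_id -> author user
    best_answers: list of (asker, answer_id) best-answer selections
    Returns dict mapping frozenset({u, v}) -> edge weight (agreement count).
    """
    edges = defaultdict(int)

    def link(u, v):
        if u != v:
            edges[frozenset((u, v))] += 1

    by_answer = defaultdict(list)
    for voter, aid, label in votes:
        by_answer[(aid, label)].append(voter)
        if label == "helpful":          # a helpful vote agrees with the author
            link(voter, authors[aid])
    for voters in by_answer.values():   # same vote on same answer -> same preference
        for i, u in enumerate(voters):
            for v in voters[i + 1:]:
                link(u, v)
    for asker, aid in best_answers:     # best-answer selection agrees with the author
        link(asker, authors[aid])
    return dict(edges)
```

On the running example from the text, u4 and u5 (who both vote “not helpful” on the first answer) become linked, as do u6 and u3 (author of the second answer).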
62 In the Community QA sites, the spammers mainly promote their target products by promoting the deceptive answers. [sent-228, score-0.818]
63 The spammers can collaboratively make the deceptive answers look good by voting for them as high-quality answers or selecting them as “best answer”. [sent-229, score-0.994]
64 Although there may be noisy relationships (for example, an authentic user may be cheated and select a deceptive answer as “best answer”), we hope the overall user preference relation can yield better results than the previous user interaction graph for this task. [sent-232, score-2.309]
65 Here, we employ the user preference graph to denote the user relationship. [sent-249, score-0.811]
66 That is, for user pairs with the same preference, A_ui is the set of all answers posted by user ui, and w_{ui,uj} is the weight of the edge between ui and uj in the user preference graph. [sent-254, score-1.015]
67 In the above objective function, we impose a user graph regularization term β Σ_{ui,uj ∈ N_u} Σ_{x ∈ A_ui, y ∈ A_uj} w_{ui,uj} (f(x) − f(y))^2 to minimize the answer authenticity difference among users with the same preference. The regularization term smoothes the graph structure, so that adjacent users with the same preference tend to post answers of the same category. [sent-255, score-1.22]
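The regularized objective can be sketched directly from that description: a supervised loss over labeled answers plus β times the pairwise smoothness penalty over preference-graph neighbors. Squared loss stands in for the paper's (unspecified here) supervised loss term, and all names are illustrative.

```python
def regularized_loss(f, labels, answers_by_user, edge_weights, beta=1e-3):
    """Evaluate a graph-regularized objective (sketch with squared loss).

    f: dict answer_id -> predicted authenticity score in [0, 1]
    labels: dict answer_id -> gold label (0 = deceptive, 1 = authentic)
    answers_by_user: dict user -> list of that user's answer_ids (A_ui)
    edge_weights: dict (ui, uj) -> w_{ui,uj} from the user preference graph
    """
    supervised = sum((f[x] - y) ** 2 for x, y in labels.items())
    # Smoothness term: answers from preference-linked users should agree.
    smooth = sum(
        w * (f[x] - f[y]) ** 2
        for (ui, uj), w in edge_weights.items()
        for x in answers_by_user[ui]
        for y in answers_by_user[uj]
    )
    return supervised + beta * smooth
```

Minimizing this objective pushes answers from users with the same preference toward the same category, which matches the experiment's observation that β in roughly [10^−4, 10^−2] works best.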
68 Among these data, we further sample a small data set, and ask three trained annotators to manually label each answer as deceptive or not. [sent-264, score-1.054]
69 If two or more people annotate the answer as deceptive, we will extract this answer as a deceptive answer. [sent-265, score-1.437]
70 In total, 12446 answers are marked as deceptive answers. [sent-266, score-0.921]
71 Finally, we get 24892 answers with deceptive and authentic labels as our dataset. [sent-268, score-1.112]
72 With our labeled data, we employ supervised methods to predict deceptive answers. [sent-269, score-0.715]
73 However, from Table 1, we can see that the word features can provide some weak signals for deceptive answer prediction; for example, the words “recommend”, “address”, and “professional” express some kind of promotional intent. [sent-286, score-1.102]
74 The observation of length feature for deceptive answer prediction is very different from previous answer quality prediction. [sent-289, score-1.521]
75 For answer quality prediction, length is an effective feature; for example, long length provides a very strong signal for a high-quality answer (Shah and Pomerantz, 2010; Song et al. [sent-290, score-0.786]
76 However, for deceptive answer prediction, we find that long answers are more likely to be deceptive. [sent-292, score-1.304]
77 This is because most of deceptive answers are well prepared for product promotion. [sent-293, score-0.956]
78 The malicious users may copy the prepared deceptive answers or just simply edit the target name to answer different questions. [sent-298, score-1.61]
79 Question-answer relevance and robot are the second most useful single features for deceptive answer prediction. [sent-299, score-1.158]
80 If the answer is not relevant to the corresponding question, this answer is more likely to be deceptive. [sent-302, score-0.766]
81 Robots are one of the main sources of deceptive answers. [sent-303, score-0.671]
82 They automatically post deceptive answers to target questions. [sent-304, score-0.98]
83 The user profile feature can also contribute a lot to deceptive answer prediction. [sent-307, score-1.346]
84 Among the user profile features, the user level in the Community QA site is a good indicator. [sent-308, score-0.575]
85 The other two contextual features, including user authority and answer evaluation, provide limited improvement. [sent-309, score-0.761]
86 We find the following reasons: First, some malicious users post answers to various questions for product promotion, but don’t ask any question. [sent-310, score-0.599]
87 Second, the “best answer” is not a good signal for deceptive answer prediction. [sent-325, score-1.054]
88 The “best answer” may be selected by malicious users, or an authentic asker may have been misled into choosing the deceptive answer as “best answer”. [sent-326, score-1.456]
89 This also demonstrates that the deceptive answer prediction is very different from the answer quality prediction. [sent-327, score-1.504]
90 When using the user graph as feature, we compute the authority score for each user with PageRank as shown in Equation 1. [sent-338, score-0.695]
91 From the table, we can see that when incorporating user preference graph as a feature, it can’t achieve a better result than the interaction graph. [sent-341, score-0.56]
92 A higher authority score may be boosted by other spammers, and thus cannot be a good indicator to distinguish deceptive from authentic answers. [sent-343, score-0.945]
93 We can see that when β ranges from 10^−4 to 10^−2, deceptive answer prediction achieves the best results. [sent-348, score-1.101]
94 6 Conclusions and Future Work In this paper, we discuss the deceptive answer prediction task in Community QA sites. [sent-349, score-1.101]
95 With the manually labeled data set, we first predict the deceptive answers with traditional classification method. [sent-350, score-0.943]
96 We also introduce a new user preference graph, constructed based on the users' evaluations of the target answer, such as “helpful” voting and “best answer” selection. [sent-352, score-0.73]
97 A graph regularization method is proposed to incorporate the user preference graph for deceptive answer prediction. [sent-353, score-1.747]
98 The experiment results also show that the method with user preference graph can achieve more accurate results for deceptive answer prediction. [sent-355, score-1.591]
99 In future work, it would be interesting to incorporate more features into deceptive answer prediction. [sent-356, score-1.096]
100 It is also important to predict the deceptive question threads, which are posted and answered both by malicious users for product promotion. [sent-357, score-1.031]
wordName wordTfidf (topN-words)
[('deceptive', 0.671), ('answer', 0.383), ('user', 0.252), ('answers', 0.25), ('authentic', 0.191), ('preference', 0.177), ('qa', 0.15), ('users', 0.139), ('malicious', 0.131), ('community', 0.111), ('graph', 0.108), ('sites', 0.084), ('authority', 0.083), ('asker', 0.08), ('promote', 0.071), ('helpful', 0.056), ('robot', 0.056), ('posting', 0.054), ('question', 0.052), ('site', 0.048), ('phone', 0.048), ('prediction', 0.047), ('relationship', 0.047), ('jurczyk', 0.045), ('ui', 0.044), ('song', 0.044), ('contextual', 0.043), ('post', 0.042), ('spammers', 0.041), ('knowledgeable', 0.04), ('textual', 0.04), ('url', 0.035), ('confucius', 0.034), ('jeon', 0.033), ('email', 0.033), ('voting', 0.032), ('agichtein', 0.032), ('pagerank', 0.032), ('regularization', 0.031), ('duplication', 0.03), ('ishikawa', 0.028), ('adamic', 0.028), ('features', 0.025), ('behaviors', 0.024), ('edge', 0.024), ('profile', 0.023), ('author', 0.023), ('promotion', 0.023), ('interaction', 0.023), ('answerer', 0.023), ('auj', 0.023), ('auxix', 0.023), ('bleuscore', 0.023), ('cheated', 0.023), ('figueroa', 0.023), ('xiance', 0.023), ('expert', 0.023), ('relevance', 0.023), ('predict', 0.022), ('employ', 0.022), ('regularizer', 0.022), ('shah', 0.022), ('si', 0.021), ('ny', 0.021), ('accumulated', 0.021), ('questions', 0.021), ('quality', 0.02), ('unigrams', 0.02), ('xi', 0.02), ('askers', 0.02), ('bian', 0.02), ('pomerantz', 0.02), ('yunbo', 0.02), ('threads', 0.02), ('square', 0.02), ('dimension', 0.02), ('nu', 0.019), ('expertise', 0.019), ('prepared', 0.019), ('cao', 0.019), ('sakai', 0.019), ('sps', 0.019), ('urls', 0.018), ('view', 0.018), ('products', 0.018), ('spammer', 0.017), ('target', 0.017), ('incorporate', 0.017), ('besides', 0.017), ('feature', 0.017), ('answering', 0.017), ('fangtao', 0.017), ('weight', 0.016), ('product', 0.016), ('liu', 0.016), ('mountain', 0.016), ('tend', 0.015), ('votes', 0.015), ('millions', 0.015), ('interval', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
2 0.3897751 350 acl-2013-TopicSpam: a Topic-Model based approach for spam detection
Author: Jiwei Li ; Claire Cardie ; Sujian Li
Abstract: Product reviews are now widely used by individuals and organizations for decision making (Litvin et al., 2008; Jansen, 2010). And because of the profits at stake, people have been known to try to game the system by writing fake reviews to promote target products. As a result, the task of deceptive review detection has been gaining increasing attention. In this paper, we propose a generative LDA-based topic modeling approach for fake review detection. Our model can aptly detect the subtle dif- ferences between deceptive reviews and truthful ones and achieves about 95% accuracy on review spam datasets, outperforming existing baselines by a large margin.
3 0.24493214 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark
Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.
4 0.23627102 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
Author: Nan Duan
Abstract: This paper presents two minimum Bayes risk (MBR) based Answer Re-ranking (MBRAR) approaches for the question answering (QA) task. The first approach re-ranks single QA system’s outputs by using a traditional MBR model, by measuring correlations between answer candidates; while the second approach reranks the combined outputs of multiple QA systems with heterogenous answer extraction components by using a mixture model-based MBR model. Evaluations are performed on factoid questions selected from two different domains: Jeopardy! and Web, and significant improvements are achieved on all data sets.
5 0.19258192 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
6 0.16150029 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
7 0.14627504 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
8 0.12499109 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
10 0.098263502 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
11 0.095613658 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
12 0.095340773 292 acl-2013-Question Classification Transfer
13 0.085329995 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
14 0.084472746 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
15 0.080978006 250 acl-2013-Models of Translation Competitions
16 0.080938771 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
17 0.071833923 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and Userâ•Žs Behavior in Social Media
18 0.070418492 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction
19 0.06971439 121 acl-2013-Discovering User Interactions in Ideological Discussions
20 0.069554947 290 acl-2013-Question Analysis for Polish Question Answering
topicId topicWeight
[(0, 0.138), (1, 0.097), (2, 0.021), (3, -0.069), (4, 0.089), (5, 0.047), (6, 0.06), (7, -0.347), (8, 0.119), (9, 0.013), (10, 0.048), (11, -0.004), (12, 0.019), (13, -0.032), (14, 0.039), (15, -0.03), (16, -0.035), (17, 0.014), (18, 0.111), (19, 0.043), (20, 0.063), (21, -0.066), (22, 0.088), (23, -0.194), (24, -0.037), (25, -0.055), (26, -0.04), (27, -0.043), (28, 0.063), (29, -0.108), (30, 0.012), (31, -0.097), (32, 0.043), (33, 0.103), (34, 0.159), (35, 0.146), (36, -0.157), (37, 0.056), (38, -0.079), (39, -0.017), (40, 0.087), (41, -0.094), (42, -0.138), (43, 0.169), (44, 0.151), (45, -0.128), (46, -0.175), (47, -0.043), (48, -0.05), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.95873481 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
2 0.73893213 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
Author: Nan Duan
Abstract: This paper presents two minimum Bayes risk (MBR) based Answer Re-ranking (MBRAR) approaches for the question answering (QA) task. The first approach re-ranks single QA system’s outputs by using a traditional MBR model, by measuring correlations between answer candidates; while the second approach reranks the combined outputs of multiple QA systems with heterogenous answer extraction components by using a mixture model-based MBR model. Evaluations are performed on factoid questions selected from two different domains: Jeopardy! and Web, and significant improvements are achieved on all data sets.
3 0.68597776 350 acl-2013-TopicSpam: a Topic-Model based approach for spam detection
Author: Jiwei Li ; Claire Cardie ; Sujian Li
Abstract: Product reviews are now widely used by individuals and organizations for decision making (Litvin et al., 2008; Jansen, 2010). And because of the profits at stake, people have been known to try to game the system by writing fake reviews to promote target products. As a result, the task of deceptive review detection has been gaining increasing attention. In this paper, we propose a generative LDA-based topic modeling approach for fake review detection. Our model can aptly detect the subtle dif- ferences between deceptive reviews and truthful ones and achieves about 95% accuracy on review spam datasets, outperforming existing baselines by a large margin.
4 0.62147063 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
Author: Yuanchao Liu ; Ming Liu ; Xiaolong Wang ; Limin Wang ; Jingjing Li
Abstract: In this paper, we propose PAL, a prototype chatterbot for answering non-obstructive psychological domain-specific questions. This system focuses on providing primary suggestions or helping people relieve pressure by extracting knowledge from online forums, based on which the chatterbot system is constructed. The strategies used by PAL, including semantic-extension-based question matching, solution management with personal information consideration, and XML-based knowledge pattern construction, are described and discussed. We also conduct a primary test for the feasibility of our system.
5 0.60264659 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark
Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.
7 0.52962536 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
8 0.49947774 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
9 0.44046569 292 acl-2013-Question Classification Transfer
10 0.43789288 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
11 0.43770239 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
12 0.36882457 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
14 0.35515833 33 acl-2013-A user-centric model of voting intention from Social Media
15 0.35439217 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
16 0.34767443 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction
17 0.32201049 290 acl-2013-Question Analysis for Polish Question Answering
18 0.28941718 250 acl-2013-Models of Translation Competitions
19 0.28151304 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
20 0.27661639 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
topicId topicWeight
[(0, 0.041), (4, 0.01), (6, 0.037), (11, 0.047), (24, 0.065), (26, 0.044), (28, 0.296), (35, 0.077), (42, 0.028), (48, 0.036), (70, 0.073), (88, 0.036), (90, 0.038), (95, 0.047)]
simIndex simValue paperId paperTitle
1 0.95449108 349 acl-2013-The mathematics of language learning
Author: Andras Kornai ; Gerald Penn ; James Rogers ; Anssi Yli-Jyra
Abstract: unkown-abstract
same-paper 2 0.77813655 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
3 0.77704251 124 acl-2013-Discriminative state tracking for spoken dialog systems
Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams
Abstract: In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states. Current approaches based on generative or discriminative models have different but important shortcomings that limit their accuracy. In this paper we discuss these limitations and introduce a new approach for discriminative state tracking that overcomes them by leveraging the problem structure. An offline evaluation with dialog data collected from real users shows improvements in both state tracking accuracy and the quality of the posterior probabilities. Features that encode speech recognition error patterns are particularly helpful, and training requires relatively few dialogs.
4 0.73786235 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
5 0.73342556 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. 
This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we perceive the world by re-shaping the way we make similarity judgments.
So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet.
Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a WordNet-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-and-vertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of noun-senses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instance-level ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pedersen et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them.
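The two depth-based measures described above can be sketched on a toy IS-A taxonomy. The `PARENT` table below is invented purely for illustration; a real system would walk WordNet's hypernym hierarchy instead.

```python
import math

# Toy IS-A taxonomy given as child -> parent; "entity" is the root.
PARENT = {
    "entity": None, "animal": "entity", "artifact": "entity",
    "dog": "animal", "cat": "animal", "car": "artifact",
}

def depth(c):
    d = 1
    while PARENT[c] is not None:
        c = PARENT[c]
        d += 1
    return d

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out

def lcs(a, b):
    # Walk up from a; the first ancestor shared with b is the deepest one.
    anc_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in anc_b)

def wu_palmer(a, b):
    # Twice the depth of the LCS divided by the sum of individual depths.
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def leacock_chodorow(a, b, max_depth=3):
    # Negated log of the shortest path (in nodes) over twice the taxonomy depth.
    path = depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1
    return -math.log(path / (2.0 * max_depth))
```

Here wu_palmer("dog", "cat") = 2·2/(3+3) = 2/3, while wu_palmer("dog", "car") falls to 1/3 because the pair's LCS is the root.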
To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield more rounded similarity measures. Resnik (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of information shared by those concepts, and thus a good estimate of their similarity.
Lin (1998) combines the intuitions behind Resnik’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) present a minor innovation, showing how Resnik’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnik’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate information content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus, Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures.
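A minimal sketch of the information-content family, using Seco et al.'s intrinsic IC (descendant counts in place of corpus counts) to drive the Lin ratio and the Jiang-Conrath distance. The six-concept hierarchy is invented for illustration; with WordNet one would count hyponyms instead.

```python
import math

# Toy taxonomy: child -> parent; "entity" is the root of N = 6 concepts.
PARENT = {"entity": None, "animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal", "car": "artifact"}
N = len(PARENT)

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out

def descendants(c):
    # Number of concepts strictly below c in the hierarchy.
    return sum(1 for x in PARENT if x != c and c in ancestors(x))

def ic(c):
    # Seco et al.: IC(c) = 1 - log(hypo(c) + 1) / log(N); root gets 0, leaves 1.
    return 1.0 - math.log(descendants(c) + 1) / math.log(N)

def lcs(a, b):
    anc_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in anc_b)

def lin(a, b):
    # Twice the IC of the LCS over the sum of individual ICs.
    return 2.0 * ic(lcs(a, b)) / (ic(a) + ic(b))

def jiang_conrath_distance(a, b):
    # Sum of individual ICs minus twice the IC of the most informative LCS.
    return ic(a) + ic(b) - 2.0 * ic(lcs(a, b))
```

Note how lin("dog", "car") collapses to 0 because their only shared ancestor is the root, whose intrinsic IC is 0.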
One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a high-dimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition.
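The SVD reduction step can be illustrated with a toy pair-by-linking-phrase count matrix. The pairs, phrases, and counts below are invented; only the truncated-SVD projection and cosine comparison reflect the technique described.

```python
import numpy as np

# Rows: word pairs; columns: hypothetical linking phrases harvested from
# the web ("X cuts Y", "X works Y", ...).  Counts are made up for the sketch.
pairs = ["mason:stone", "carpenter:wood", "doctor:patient"]
M = np.array([[8.0, 1.0, 0.0],
              [7.0, 2.0, 0.0],
              [0.0, 1.0, 9.0]])

# Truncated SVD: keep the k strongest latent dimensions of the phrase space.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
Z = U[:, :k] * s[:k]          # each row is a pair's vector in the reduced space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

In the reduced space, the pairs sharing tacit relations (mason:stone and carpenter:wood) end up far closer to each other than to doctor:patient.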
In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspective as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects […] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (such as, e.g., a familiar substance). Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise.
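The attributional scoring described above, and why SAT distractors defeat it, can be sketched with invented similarity scores standing in for WordNet values: a Veale (2004)-style scorer rates A:B against C:D by averaging sim(A, C) and sim(B, D), so an attributionally close distractor outranks the relationally correct answer.

```python
# Invented stand-in similarities (a WordNet measure would supply these):
# singer/songwriter are attributionally close to hack/writer, while
# mercenary/soldier are only relationally analogous to them.
SIM = {("hack", "mercenary"): 0.2, ("writer", "soldier"): 0.3,
       ("hack", "singer"): 0.6, ("writer", "songwriter"): 0.7}

def sim(a, b):
    # Symmetric lookup with a default of 0 for unlisted pairs.
    return SIM.get((a, b), SIM.get((b, a), 0.0))

def analogy_score(a, b, c, d):
    # Combined similarity of A to C and of B to D.
    return (sim(a, c) + sim(b, d)) / 2.0
```

With these stand-in numbers the distractor singer:songwriter outscores the intended mercenary:soldier for hack:writer, mirroring the failure mode the text attributes to purely attributional measures.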
Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1st” + letter = 1st-letter (for Alpha, Aleph, etc.)
Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts.
a lonesome cowboy 432
a mounted cowboy 122
a grizzled cowboy 74
a swaggering cowboy 68
To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
6 0.69855273 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
7 0.6052599 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
8 0.50533962 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
9 0.50502688 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
10 0.50061965 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
12 0.49727291 250 acl-2013-Models of Translation Competitions
13 0.49487129 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
14 0.49423492 126 acl-2013-Diverse Keyword Extraction from Conversations
15 0.49406269 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems
16 0.49146307 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
17 0.48617244 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
18 0.48575586 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
19 0.47968504 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory
20 0.47901136 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation