acl acl2010 acl2010-174 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. [sent-2, score-0.521]
2 Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. [sent-3, score-0.649]
3 The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora. [sent-4, score-0.366]
4 Nevertheless, most QA research mainly focuses on locating the exact answer to a given factoid question in the related documents. [sent-6, score-0.303]
5 The most well known international evaluation on the factoid QA task is the Text REtrieval Conference (TREC)1 , and the annotated questions and answers released by TREC have become important resources for the researchers. [sent-7, score-0.409]
6 The cQA sites (or systems) provide platforms where users can either ask questions or deliver answers, and best answers are selected manually (e. [sent-16, score-0.429]
7 To make use of the QA pairs in cQA sites and online forums, one has to face the challenging problem of distinguishing the questions and their answers from the noise. [sent-22, score-0.534]
8 In this paper, a novel approach for modeling the semantic relevance for QA pairs in the social media sites is proposed. [sent-24, score-0.411]
9 As mentioned above, the user-generated questions and their answers via social media are always short texts. [sent-27, score-0.605]
10 How to train a model so that it has good performance on both cQA and forum datasets? [sent-39, score-0.366]
11 So far, people have been doing QA research on the cQA and the forum datasets separately (Ding et al. [sent-40, score-0.397]
12 To solve the first problem, we present a deep belief network (DBN) to model the semantic relevance between questions and their answers. [sent-46, score-0.652]
13 The network establishes the semantic relationship for QA pairs by minimizing the answer-to-question reconstruction error. [sent-47, score-0.282]
14 Instead of mining the structure-based features from cQA pages and forum threads individually, we consider the textual similarity between the two kinds of data. [sent-50, score-0.549]
15 The semantic information learned from the cQA corpus is helpful to detect answers in forums, which makes our model show good performance on social media corpora. [sent-51, score-0.521]
16 Thanks to the best-answer labels already present in the threads, no manual work is needed in our strategy. [sent-52, score-0.253]
17 Section 3 introduces the deep belief network for answer detection. [sent-54, score-0.578]
18 Automatically judging whether a candidate answer is semantically related to the question in a cQA page is a challenging task. [sent-62, score-0.329]
19 A framework for predicting the quality of answers has been presented in (Jeon et al. [sent-63, score-0.253]
20 (2008), for we also aim to rank the candidate answers reasonably, but our ranking algorithm needs only word information, instead of the combination of different kinds of features. [sent-70, score-0.331]
21 Because people have considerable freedom to post on forums, there are a great number of posts irrelevant to answering questions, which makes it more difficult to detect answers in the forums. [sent-71, score-0.492]
22 (2008) have also presented outstanding research works on forum QA extraction. [sent-77, score-0.366]
23 (2008) detect question contexts and answers using conditional random fields, and a ranking algorithm based on the authority of forum users is proposed by Cong et al. [sent-79, score-0.776]
24 The SMT-based methods are effective on modeling the semantic relationship between questions and answers and expanding users’ queries in answer retrieval (Riezler et al. [sent-90, score-0.635]
25 , authorship, acknowledgement, post position, etc), also called non-textual features, play an important role in answer extraction. [sent-97, score-0.259]
26 (2009) both propose strategies to detect questions in the social media corpus, which has proved to be a non-trivial task. [sent-108, score-0.362]
27 A graph based algorithm is presented to answer opinion questions (Li et al. [sent-111, score-0.317]
28 3 The Deep Belief Network for QA pairs Due to the feature sparsity and the low word frequency of the social media corpus, it is difficult to model the semantic relevance between questions and answers using only co-occurrence features. [sent-114, score-0.75]
29 In this section, we propose a deep belief network for modeling the semantic relationship between questions and their answers. [sent-117, score-0.56]
30 The bottom layer represents a visible vector v and the top layer represents a latent feature h. [sent-124, score-0.331]
31 The ability of the RBM suggests building a deep belief network based on RBMs so that the semantic relevance between questions and answers can be modeled. [sent-128, score-0.905]
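To make the RBM building block concrete, here is a minimal sketch of its conditional activations, assuming standard binary visible and hidden units with sigmoid activations; the class and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann Machine with binary visible and hidden units."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # connection weights
        self.b_vis = np.zeros(n_visible)   # visible-unit biases
        self.b_hid = np.zeros(n_hidden)    # hidden-unit biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v): latent features inferred from a visible vector
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        # p(v_i = 1 | h): reconstruction of the visible vector
        return sigmoid(h @ self.W.T + self.b_vis)
```

Stacking such layers, with each layer's hidden activations feeding the next, gives the deep belief network described in Section 3.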
32 2 Pretraining a Deep Belief Network In the social media corpora, the answers are always descriptive, containing one or several sentences. [sent-130, score-0.446]
33 Noticing that an answer has strong semantic association with the question and involves more information than the question, we propose to train a deep belief network by reconstructing the question using its answers. [sent-131, score-0.847]
34 The layer we design is a little different from the classical RBM’s, so that the bottom layer can generate the hidden features according to the visible answer vector and reconstruct the question vector using the hidden features. [sent-136, score-0.838]
35 In the bottom layer, the binary feature vectors based on the statistics of the word occurrence in the answers are used to compute the “hidden features” in the hidden units (Figure 2: The Deep Belief Network for QA Pairs). [sent-138, score-0.392]
36 The model can reconstruct the questions using the hidden features. [sent-139, score-0.26]
37 Given the training set of answer vectors, the bottom layer generates the corresponding hidden features using Equation 1. [sent-142, score-0.45]
38 $(\langle q_i h_j \rangle_{qData} - \langle q_i h_j \rangle_{qRecon})$ (3), where $\langle q_i h_j \rangle_{qData}$ denotes the expectation of the frequency with which the word i in a question and the feature j are on together when the hidden features are driven by the question data. [sent-146, score-0.381]
39 $\langle q_i h_j \rangle_{qRecon}$ defines the corresponding expectation when the hidden features are driven by the reconstructed question data. [sent-147, score-0.255]
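The statistics in Equation 3 can be estimated with a CD-1-style step; a rough sketch follows. It only shows the question-to-hidden weights and omits biases and the answer-to-hidden path (the full model also drives the hidden features from the answer vector), so the learning rate and these simplifications are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, question_v, lr=0.1):
    """One CD-1-style estimate of the gradient in Equation 3.

    Positive statistics: hidden features driven by the question data.
    Negative statistics: hidden features driven by the reconstructed question.
    Biases and the answer-to-hidden connections are omitted for brevity.
    """
    h_data = sigmoid(question_v @ W)      # hidden features driven by question data
    q_recon = sigmoid(h_data @ W.T)       # reconstructed question vector
    h_recon = sigmoid(q_recon @ W)        # hidden features driven by the reconstruction

    # <q_i h_j>_qData minus <q_i h_j>_qRecon, cf. Equation 3
    W += lr * (np.outer(question_v, h_data) - np.outer(q_recon, h_recon))
    return W
```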
40 The classical RBM structure is taken to build the middle layer and the top layer of the network. [sent-150, score-0.258]
41 The training method for the higher two layers is similar to that of the bottom one, and we only have to make each RBM reconstruct the input data using its hidden features. [sent-151, score-0.293]
42 3 Fine-tuning the Weights Since a greedy strategy is taken to train each layer individually during the pre-training procedure, it is necessary to fine-tune the weights of the entire network for optimal reconstruction. [sent-155, score-0.322]
43 To fine-tune the weights, the network is unrolled, taking the answers as the input data to generate the corresponding questions at the output units. [sent-156, score-0.537]
44 2 will show that fine-tuning makes the network perform better for answer detection. [sent-159, score-0.339]
45 4 Best answer detection After pre-training and fine-tuning, a deep belief network for QA pairs is established. [sent-161, score-0.622]
46 To detect the best answer to a given question, we just have to send the vectors of the question and its candidate answers into the input units of the network and perform a level-by-level calculation to obtain the corresponding feature vectors. [sent-162, score-0.804]
47 Then we calculate the distance between the mapped question vector and each candidate answer vector. [sent-163, score-0.329]
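A sketch of this detection step, assuming the trained layers are represented by a list of weight matrices and that the distance is Euclidean (the text above only says "distance", so the metric is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def map_to_feature_space(weights, v):
    """Level-by-level calculation through the trained layers."""
    for W in weights:
        v = sigmoid(v @ W)
    return v

def detect_best_answer(weights, question_v, candidate_vs):
    """Return the index of the candidate whose mapped vector is closest
    to the mapped question vector (Euclidean distance is an assumption)."""
    q_feat = map_to_feature_space(weights, question_v)
    dists = [np.linalg.norm(q_feat - map_to_feature_space(weights, a))
             for a in candidate_vs]
    return int(np.argmin(dists))
```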
48 4 Learning with Homogenous Data In this section, we propose our strategy to make our DBN model detect answers in both cQA and forum datasets, while the existing studies focus on one single dataset. [sent-165, score-0.697]
49 1 Homogenous QA Corpora from Different Sources Our motivation for finding the homogenous question-answer corpora from different kinds of social media is to guarantee the model’s performance and avoid hand-annotating work. [sent-167, score-0.336]
50 In this paper, we get the “solved question” pages in the computer technology domain from Baidu Zhidao as the cQA corpus, and the threads of the forum ComputerFansClub as the online forum corpus (Figure 3: Comparison of the post content lengths in the cQA and the forum datasets). [sent-168, score-1.043]
51 As shown in Figure 3, we have compared the post content lengths of the cQA and the forum in our corpora. [sent-171, score-0.502]
52 For the comparison, 5,000 posts from the cQA corpus and 5,000 posts from the forum corpus are randomly selected. [sent-172, score-0.566]
53 The left panel shows the statistical result on the Baidu Zhidao data, and the right panel shows the one on the forum data. [sent-173, score-0.366]
54 The number i on the horizontal axis denotes the post contents whose lengths range from 10(i − 1) + 1 to 10i bytes, and the vertical axis represents the counts of these post contents. [sent-174, score-0.34]
55 From Figure 3 we observe that the contents of most posts in both the cQA corpus and the forum corpus are short, with the lengths not exceeding 400 bytes. [sent-175, score-0.558]
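A small sketch of the length binning behind Figure 3, assuming byte lengths of the post contents and 10-byte buckets as described above; the helper name is hypothetical.

```python
from collections import Counter

def length_histogram(posts, bucket_size=10):
    """Count posts whose byte lengths fall in the range 10(i-1)+1 to 10i for each bucket i."""
    counts = Counter()
    for text in posts:
        n_bytes = len(text.encode("utf-8"))      # byte length of the post content
        i = max(1, -(-n_bytes // bucket_size))   # ceiling division gives the bucket index
        counts[i] += 1
    return counts
```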
56 Because the cQA corpus and the forum corpus used in this study have homogenous characteristics for the answer detection task, a simple strategy may be used to avoid the hand-annotating work. [sent-188, score-0.703]
57 Because the two corpora are similar, we can apply the deep belief network trained by the cQA corpus to detect answers on both the cQA data and the forum data. [sent-191, score-1.081]
58 2 Features The task of detecting answers in social media corpora suffers seriously from the problem of feature sparsity. [sent-193, score-0.478]
59 For example, in the answers to the causation questions, the words such as because and so are more likely to appear; and the words such as firstly, then, and should may suggest the answers to the manner questions. [sent-201, score-0.506]
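A minimal sketch of the binary word-occurrence vectors used as input features, assuming a fixed vocabulary and naive whitespace tokenization (both assumptions; the paper's preprocessing is not shown in this excerpt):

```python
import numpy as np

def binary_word_vector(text, vocab_index):
    """Binary feature vector over a fixed vocabulary: 1 if the word occurs in the text."""
    v = np.zeros(len(vocab_index))
    for word in text.lower().split():     # naive whitespace tokenization (assumption)
        idx = vocab_index.get(word)
        if idx is not None:
            v[idx] = 1.0
    return v

# Cue words such as "because"/"so" (causation) or "firstly"/"then"/"should" (manner)
# simply become 1-bits in the answer's vector.
vocab_index = {"because": 0, "so": 1, "firstly": 2, "then": 3, "should": 4}
print(binary_word_vector("Firstly check the cable then reboot", vocab_index))
```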
60 5 Experiments To evaluate our question-answer semantic relevance computing method, we compare our approach with the popular methods on the answer detection task. [sent-206, score-0.315]
61 1 Experiment Setup Architecture of the Network: To build the deep belief network, we use a 1500-1500-1000-600 architecture, which means the three layers of the network individually have 1,500×1,500, 1,500×1,000 and 1,000×600 units. [sent-208, score-0.422]
62 During the pretraining stage, the bottom layer is greedily pretrained for 200 passes through the entire training set, and each of the other two layers is greedily pretrained for 50 passes. [sent-210, score-0.37]
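A sketch of the greedy layer-wise pretraining schedule described above (1500-1500-1000-600 architecture, 200 passes for the bottom layer and 50 for each of the other two); the learning rate, initialization, and the plain CD-1 step are assumptions, not the paper's exact settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

layer_sizes = [1500, 1500, 1000, 600]   # visible layer followed by three hidden layers
passes = [200, 50, 50]                  # pretraining passes per layer, as stated above

def greedy_pretrain(data, layer_sizes, passes, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each weight matrix is trained with a
    CD-1-style step on the activations produced by the layers below it."""
    rng = np.random.default_rng(seed)
    weights, inputs = [], data
    for (n_vis, n_hid), n_pass in zip(zip(layer_sizes, layer_sizes[1:]), passes):
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        for _ in range(n_pass):
            for v in inputs:
                h = sigmoid(v @ W)                 # hidden features from the data
                v_rec = sigmoid(h @ W.T)           # reconstruction of the input
                h_rec = sigmoid(v_rec @ W)         # hidden features from the reconstruction
                W += lr * (np.outer(v, h) - np.outer(v_rec, h_rec))
        inputs = [sigmoid(v @ W) for v in inputs]  # feed activations to the next layer
        weights.append(W)
    return weights
```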
63 Correspondingly we obtain 90,000 threads from ComputerFansClub, which is an online forum on computer knowledge. [sent-214, score-0.51]
64 5 candidate answers on average, with one best answer among them. [sent-218, score-0.49]
65 To get another testing dataset, we randomly select 2,000 threads from the forum corpus. [sent-219, score-0.449]
66 For this dataset, human work is necessary to label the best answers in the posts of the threads. [sent-220, score-0.353]
67 There are 7 posts included in each thread on average, among which one question and at least one answer exist. [sent-221, score-0.378]
68 Baseline: To show the performance of our method, three popular relevance computing methods for ranking candidate answers are considered as our baselines. [sent-222, score-0.423]
69 Given a question q and its candidate answer a, their cosine similarity can be computed as follows: $\cos(q,a) = \frac{\sum_{k=1}^{n} w_{qk} \times w_{ak}}{\sqrt{\sum_{k=1}^{n} w_{qk}^{2}} \times \sqrt{\sum_{k=1}^{n} w_{ak}^{2}}}$ (4), where $w_{qk}$ and $w_{ak}$ stand for the weight of the kth word in the question and the answer respectively. [sent-224, score-0.676]
70 This strategy is taken as a baseline method for computing the relevance between questions and answers. [sent-232, score-0.263]
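A direct implementation of the cosine baseline in Equation 4; raw term frequencies are used as the word weights here, which is an assumption (the weighting scheme is not specified in this excerpt).

```python
import math
from collections import Counter

def cosine_similarity(question_tokens, answer_tokens):
    """Cosine similarity between bag-of-words vectors, cf. Equation 4."""
    wq, wa = Counter(question_tokens), Counter(answer_tokens)
    numerator = sum(wq[t] * wa[t] for t in set(wq) & set(wa))
    denominator = (math.sqrt(sum(v * v for v in wq.values())) *
                   math.sqrt(sum(v * v for v in wa.values())))
    return numerator / denominator if denominator else 0.0
```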
71 Given a question q and its candidate answer a, we can construct a unigram language model Mq and a unigram language model Ma. [sent-234, score-0.329]
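One possible sketch of this language-model baseline: add-alpha smoothed unigram models for the question and the answer, compared with KL-divergence. KL-divergence appears in the paper's vocabulary, but the exact scoring function is not shown in this excerpt, so treat the choice below as an illustration only.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(P || Q) between two unigram models defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Candidate answers could then be ranked by, e.g., ascending KL(Mq || Ma).
```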
72 Table 1 lists the results achieved on the forum data using the baseline methods and ours. [sent-238, score-0.366]
73 The additional “Nearest Answer” stands for the method without any ranking strategy, which returns the candidate answer nearest to the question by position. [sent-239, score-0.418]
74 As shown in Table 1, our deep belief network based methods outperform the baseline methods as expected. [sent-241, score-0.392]
75 Although the training set we offer to the network comes from a different source (the cQA corpus), it still provides enough knowledge for the network to perform better than the baseline methods. [sent-243, score-0.306]
76 Basically, the low precision is ascribable to the forum corpus we have obtained. [sent-250, score-0.366]
77 As mentioned in Section 1, the contents of the forum posts are short, which leads to the sparsity of the features. [sent-251, score-0.522]
78 The baseline results indicate that the online forum is a complex environment with a large amount of noise for answer detection. [sent-258, score-0.641]
79 Similar baseline results for forum answer ranking are also achieved by Hong and Davison (2009), who take some non-textual features to improve the algorithm’s performance. [sent-260, score-0.646]
80 We also notice, however, that the baseline methods have obtained better results on the forum corpus (Cong et al. [sent-261, score-0.366]
81 03% in MRR for answer detection on the forum data after fine-tuning, while some related works have reported results with precision over 90% (Cong et al. [sent-267, score-0.552]
82 There are mainly two reasons for this phenomenon: Firstly, both of the previous works have adopted non-textual features based on the forum structure, such as authorship, position and quotes, etc. [sent-269, score-0.393]
83 For the experiments of this paper, a large amount of noise is involved in the forum corpus and we have done nothing extra to filter it. [sent-272, score-0.394]
84 We delete the ones with only one answer to ensure there are at least two candidate answers for each question. [sent-275, score-0.49]
85 The candidate answers are rearranged by post time, so that the real answers do not always appear next to the questions. [sent-276, score-0.63]
86 We attribute the improvements to the high quality QA corpus Baidu Zhidao offers: the candidate answers tend to be more formal than the ones in the forums, with less noise information included. [sent-281, score-0.332]
87 05% in P@1 on this dataset, which indicates that quite a number of askers receive the real answers at the first answer post. [sent-283, score-0.439]
88 What’s more, if the best answer appears immediately, the asker tends to lock down the question thread, which helps to reduce the noise information in the cQA corpus. [sent-285, score-0.306]
89 On the forum data, the results have been improved by 8. [sent-291, score-0.366]
90 6 Conclusions In this paper, we have proposed a deep belief network based approach to model the semantic relevance for the question answering pairs in social community corpora. [sent-296, score-0.794]
91 The contributions of this paper can be summarized as follows: (1) The deep belief network we present shows good performance on modeling the QA pairs’ semantic relevance using only word features. [sent-297, score-0.521]
92 As a data driven approach, our model learns semantic knowledge from large amount of QA pairs to represent the semantic relevance between questions and their answers. [sent-298, score-0.341]
93 (2) We have studied the textual similarity between the cQA and the forum datasets for QA pair extraction, and introduced a novel learning strategy to make our method show good performance on both cQA and forum datasets. [sent-299, score-0.876]
94 The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora. [sent-300, score-0.366]
95 Combining lexical semantic resources with question & answer archives for translation-based answer finding. [sent-313, score-0.501]
96 Using conditional random fields to extract contexts and answers of questions from online forums. [sent-323, score-0.445]
97 Searching questions by identifying question topic and question focus. [sent-328, score-0.315]
98 Finding similar questions in large question and answer archives. [sent-365, score-0.409]
99 Retrieving answers from frequently asked questions pages on the web. [sent-376, score-0.384]
100 Learning to rank answers on large online QA collections. [sent-408, score-0.314]
wordName wordTfidf (topN-words)
[('cqa', 0.528), ('forum', 0.366), ('answers', 0.253), ('qa', 0.25), ('answer', 0.186), ('rbm', 0.175), ('network', 0.153), ('questions', 0.131), ('belief', 0.129), ('layer', 0.129), ('cong', 0.127), ('homogenous', 0.111), ('deep', 0.11), ('social', 0.109), ('posts', 0.1), ('question', 0.092), ('relevance', 0.092), ('forums', 0.091), ('media', 0.084), ('threads', 0.083), ('davison', 0.079), ('zhidao', 0.079), ('dbn', 0.076), ('baidu', 0.076), ('hinton', 0.076), ('post', 0.073), ('hidden', 0.073), ('jeon', 0.069), ('hownet', 0.063), ('qihj', 0.063), ('online', 0.061), ('hong', 0.057), ('reconstruct', 0.056), ('contents', 0.056), ('mq', 0.056), ('salakhutdinov', 0.056), ('axis', 0.051), ('candidate', 0.051), ('reconstructing', 0.048), ('pretraining', 0.048), ('surdeanu', 0.045), ('sites', 0.045), ('pairs', 0.044), ('firstly', 0.043), ('boltzmann', 0.042), ('mrr', 0.041), ('sun', 0.04), ('textual', 0.04), ('strategy', 0.04), ('ding', 0.039), ('email', 0.038), ('visible', 0.038), ('sigir', 0.038), ('detect', 0.038), ('semantic', 0.037), ('lengths', 0.036), ('concurrent', 0.036), ('cosine', 0.036), ('bottom', 0.035), ('similarity', 0.033), ('stands', 0.033), ('greedily', 0.032), ('corpora', 0.032), ('baoxun', 0.032), ('bingquan', 0.032), ('chengjie', 0.032), ('computerfansclub', 0.032), ('ftyp', 0.032), ('kldivergence', 0.032), ('nonfactoid', 0.032), ('pretrained', 0.032), ('qdata', 0.032), ('qrecon', 0.032), ('shrestha', 0.032), ('twedstraigmhneosifluwcatetri', 0.032), ('xiaolong', 0.032), ('wang', 0.031), ('datasets', 0.031), ('vectors', 0.031), ('dataset', 0.03), ('bernhard', 0.03), ('layers', 0.03), ('nearest', 0.029), ('ny', 0.029), ('answering', 0.028), ('noise', 0.028), ('short', 0.028), ('aug', 0.028), ('joon', 0.028), ('retrieval', 0.028), ('features', 0.027), ('content', 0.027), ('riezler', 0.027), ('ranking', 0.027), ('berger', 0.026), ('wij', 0.025), ('vibhu', 0.025), ('jiwoon', 0.025), ('factoid', 0.025), ('gurevych', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
2 0.3405292 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
Author: Mattia Tomasoni ; Minlie Huang
Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental re- sults on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
3 0.19498846 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
4 0.18235992 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
5 0.11797155 204 acl-2010-Recommendation in Internet Forums and Blogs
Author: Jia Wang ; Qing Li ; Yuanzhu Peter Chen ; Zhangxi Lin
Abstract: The variety of engaging interactions among users in social media distinguishes it from traditional Web media. Such a feature should be utilized while attempting to provide intelligent services to social media participants. In this article, we present a framework to recommend relevant information in Internet forums and blogs using user comments, one of the most representative of user behaviors in online discussion. When incorporating user comments, we consider structural, semantic, and authority information carried by them. One of the most important observation from this work is that semantic contents of user comments can play a fairly different role in a different form of social media. When designing a recommendation system for this purpose, such a difference must be considered with caution.
6 0.11048915 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
7 0.10909519 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
8 0.087543711 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
9 0.084826864 112 acl-2010-Extracting Social Networks from Literary Fiction
10 0.083755046 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
11 0.074004531 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
12 0.071645245 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
13 0.062568523 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
14 0.061236437 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
15 0.058786616 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
16 0.056293413 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
17 0.05103007 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
18 0.050173946 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
19 0.04595932 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
20 0.045796297 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
topicId topicWeight
[(0, -0.158), (1, 0.079), (2, -0.099), (3, -0.024), (4, 0.017), (5, -0.089), (6, -0.045), (7, -0.005), (8, -0.058), (9, -0.021), (10, -0.024), (11, 0.019), (12, -0.112), (13, -0.156), (14, 0.043), (15, 0.251), (16, -0.285), (17, -0.196), (18, 0.015), (19, -0.015), (20, 0.029), (21, -0.075), (22, 0.176), (23, -0.062), (24, 0.066), (25, 0.07), (26, -0.063), (27, -0.059), (28, -0.05), (29, -0.027), (30, -0.081), (31, -0.035), (32, -0.014), (33, 0.17), (34, -0.083), (35, 0.062), (36, 0.002), (37, 0.173), (38, -0.096), (39, -0.037), (40, -0.04), (41, -0.088), (42, -0.093), (43, 0.062), (44, -0.032), (45, -0.061), (46, -0.085), (47, 0.028), (48, -0.02), (49, -0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.94942623 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
2 0.8415702 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
3 0.83846527 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
Author: Mattia Tomasoni ; Minlie Huang
Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental re- sults on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
4 0.72721404 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
5 0.60622221 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
Author: Marie-Catherine de Marneffe ; Christopher D. Manning ; Christopher Potts
Abstract: Texts and dialogues often express information indirectly. For instance, speakers’ answers to yes/no questions do not always straightforwardly convey a ‘yes’ or ‘no’ answer. The intended reply is clear in some cases (Was it good? It was great!) but uncertain in others (Was it acceptable? It was unprecedented.). In this paper, we present methods for interpreting the answers to questions like these which involve scalar modifiers. We show how to ground scalar modifier meaning based on data collected from the Web. We learn scales between modifiers and infer the extent to which a given answer conveys ‘yes’ or ‘no’ . To evaluate the methods, we collected examples of question–answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’ . Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference.
6 0.47182333 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
7 0.4517068 204 acl-2010-Recommendation in Internet Forums and Blogs
8 0.4431951 63 acl-2010-Comparable Entity Mining from Comparative Questions
9 0.40422529 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
10 0.39855641 248 acl-2010-Unsupervised Ontology Induction from Text
11 0.32389456 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
12 0.31951514 112 acl-2010-Extracting Social Networks from Literary Fiction
13 0.29497391 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures
14 0.29201707 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
15 0.27798676 224 acl-2010-Talking NPCs in a Virtual Game World
16 0.27382246 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
17 0.26458696 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
18 0.25439334 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
19 0.25413764 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
20 0.25363263 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
topicId topicWeight
[(14, 0.011), (17, 0.235), (25, 0.051), (33, 0.01), (39, 0.012), (42, 0.059), (59, 0.082), (72, 0.072), (73, 0.062), (78, 0.03), (80, 0.014), (83, 0.092), (84, 0.028), (98, 0.137)]
simIndex simValue paperId paperTitle
same-paper 1 0.79358494 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
2 0.7734434 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
Author: Cosmin Bejan ; Sanda Harabagiu
Abstract: This paper examines how a new class of nonparametric Bayesian models can be effectively applied to an open-domain event coreference task. Designed with the purpose of clustering complex linguistic objects, these models consider a potentially infinite number of features and categorical outcomes. The evaluation performed for solving both within- and cross-document event coreference shows significant improvements of the models when compared against two baselines for this task.
3 0.72784626 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
4 0.67828619 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Author: Wei Wei ; Jon Atle Gulla
Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.
5 0.66315901 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
Author: Mattia Tomasoni ; Minlie Huang
Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental re- sults on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
6 0.66272271 127 acl-2010-Global Learning of Focused Entailment Graphs
7 0.6594668 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
8 0.65076745 159 acl-2010-Learning 5000 Relational Extractors
9 0.64949638 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
10 0.64765489 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
11 0.64522755 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
12 0.63997519 214 acl-2010-Sparsity in Dependency Grammar Induction
13 0.6340431 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
14 0.63265991 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
15 0.63155788 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
16 0.63007969 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization
17 0.62989843 39 acl-2010-Automatic Generation of Story Highlights
18 0.62846947 185 acl-2010-Open Information Extraction Using Wikipedia
19 0.62666786 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
20 0.62641513 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.