acl acl2010 acl2010-171 knowledge-graph by maker-knowledge-mining

171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering


Source: pdf

Author: Mattia Tomasoni ; Minlie Huang

Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental results on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. [sent-7, score-0.359]

2 We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. [sent-8, score-0.164]

3 We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method. [sent-11, score-0.548]

4 1 Introduction Community Question Answering (cQA) portals are an example of Social Media where the information need of a user is expressed in the form of a question for which a best answer is picked among the ones generated by other users. [sent-12, score-0.368]

5 cQA websites are becoming an increasingly popular complement to search engines: overnight, a user can expect a human-crafted, natural language answer tailored to her specific needs. [sent-13, score-0.252]

6 Much valuable information is contained in answers other than the chosen best one (Liu et al. [sent-23, score-0.533]

7 To this end, we cast the problem as an instance of the query-biased multi-document summarization task, where the question was seen as a query and the available answers as documents to be summarized. [sent-26, score-0.666]

8 Quality of the information was assessed via Machine Learning (ML) techniques under best answer supervision in a vector space consisting of linguistic and statistical features about the answers and their authors. [sent-28, score-0.701]

9 Coverage was estimated by semantic comparison with the knowledge space of a corpus of answers to similar questions which had been retrieved through the Yahoo! [sent-29, score-0.613]

10 Relevance was computed as information overlap between an answer and its question, while Novelty was calculated as inverse overlap with all other answers to the same question. [sent-31, score-0.847]

11 A score was assigned to each concept in an answer. [sent-32, score-0.345]

12 We chose to express concepts in the form of Basic Elements (BE), a semantic unit developed at ISI, and modeled semantic overlap as intersection in the equivalence classes of two concepts (formal definitions will be given in section 2. [sent-38, score-0.507]

13 The objective of our work was to present what we believe is a valuable conceptual framework; more advanced machine learning and summarization techniques would most likely improve the performance. [sent-40, score-0.149]

14 In the next section, the Quality, Coverage, Relevance and Novelty measures are presented; we explain how they were calculated and combined to generate a final summary of all answers to a question. [sent-42, score-0.672]

15 1 Quality as a ranking problem Quality assessment of information available on Social Media had been studied before mainly as a binary classification problem with the objective of detecting low quality content. [sent-46, score-0.165]

16 We, on the other hand, treated it as a ranking problem and made use of quality estimates with the novel intent of successfully combining information from sources with different levels of trustfulness and writing ability. [sent-47, score-0.219]

17 An answer a was given along with information about the user u who authored it, the set TAq (Total Answers) of all answers to the same question q and the set TAu of all answers by the same user. [sent-52, score-1.223]

18 feature space to capture the following syntactic, behavioral and statistical properties: ϑ, the length of answer a; ς, the number of non-stopwords in a with a corpus frequency larger than n (set to 5 in our experiments); $, the points awarded to user u according to the Yahoo! [sent-57, score-0.346]

19 Answers’ points system; %, the ratio of best answers posted by user u. The features mentioned above determined a space Ψ; an answer a, in such feature space, assumed the vectorial form: Ψa = (ϑ, ς, $, %). Following the intuition that chosen best answers (a⋆ [sent-58, score-1.239]

20 ) carry high quality information, we used supervised ML techniques to predict the probability that a had been selected as a best answer a⋆. [sent-59, score-0.323]

21 Supervision was given in the form of a training set TrQ of labeled pairs defined as: TrQ = {⟨Ψa, isbesta⟩}; isbesta was a boolean label indicating whether a was an a⋆. [sent-62, score-0.244]

22 It was calculated as the dot product between the learned weight vector W and the feature vector Ψa for answer a. [sent-66, score-0.261]
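
To make the quality-ranking step concrete, the following is a minimal sketch in Python, assuming scikit-learn's linear regression as the learner; the feature-extraction details, the corpus_freq table and all helper names are illustrative, not taken from the paper.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def answer_features(answer_text, stopwords, corpus_freq, user_points, user_best_ratio, n=5):
        """Build the feature vector Psi_a = (theta, sigma, $, %) described above."""
        tokens = answer_text.split()
        theta = len(tokens)  # length of answer a
        # non-stopwords whose corpus frequency exceeds n (n = 5 in the paper's experiments)
        sigma = sum(1 for t in tokens if t not in stopwords and corpus_freq.get(t, 0) > n)
        return np.array([theta, sigma, user_points, user_best_ratio], dtype=float)

    def train_quality_model(X, y):
        # Supervision: TrQ = {(Psi_a, isbest_a)}; fitting learns the weight vector W.
        return LinearRegression().fit(X, y)  # y: 1 if a was the chosen best answer, else 0

    def quality(model, psi_a):
        """Q(Psi_a): dot product of the learned weights W with Psi_a (plus intercept)."""
        return float(model.coef_ @ psi_a + model.intercept_)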

23 2 Bag-of-BEs and semantic overlap The properties that remain to be discussed, namely Coverage, Relevance and Novelty, are measures of semantic overlap between concepts; a concept is the smallest unit of meaning in a portion of written text. [sent-70, score-0.305]

24 To represent sentences and answers we adopted an alternative approach to classical n-grams, which could be called bag-of-BEs. [sent-71, score-0.454]

25 BEs are a strong theoretical instrument to tackle the ambiguity inherent in natural language and find successful practical applications in real-world query-based summarization systems. [sent-74, score-0.149]

26 A sentence is defined as a set of concepts and an answer is defined as the union of the sets that represent its sentences. [sent-78, score-0.37]

27 From a set-theoretical point of view, each concept c was uniquely associated with a set Ec = {c1, c2 . [sent-80, score-0.164]

28 cm} such that: ∀i,j (ci ≈L c) ∧ (ci ≢ c) ∧ (ci ≢ cj). In our model, the “≡” relation indicated syntactic equivalence (exact pattern matching), while the “≈L” relation represented semantic equivalence under the convention of some language L (two concepts having the same meaning). [sent-83, score-0.39]

29 “Climbing a tree to escape a black bear is pointless because they can climb very well. [sent-85, score-0.219]

30 c ∼ k ⟺ (c ≡ k) ∨ (Ec ∩ Ek ≠ ∅): we defined semantic overlap “∼” as occurring between c and k if they were syntactically identical or if their equivalence classes Ec and Ek had at least one element in common. [sent-93, score-0.179]

31 “∼” is symmetric, transitive and reflexive; as a consequence, all concepts with the same meaning are part of the same equivalence class. [sent-96, score-0.277]
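
A minimal sketch of this overlap relation, assuming concepts are hashable BE-like units and each equivalence class Ec is available as a set; the toy paraphrase data is invented for illustration.

    def overlaps(c, k, E):
        """c ~ k iff c == k (syntactic identity) or Ec and Ek share an element."""
        return c == k or not E.get(c, set()).isdisjoint(E.get(k, set()))

    # Illustrative toy equivalence classes: paraphrases expressing the same meaning.
    E = {
        "climb|bear": {"scale|bear", "climb|grizzly"},
        "scale|bear": {"climb|bear"},
    }
    assert overlaps("climb|bear", "scale|bear", E)  # overlap via a shared class member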

32 3 Coverage via concept importance In the scenario we proposed, the information need is addressed in the form of a unique, summarized answer; information that is left out of the user’s final summary will simply be unavailable. [sent-101, score-0.325]

33 We proceeded by treating each answer to a question q as a separate document and we retrieved through the Yahoo! [sent-105, score-0.269]

34 Answers API a set TKq (Total Knowledge) of 50 answers to questions similar to q: the knowledge space of TKq was chosen to approximate the entire knowledge space related to the queried question q. [sent-106, score-0.755]

35 We calculated Coverage as a function of the portion of answers in TKq that presented semantic overlap with a. [sent-107, score-0.575]

36 C(a, q) = Σ_{ci∈a} γ(ci) · tf(ci, a)   (2). The Coverage measure for an answer a was calculated as the sum of the term frequency tf(ci, a) for concepts in the answer itself, weighted by a concept importance function, γ(ci), for concepts in the total knowledge space TKq. [sent-115, score-0.975]
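
A minimal sketch of Equation (2), reusing the overlaps helper from the sketch above; answers are assumed to be lists of concepts, and since this summary does not reproduce the exact form of γ, the plain ratio of covering answers in TKq is used as an illustrative stand-in.

    from collections import Counter

    def gamma(c, TK_q, E):
        # fraction of answers in TK_q containing at least one concept that overlaps c
        covered = sum(1 for d in TK_q if any(overlaps(c, k, E) for k in d))
        return covered / len(TK_q)

    def coverage(a, TK_q, E):
        """C(a, q) = sum over concepts c_i in a of gamma(c_i) * tf(c_i, a)."""
        tf = Counter(a)
        return sum(gamma(c, TK_q, E) * tf[c] for c in tf)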

37 TKq,c = {d ∈ TKq | ∃ k ∈ d : k ∼ c}: the function γ(c) of concept c was calculated as a function of the cardinality of the set TKq and of the set TKq,c, the subset of all those answers d that contained at least one concept k presenting semantic overlap with c itself. [sent-117, score-0.949]

38 A similar idea of knowledge space coverage is addressed by Swaminathan et al. [sent-118, score-0.148]

39 We calculated relevance by computing the semantic overlap between concepts in the answers and the question. [sent-125, score-0.848]

40 Intuitively, we reward concepts expressing meaning that can be found in the question to be answered. [sent-126, score-0.227]

41 qc = {k ∈ q | k ∼ c}: the Relevance measure R(c, q) of a concept c with respect to a question q was calculated as the cardinality of the set qc (containing all concepts in q that semantically overlapped with c) normalized by the total number of concepts in q. [sent-128, score-0.676]

42 Since all elements in TAq (the set of all answers to q) would be used for the final summary, we positively rewarded concepts that expressed novel meanings. [sent-130, score-0.691]
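
A minimal sketch of the two measures, again reusing the overlaps helper; Relevance follows the ratio defined above, while the text describes Novelty only as inverse overlap with the other answers, so the 1 − fraction form below is an assumption.

    def relevance(c, q_concepts, E):
        """R(c, q) = |{k in q : k ~ c}| / |q|."""
        return sum(1 for k in q_concepts if overlaps(c, k, E)) / len(q_concepts)

    def novelty(c, other_answers, E):
        # assumed form: 1 minus the fraction of other answers with a concept overlapping c
        hits = sum(1 for d in other_answers if any(overlaps(c, k, E) for k in d))
        return 1.0 - hits / len(other_answers)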

43 The score for concept c appearing in sentence sc was calculated as: SΠ(c) = ∏_{i=1}^{4} Φ_i^c · log_t(length(sc))   (6). A second approach, which made use of human annotation to learn a vector of weights V = (v1, v2, v3, v4) that linearly combined the scores, was also investigated. [sent-139, score-0.252]

44 SΣ(c) = ∑_{i=1}^{4} Φ_i^c · v_i + length(sc) · v_5   (7). In order to learn the weight vector V that would combine the above scores, we asked three human annotators to generate question-biased extractive summaries based on all answers available for a certain question. [sent-141, score-0.555]
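
A minimal sketch of the two concept-scoring functions (Equations 6 and 7), assuming phi holds the four per-concept measures (Quality, Coverage, Relevance, Novelty) and exposing the log base t of Equation 6 as a parameter, since its value is not given in this summary.

    import math

    def score_pi(phi, sentence_len, t=10):
        """S_Pi(c) = prod_{i=1..4} Phi_i^c * log_t(length(s_c)), Equation (6)."""
        prod = 1.0
        for phi_i in phi:
            prod *= phi_i
        return prod * (math.log(sentence_len) / math.log(t))

    def score_sigma(phi, sentence_len, v):
        """S_Sigma(c) = sum_{i=1..4} Phi_i^c * v_i + length(s_c) * v_5, Equation (7)."""
        return sum(p * w for p, w in zip(phi, v[:4])) + sentence_len * v[4]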

45 6 Quality constrained summarization The previous sections showed how we quantitatively determined which concepts were more worthy of becoming part of the final machine summary M. [sent-147, score-0.442]

46 The final step was to generate the summary itself by automatically selecting sentences under a length constraint. [sent-148, score-0.182]

47 M was generated so as to maximize the scores of the concepts it included. [sent-151, score-0.164]

48 The integer variables xi and yj were equal to one if the corresponding concept ci and sentence sj were included in M. [sent-153, score-0.48]

49 Similarly occij was equal to one if concept ci was contained in sentence sj. [sent-154, score-0.395]

50 We maximized the sum of the scores S(ci) (for S equal to SΠ or SΣ) for each concept ci in the final summary M. [sent-155, score-0.402]

51 We did so under the constraint that the total length of all sentences sj included in M must be less than the total expected length of the summary itself. [sent-156, score-0.313]

52 In addition, we imposed a consistency constraint: if a concept ci was included in M, then at least one sentence sj that contained the concept must also be selected (constraint (10)). [sent-157, score-0.532]
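
A minimal sketch of this integer linear program, written with the PuLP solver (an assumption; the paper does not name a solver here); the variable names follow the text, with x_i for concepts, y_j for sentences and occ holding the (i, j) containment pairs.

    import pulp

    def summarize(scores, sent_lens, occ, max_len):
        # scores: {i: S(c_i)}, sent_lens: {j: length(s_j)}, occ: set of (i, j) pairs
        prob = pulp.LpProblem("answer_summary", pulp.LpMaximize)
        x = {i: pulp.LpVariable(f"x_{i}", cat="Binary") for i in scores}
        y = {j: pulp.LpVariable(f"y_{j}", cat="Binary") for j in sent_lens}
        prob += pulp.lpSum(scores[i] * x[i] for i in x)  # maximize concept scores
        prob += pulp.lpSum(sent_lens[j] * y[j] for j in y) <= max_len  # length constraint
        for i in x:  # consistency (10): a selected concept needs a containing sentence
            prob += x[i] <= pulp.lpSum(y[j] for j in y if (i, j) in occ)
        prob.solve()
        return [j for j in y if y[j].value() == 1]  # indices of the selected sentences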

53 We conclude with an empirical side note: since solving the above can be computationally very demanding for a large number of concepts, we found it very fruitful performance-wise to skim off about one fourth of the concepts with the lowest scores. [sent-159, score-0.245]

54 1 Datasets and filters The initial dataset was composed of 216,563 questions and 1,982,006 answers written by 171,676 users in 100 categories from the Yahoo! [sent-161, score-0.678]

55 We also removed questions that showed statistical values outside of convenient ranges: the number of answers, the length of the longest answer and the summed length of all answers (both absolute and normalized) were taken into consideration. [sent-188, score-0.884]

56 The dataset size was thus reduced to 358 answers to 100 questions that were manually summarized (refer to Section 3. [sent-190, score-0.689]

57 Figure 1: Precision values (Y-axis) in detecting best answers a⋆ [sent-197, score-0.454]

58 amount of training examples needed to successfully train a classifier for the quality assessment task. [sent-199, score-0.21]

59 The Linear Regression method was chosen to determine the probability Q(Ψa) of a being a best answer to q, as explained in Section 2. [sent-200, score-0.244]

60 The evaluation of the classifier’s output was based on the observation that, given the set of all answers TAq relative to q and the best answer a⋆ [sent-202, score-0.66]

61 precision = |{q : Q(Ψa⋆) > Q(Ψa) ∀a ≠ a⋆}| / |TrQ|, where the numerator was the number of questions for which the classifier was able to correctly rank a⋆ [sent-206, score-0.163]
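
A minimal sketch of this evaluation, reusing the quality helper from the earlier sketch; a question counts as correctly ranked when the learned Q scores the chosen best answer above every other answer in TAq.

    def ranking_precision(questions, model):
        # questions: list of (psi_best, [psi_a for the other answers in TA_q]) pairs
        correct = 0
        for psi_best, others in questions:
            q_best = quality(model, psi_best)
            if all(q_best > quality(model, psi) for psi in others):
                correct += 1
        return correct / len(questions)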

62 Figure 1 shows the precision values (Y-axis) in identifying best answers as the size of TrQ increases (X-axis). [sent-208, score-0.454]

63 A training set of 12,000 examples was chosen for the summarization experiments. [sent-214, score-0.187]

64 3 Evaluating answer summaries The objective of our work was to summarize answers from cQA portals. [sent-225, score-0.727]

65 We calculated ROUGE-1 and ROUGE-2 scores against human annotation on the filtered version of the dataset presented in Section 3. [sent-229, score-0.163]

66 The filtered dataset consisted of 358 answers to 100 questions. [sent-231, score-0.562]

67 For each question q, three annotators were asked to produce an extractive summary of the information contained in TAq by selecting sentences subject to a fixed length limit of 250 words. [sent-232, score-0.335]

68 Figure 2: Increase in ROUGE-L, ROUGE-1 and ROUGE-2 performance of the SΠ system as more measures are taken into consideration in the scoring function, starting from Relevance alone (R) to the complete system (RQNC). [sent-242, score-0.18]

69 In order to determine what influence the single measures had on the overall performance, we conducted a final experiment on the filtered dataset (the SΠ scoring function was used). [sent-246, score-0.226]

70 A summary example, along with the question and the best answer, is presented in Table 2. [sent-252, score-0.152]

71 The lengthM constraint for the final summary (Section 2. [sent-254, score-0.168]

72 for example adult Grizzlies can’t climb trees, but Black bears can even when adults. [sent-297, score-0.18]

73 They can not climb in general as thier claws are longer and not semi-retractable like a Black bears claws. [sent-298, score-0.221]

74 Table 2: A summarized answer composed of five different portions of text generated with the SΠ scoring function; the chosen best answer is presented for comparison. [sent-312, score-0.551]

75 Less satisfying examples include summaries of questions that require a specific order of sentences or a compromise between strongly discordant opinions; in those cases, the summarized answer might lack logical consistency. [sent-314, score-0.597]

76 the total knowledge available about q, a coverage estimate of the final answers against it would have been ideal. [sent-315, score-0.601]

77 Unfortunately the lack of metadata about those answers prevented us from proceeding in that direction. [sent-316, score-0.547]

78 This consideration suggests the idea of building TKq using similar answers in the dataset itself, for which metadata is indeed available. [sent-317, score-0.654]

79 Furthermore, similar questions in the dataset could have been used to augment the set of answers used to generate the final summary with answers coming from similar questions. [sent-318, score-1.215]

80 (2009a) presents a method to retrieve similar questions that could be worth taking into consideration for the task. [sent-320, score-0.165]

81 A Quality feature space for questions is presented by Agichtein et al. [sent-322, score-0.159]

82 (2008) and could be used to rank the quality of questions in a way similar to how we ranked the quality of answers. [sent-323, score-0.352]

83 Finally, in addition to the chosen best answer, a DUC-styled query-focused multi-document summary could be used as a baseline against which the performance of the system can be checked. [sent-329, score-0.182]

84 (2008), where standard multi-document summarization techniques are employed along with taxonomic information about questions. [sent-331, score-0.149]

85 At the core of our work lay information trustfulness, summarization techniques and alternative concept representation. [sent-335, score-0.288]

86 (2006) presents a framework to use Maximum Entropy for answer quality estimation through non-textual features; with the same purpose, more recent methods based on the expertise of answerers are proposed by Suryanto et al. [sent-344, score-0.323]

87 (2009b) introduce the idea of ranking answers by taking their relation to questions into consideration. [sent-346, score-0.572]

88 Our approach merged trustfulness estimation and summarization techniques: we adapted the automatic concept-level model presented by Gillick and Favre (2009) to our needs; related work in multi-document summarization has been carried out by Wang et al. [sent-350, score-0.4]

89 A relevant selection of approaches that instead make use of ML techniques for query-biased summarization is the following: Wang et al. [sent-352, score-0.149]

90 6 Conclusions We presented a framework to generate trustful, complete, relevant and succinct answers to questions posted by users in cQA portals. [sent-359, score-0.572]

91 We made use of intrinsically available metadata along with concept-level multi-document summarization techniques. [sent-360, score-0.242]

92 Furthermore, we proposed an original use for the BE representation of concepts and tested two concept-scoring functions to combine Quality, Coverage, Relevance and Novelty measures. [sent-361, score-0.164]

93 Evaluation results on human annotated data showed that our summarized answers constitute a solid complement to best answers voted by the cQA users. [sent-362, score-0.965]

94 We are in the process of building a system that performs on-line summarization of large sets of questions and answers from Yahoo! [sent-363, score-0.721]

95 Larger-scale evaluation of results against other state-of-the-art summarization systems is ongoing. [sent-365, score-0.149]

96 A framework to predict the quality of answers with non-textual features. [sent-404, score-0.571]

97 Enhancing diversity, coverage and balance for summarization through structure learning. [sent-409, score-0.256]

98 Understanding and summarizing answers in community-based question answering services. [sent-414, score-0.553]

99 Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. [sent-458, score-0.149]

100 Ranking community answers by modeling question-answer relationships via analogical reasoning. [sent-468, score-0.454]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('answers', 0.454), ('taq', 0.224), ('answer', 0.206), ('cqa', 0.178), ('concepts', 0.164), ('tkq', 0.163), ('summarization', 0.149), ('novelty', 0.143), ('concept', 0.139), ('ci', 0.134), ('isbesta', 0.122), ('trq', 0.122), ('ugc', 0.122), ('questions', 0.118), ('quality', 0.117), ('equivalence', 0.113), ('relevance', 0.109), ('coverage', 0.107), ('trustfulness', 0.102), ('bears', 0.099), ('metadata', 0.093), ('summary', 0.089), ('sigir', 0.089), ('unfiltered', 0.082), ('climb', 0.081), ('occij', 0.081), ('suryanto', 0.081), ('swaminathan', 0.081), ('sj', 0.079), ('black', 0.078), ('agichtein', 0.076), ('ny', 0.074), ('jeon', 0.071), ('yahoo', 0.07), ('summaries', 0.067), ('overlap', 0.066), ('question', 0.063), ('grizzlies', 0.061), ('lengthm', 0.061), ('dataset', 0.06), ('bear', 0.06), ('trust', 0.058), ('sc', 0.058), ('summarized', 0.057), ('wang', 0.057), ('performances', 0.055), ('cardinality', 0.055), ('york', 0.055), ('calculated', 0.055), ('trs', 0.053), ('credibility', 0.053), ('bes', 0.053), ('portals', 0.053), ('zeng', 0.053), ('length', 0.053), ('yj', 0.049), ('gillick', 0.049), ('tau', 0.049), ('filtered', 0.048), ('assessing', 0.048), ('consideration', 0.047), ('user', 0.046), ('animals', 0.046), ('xi', 0.045), ('classifier', 0.045), ('scoring', 0.044), ('social', 0.044), ('contained', 0.041), ('space', 0.041), ('ml', 0.041), ('akamine', 0.041), ('claws', 0.041), ('favre', 0.041), ('honglei', 0.041), ('includec', 0.041), ('mcguinness', 0.041), ('rq', 0.041), ('rqc', 0.041), ('rqnc', 0.041), ('stvilia', 0.041), ('trekking', 0.041), ('trustful', 0.041), ('final', 0.04), ('constraint', 0.039), ('chosen', 0.038), ('answering', 0.036), ('overlapped', 0.036), ('amini', 0.036), ('tsinghua', 0.036), ('aixin', 0.036), ('retrieval', 0.035), ('zhou', 0.035), ('acm', 0.035), ('integer', 0.034), ('extractive', 0.034), ('measures', 0.034), ('deborah', 0.033), ('climbing', 0.033), ('qc', 0.033), ('percentile', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

Author: Mattia Tomasoni ; Minlie Huang

Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental results on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.

2 0.3405292 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun

Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven question-answering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.

3 0.20384181 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

Author: Taniya Mishra ; Srinivas Bangalore

Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.

4 0.18741615 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

Author: Shih-Hsiang Lin ; Berlin Chen

Abstract: In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a flexible but systematic way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively. Experiments on speech summarization show that the methods deduced from our framework are very competitive with existing summarization approaches.

5 0.16705586 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao

Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.

6 0.1556911 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

7 0.12009714 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue

8 0.11866649 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

9 0.11861105 11 acl-2010-A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

10 0.11653415 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

11 0.10981645 264 acl-2010-Wrapping up a Summary: From Representation to Generation

12 0.10311705 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

13 0.096289098 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

14 0.092941754 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

15 0.090760842 39 acl-2010-Automatic Generation of Story Highlights

16 0.079448625 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

17 0.078736402 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

18 0.078610688 204 acl-2010-Recommendation in Internet Forums and Blogs

19 0.076516427 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation

20 0.076071657 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.228), (1, 0.108), (2, -0.149), (3, -0.046), (4, 0.026), (5, -0.107), (6, -0.017), (7, -0.171), (8, -0.074), (9, -0.02), (10, -0.006), (11, -0.017), (12, -0.176), (13, -0.1), (14, 0.005), (15, 0.198), (16, -0.27), (17, -0.128), (18, 0.025), (19, 0.01), (20, 0.023), (21, -0.013), (22, 0.138), (23, -0.108), (24, 0.063), (25, 0.08), (26, -0.056), (27, -0.057), (28, -0.009), (29, 0.04), (30, -0.07), (31, -0.079), (32, -0.04), (33, 0.148), (34, -0.06), (35, 0.066), (36, -0.004), (37, 0.141), (38, -0.05), (39, -0.0), (40, -0.005), (41, -0.134), (42, -0.045), (43, 0.07), (44, -0.012), (45, -0.058), (46, -0.103), (47, 0.037), (48, -0.038), (49, -0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96258819 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

Author: Mattia Tomasoni ; Minlie Huang

Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental results on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.

2 0.88956797 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun

Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven question-answering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.

3 0.80048108 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui

Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.

4 0.71255797 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

Author: Taniya Mishra ; Srinivas Bangalore

Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.

5 0.57224905 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

Author: Marie-Catherine de Marneffe ; Christopher D. Manning ; Christopher Potts

Abstract: Texts and dialogues often express information indirectly. For instance, speakers’ answers to yes/no questions do not always straightforwardly convey a ‘yes’ or ‘no’ answer. The intended reply is clear in some cases (Was it good? It was great!) but uncertain in others (Was it acceptable? It was unprecedented.). In this paper, we present methods for interpreting the answers to questions like these which involve scalar modifiers. We show how to ground scalar modifier meaning based on data collected from the Web. We learn scales between modifiers and infer the extent to which a given answer conveys ‘yes’ or ‘no’. To evaluate the methods, we collected examples of question–answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’. Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference.

6 0.56742978 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

7 0.52915865 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

8 0.49559459 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

9 0.47003847 11 acl-2010-A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

10 0.45401827 63 acl-2010-Comparable Entity Mining from Comparative Questions

11 0.44922453 248 acl-2010-Unsupervised Ontology Induction from Text

12 0.42599756 204 acl-2010-Recommendation in Internet Forums and Blogs

13 0.41382781 264 acl-2010-Wrapping up a Summary: From Representation to Generation

14 0.39567479 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

15 0.38459727 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices

16 0.37842426 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

17 0.36607111 39 acl-2010-Automatic Generation of Story Highlights

18 0.35159078 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

19 0.33306777 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures

20 0.33155161 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.014), (25, 0.045), (33, 0.013), (39, 0.018), (42, 0.024), (44, 0.018), (59, 0.077), (72, 0.369), (73, 0.045), (76, 0.012), (78, 0.026), (83, 0.096), (84, 0.025), (98, 0.133)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85624564 159 acl-2010-Learning 5000 Relational Extractors

Author: Raphael Hoffmann ; Congle Zhang ; Daniel S. Weld

Abstract: Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn’t scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, relation-specific IE system which learns 5025 relations, more than an order of magnitude greater than any previous approach, with an average F1 score of 61%. Crucial to LUCHS’s performance is an automated system for dynamic lexicon learning, which allows it to learn accurately from heuristically-generated training data, which is often noisy and sparse.

same-paper 2 0.82957417 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

Author: Mattia Tomasoni ; Minlie Huang

Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental results on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.

3 0.76187009 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a human-labeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HL-SOT approach is easily generalized to labeling a mix of reviews of more than one products.

4 0.72444308 127 acl-2010-Global Learning of Focused Entailment Graphs

Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger

Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.

5 0.60060179 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun

Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven question-answering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.

6 0.57669663 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

7 0.57272291 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

8 0.54462039 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

9 0.5274272 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

10 0.52455211 185 acl-2010-Open Information Extraction Using Wikipedia

11 0.51365876 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

12 0.51139975 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

13 0.5109365 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

14 0.51045167 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

15 0.50602806 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

16 0.50144351 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries

17 0.49986681 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

18 0.49918661 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

19 0.49899387 248 acl-2010-Unsupervised Ontology Induction from Text

20 0.49899197 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval