218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering


Source: pdf

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Retrieving similar questions is very important in community-based question answering (CQA). [sent-6, score-0.313]

2 In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. [sent-7, score-1.17]

3 Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. [sent-8, score-0.67]

4 The experimental result shows that our method outperforms the traditional methods. [sent-9, score-0.026]

5 Unlike traditional question answering (QA) , information seekers can post their questions on a CQA website which are later answered by other users. [sent-12, score-0.43]

6 However, with the increase of the CQA archive, there accumulate massive duplicate questions on CQA websites. [sent-13, score-0.135]

7 One of the primary reasons is that information seekers cannot retrieve answers they need and thus post another new question consequently. [sent-14, score-0.32]

8 The major challenge for CQA retrieval is the lexical gap (or lexical chasm) among the questions (Jeon et al. [sent-16, score-0.267]

9 Since question-answer pairs are usually short, the word mismatching problem is especially important. [sent-23, score-0.046]

10 However, due to the lexical gap between questions and answers as well as spam typically existing in user-generated content, filtering and ranking answers is very challenging. [sent-24, score-0.315]

11 The earlier studies mainly focus on generating redundant features, or finding textual clues using machine learning techniques; none of them ever consider questions and their answers as relational data but instead model them as independent information. [sent-25, score-0.225]

12 Moreover, they only consider the answers of the current question, and ignore any previous knowledge that would be helpful to bridge the lexical and semantic gap. [sent-26, score-0.09]

13 In recent years, many methods have been proposed to solve the word mismatching problem between user questions and the questions in a QA archive(Blooma and Kurian, 2011) , among which the translation-based (Riezler et al. [sent-27, score-0.316]

14 (© 2013 Association for Computational Linguistics, pages 434–439) pipeline methods: (1) modeling word association; (2) question retrieval combined with other models, such as vector space model (VSM), Okapi model (Robertson et al. [sent-34, score-0.336]

15 In this paper, we propose a novel unified retrieval model for CQA, latent semantic tensor indexing (LSTI) , which is an extension of the conventional latent semantic indexing (LSI) (Deerwester et al. [sent-37, score-1.061]

16 Similar to LSI, LSTI can integrate the two detached parts (modeling word association and question retrieval) into a single model. [sent-39, score-0.217]

17 In traditional document retrieval, LSI is an effective method to overcome two of the most severe constraints on Boolean keyword queries: synonymy, that is, multiple words with similar meanings, and polysemy, or words with more than one meaning. [sent-40, score-0.052]

18 Usually in a CQA archive, each entry (or question) is in the following triple form: ⟨question title, question content, answer⟩ . [sent-41, score-0.241]

19 Because the performance based solely on the content or the answer part is less than satisfactory, many works proposed that additional relevant information should be provided to help question retrieval (Xue et al. [sent-42, score-0.291]

20 For example, if a question title contains the keyword “why”, the CQA triple, which contains “because” or “reason” in its answer part, is more likely to be what the user looks for. [sent-44, score-0.329]

21 Since each triple in CQA has three parts, the natural representation of the CQA collection is a three-dimensional array, or 3rd-order tensor, rather than a matrix. [sent-45, score-0.063]

22 Based on the tensor decomposition, we can model the word association simultaneously in the pairs: question-question, question-body and question-answer. [sent-46, score-0.527]

23 2 Related Works There are some related works on question retrieval in CQA. [sent-51, score-0.31]

24 Various query expansion techniques have been studied to solve word mismatch problems between queries and documents. [sent-52, score-0.052]

25 The early works on question retrieval can be traced back to finding similar questions in Frequently Asked Questions (FAQ) archives, such as the FAQ finder (Burke et al. [sent-53, score-0.491]

26 , 1997) , which usually used statistical and semantic similarity measures to rank FAQs. [sent-54, score-0.05]

27 , the vector space model(Jijkoun and de Rijke, 2005) , the Okapi BM25 model (Robertson et al. [sent-58, score-0.026]

28 , 1994) , the language model, and the translation model, for question retrieval on CQA data, and the experimental results showed that the translation model outperforms the others. [sent-59, score-0.31]

29 However, they focused only on similarity measures between queries (questions) and question titles. [sent-60, score-0.204]

30 , 2008) , a translation-based language model combining the translation model and the language model for question retrieval was proposed. [sent-62, score-0.31]

31 The results showed that translation models help question retrieval since they could effectively address the word mismatch problem of questions. [sent-63, score-0.336]

32 (2008) proposed a solution that made use of question structures for retrieval by building a structure tree for questions in a category of Yahoo! [sent-66, score-0.445]

33 Answers, which gave more weight to important phrases in question matching. [sent-67, score-0.178]

34 (2009) employed a parser to build syntactic trees for questions, and questions were ranked based on the similarity between their syntactic trees and that of the query question. [sent-69, score-0.135]

35 It is worth noting that our method is totally different to the work (Cai et al. [sent-70, score-0.028]

36 They regard documents as matrices, or second-order tensors, to generate low-rank approximations of matrices (Ye, 2005) . [sent-72, score-0.219]

37 For example, they convert a 1,000,000-dimensional vector of word space into a 1000 × 1000 matrix. [sent-73, score-0.026]

38 We just project a higher-dimensional vector to a lower-dimensional vector, rather than to a matrix as in Cai’s model. [sent-76, score-0.134]

39 A 3rd-order tensor is also introduced in our model for a better representation of the CQA corpus. [sent-77, score-0.527]

40 , 1990) , also called Latent Semantic Analysis (LSA) , is an approach to automatic indexing and information retrieval that attempts to overcome these problems by mapping documents as well as terms to a representation in the so-called latent semantic space. [sent-79, score-0.319]

41 The key idea of LSI is to map documents (and by symmetry terms) to a low dimensional vector space, the latent semantic space. [sent-80, score-0.086]

42 This mapping is computed by decomposing the term-document matrix N with SVD, N = UΣVt, where U and V are orthogonal matrices UtU = VtV = I and the diagonal matrix Σ contains the singular values of N. [sent-81, score-0.549]

43 The LSA approximation of N is computed by keeping only the largest K singular values in Σ, which is the optimal rank-K approximation in the sense of the L2-norm. [sent-82, score-0.118]
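
The truncation step described in the two sentences above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lsi_truncate` and the toy matrix are hypothetical, and a dense SVD is used for brevity even though a real term-document matrix would be sparse.

```python
import numpy as np

def lsi_truncate(N, K):
    """Return the factors of the rank-K LSI approximation of a term-document matrix N."""
    # Thin SVD: N = U @ diag(s) @ Vt, with singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    # Keep only the K largest singular values and their singular vectors.
    return U[:, :K], s[:K], Vt[:K, :]

# Hypothetical toy term-document matrix (4 terms x 3 documents).
N = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

U_k, s_k, Vt_k = lsi_truncate(N, K=2)
N_k = U_k @ np.diag(s_k) @ Vt_k          # rank-2 approximation of N
docs_latent = (np.diag(s_k) @ Vt_k).T    # one row per document in the latent space
```

The spectral-norm error of `N_k` equals the first discarded singular value, which is what "optimal in the sense of the L2-norm" refers to.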

44 Scalars are denoted by lower case letters (a, b, . [sent-86, score-0.041]

45 ) , and higher-order tensors by calligraphic upper-case letters (A, B, . [sent-95, score-0.119]

46 ) known as n-way array, is a higher order generalization of a vector (first order tensor) and a matrix (second order tensor). [sent-102, score-0.134]

47 An element of an Nth-order tensor D is denoted as di1 . [sent-105, score-0.023]

48 An Nth-order tensor D can be flattened into a matrix in N ways. [sent-110, score-0.157]

49 We denote the matrix D(n) as the mode-n flattening of D (Kolda, 2002) . [sent-111, score-0.186]

50 Similar to a matrix, an Nth-order tensor can be decomposed through “N-mode singular value decomposition (SVD)”, which is an extension of SVD that expresses the tensor as the mode-n product of N-orthogonal spaces. [sent-112, score-1.177]
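
Equation (1) itself did not survive extraction. Judging from Eq. (5) later in the text and from the core-tensor formula in step 2 of the algorithm, it presumably has the standard N-mode SVD form below. This is a reconstruction under that assumption, not a quotation from the paper.

```latex
\mathcal{D} = \mathcal{Z} \times_1 U_1 \times_2 U_2 \cdots \times_n U_n \cdots \times_N U_N
```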

51 (1) Tensor Z, known as the core tensor, is analogous to the diagonal singular value matrix in conventional matrix SVD. [sent-114, score-0.254]

52 Z governs the interaction between the mode matrices Un, for n = 1, . [sent-117, score-0.086]

53 Mode matrix Un contains the orthogonal left singular vectors of the mode-n flattened matrix D(n) . [sent-121, score-0.509]

54 The N-mode SVD algorithm for decomposing D is as follows: 1. [sent-122, score-0.03]

55 (1) by computing the SVD of the flattened matrix D(n) and setting Un to be the left matrix of the SVD. [sent-127, score-0.372]

56 Solve for the core tensor as follows: Z = D ×1 U1^T ×2 U2^T · · · ×n Un^T · · · ×N UN^T. [sent-129, score-0.527]
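
The two steps of the N-mode SVD algorithm above can be sketched as follows. This is an illustrative dense NumPy version under stated assumptions: the helper names `unfold`, `mode_product` and `n_mode_svd` are hypothetical, and the column ordering of the flattening may differ from the one in (Kolda, 2002), which does not change the left singular vectors.

```python
import numpy as np

def unfold(D, n):
    """Mode-n flattening: rows index mode n, columns index all other modes."""
    return np.moveaxis(D, n, 0).reshape(D.shape[n], -1)

def mode_product(D, M, n):
    """Mode-n product D x_n M, where M has shape (J, D.shape[n])."""
    # Contract M's second axis with D's n-th axis, then move the new axis back to position n.
    return np.moveaxis(np.tensordot(M, D, axes=(1, n)), 0, n)

def n_mode_svd(D):
    """Return the core tensor Z and the list of mode matrices U_n."""
    # Step 1: U_n is the left singular matrix of the mode-n flattened matrix.
    Us = [np.linalg.svd(unfold(D, n), full_matrices=False)[0] for n in range(D.ndim)]
    # Step 2: Z = D x_1 U_1^T x_2 U_2^T ... x_N U_N^T.
    Z = D
    for n, U in enumerate(Us):
        Z = mode_product(Z, U.T, n)
    return Z, Us

rng = np.random.default_rng(0)
D = rng.random((4, 3, 5))   # toy tensor: 4 entries x 3 parts x 5 terms
Z, Us = n_mode_svd(D)

# Rebuild D = Z x_1 U_1 x_2 U_2 x_3 U_3 to check the decomposition.
D_rebuilt = Z
for n, U in enumerate(Us):
    D_rebuilt = mode_product(D_rebuilt, U, n)
```

Without truncation the decomposition is exact, so `D_rebuilt` matches `D` up to floating-point error; dimensionality reduction comes from keeping only the leading columns of a mode matrix.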

57 , K) , where qi is the question and ci and ai are the content and answer of qi respectively. [sent-134, score-0.415]

58 We can use a 3rd-order tensor D ∈ R^(K×3×T) to represent the collection, where T is the number of terms. [sent-135, score-0.527]

59 The first dimension corresponds to entries, the second dimension, to parts and the third dimension, to the terms. [sent-136, score-0.066]

60 For example, the flattened matrix of CQA tensor with “terms” direction is composed by three sub-matrices MTitle, MContent and MAnswer, as was illustrated in Figure 1. [sent-137, score-0.788]

61 Each sub-matrix is equivalent to the traditional document-term matrix. [sent-138, score-0.026]

62 Figure 1: Flattening the CQA tensor with “terms” (right matrix) and “entries” (bottom matrix). Denote pi,j to be part j of entry i. [sent-139, score-0.527]

63 idfj,k = log( |K| / (1 + ∑i I(tk ∈ pi,j)) ), (3) where |K| is the total number of entries and I(·) is the indicator function. [sent-143, score-0.024]

64 Then the element di,j,k of tensor D is di,j,k = tfi,j,k idfj,k. [sent-144, score-0.527]
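
The construction of the K × 3 × T tensor can be sketched in plain Python. This is a hedged illustration: the term-frequency definition (Eq. 2) is not present in the extracted text, so raw counts are assumed here, and the function name `build_cqa_tensor`, the whitespace tokenizer and the toy archive are all hypothetical.

```python
import math

def build_cqa_tensor(triples, vocab):
    """Build a K x 3 x T nested list with d[i][j][k] = tf[i][j][k] * idf[j][k]."""
    K, T = len(triples), len(vocab)
    index = {term: k for k, term in enumerate(vocab)}
    # tf[i][j][k]: raw count of term k in part j of entry i (an assumption;
    # the source's tf definition did not survive extraction).
    tf = [[[0.0] * T for _ in range(3)] for _ in range(K)]
    for i, parts in enumerate(triples):
        for j, text in enumerate(parts):
            for word in text.lower().split():
                if word in index:
                    tf[i][j][index[word]] += 1.0
    # idf[j][k] = log(|K| / (1 + number of entries whose part j contains term k)).
    idf = [[math.log(K / (1 + sum(1 for i in range(K) if tf[i][j][k] > 0)))
            for k in range(T)] for j in range(3)]
    return [[[tf[i][j][k] * idf[j][k] for k in range(T)]
             for j in range(3)] for i in range(K)]

# Hypothetical toy archive of (title, content, answer) triples.
triples = [
    ("why is my laptop slow", "it takes minutes to boot", "because the disk is full"),
    ("how to clean disk", "my disk is full", "delete temporary files"),
    ("laptop will not boot", "black screen after update", "reinstall the driver"),
    ("best free editor", "looking for a text editor", "try several and compare"),
]
vocab = ["laptop", "disk", "boot", "because", "editor"]
D = build_cqa_tensor(triples, vocab)
```

Note that the idf is computed per part, so the same term can carry different weights in titles, contents and answers.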

65 3 Latent Semantic Tensor Indexing For the CQA tensor, we can decompose it as illustrated in Figure 2. [sent-146, score-0.023]

66 D = Z ×1 UEntry ×2 UPart ×3 UTerm, (5) where UEntry, UPart and UTerm are left singular matrices of corresponding flattened matrices. [sent-147, score-0.282]

67 UTerm spans the term space, and we just use the vectors corresponding to the 1,000 largest singular values in this paper, denoted as U′Term. [sent-148, score-0.092]

68 Figure 2: 3-mode SVD of the CQA tensor. To deal with such a huge sparse data set, we use singular value decomposition (SVD) implemented in the Apache Mahout machine learning library, which is implemented on top of Apache Hadoop using the map/reduce paradigm and is scalable to reasonably large data sets. [sent-149, score-0.65]

69 4 Question Retrieval In order to retrieve similar questions effectively, we project each CQA triple Di ∈ R^(1×3×T) to the term space by Dˆi = Di ×3 U′Term^T. [sent-155, score-0.29]

70 (6) Given a new question only with title part, we can represent it by tensor Dq ∈ R^(1×3×T), and its MContent and MAnswer are zero matrices. [sent-156, score-0.776]

71 Then we project Dq to the term space and get Dˆq. [sent-157, score-0.026]

72 Here, Dˆq and Dˆi are degraded tensors and can be regarded as matrices. [sent-158, score-0.078]

73 Thus, we can calculate the similarity between Dˆq and Dˆi with normalized Frobenius inner product. [sent-159, score-0.052]

74 For two matrices A and B, the Frobenius inner product, indicated as A : B, is the component-wise inner product of two matrices as though they are vectors. [sent-160, score-0.276]

75 A : B = ∑i,j Ai,j Bi,j (7) To reduce the effect of length, we use the normalized Frobenius inner product. [sent-161, score-0.052]

76 A : B (normalized) = (A : B) / (√(A : A) × √(B : B)) (8) When a new question has both title and content parts, MContent is not a zero matrix and can also be employed in the question retrieval process. [sent-162, score-0.706]
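
The projection of Eq. (6) and the similarity of Eqs. (7) and (8) can be sketched in plain Python. This is a minimal sketch under stated assumptions: each entry is treated as a 3 × T matrix (the leading dimension of size 1 is dropped), `U_term` stands in for a truncated T × R term basis, and all names and toy values are hypothetical.

```python
import math

def project(D_i, U_term):
    """Project a 3 x T matrix onto the term space: row j becomes D_i[j] @ U_term."""
    # U_term is T x R; the mode-3 product with its transpose yields a 3 x R matrix.
    R = len(U_term[0])
    return [[sum(row[t] * U_term[t][r] for t in range(len(row))) for r in range(R)]
            for row in D_i]

def frobenius(A, B):
    """Frobenius inner product A : B, the sum of component-wise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def similarity(A, B):
    """Normalized Frobenius inner product; returns 0.0 if either matrix is all zeros."""
    denom = math.sqrt(frobenius(A, A)) * math.sqrt(frobenius(B, B))
    return frobenius(A, B) / denom if denom > 0 else 0.0

# Hypothetical term basis with T = 3 terms and R = 2 latent dimensions.
U_term = [[1.0, 0.0],
          [0.0, 1.0],
          [0.0, 0.0]]
# Archived triple: rows are title, content, answer.
D_i = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
# Query with a title only: content and answer rows are zero.
D_q = [[2.0, 4.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

score = similarity(project(D_q, U_term), project(D_i, U_term))
```

Because the query's content and answer rows are zero, only the title row contributes to the numerator, while the archived triple's content and answer rows still enter its norm.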

77 1 Datasets We collected the resolved CQA triples from the “computer” category of Yahoo! [sent-165, score-0.079]

78 We just selected the resolved questions that already have been given their best answers. [sent-167, score-0.135]

79 The CQA triples are preprocessed with stopwords removal (Chinese sentences are segmented into words in advance by FudanNLP toolkit(Qiu et al. [sent-168, score-0.079]

80 In order to evaluate our retrieval system, we divide our dataset into two parts. [sent-170, score-0.132]

81 In LSI, we regard each triple as a single document. [sent-181, score-0.063]

82 Given a returned result, two annotators are asked to label it with “relevant” or “irrelevant”. [sent-183, score-0.054]

83 If an annotator considers the returned result semantically equivalent to the queried question, he labels it as “relevant”; otherwise, it is labeled as “irrelevant”. [sent-184, score-0.023]

84 The experiment results are illustrated in Tables 3 and 4, which show that our method outperforms the others on both datasets. [sent-187, score-0.023]

85 The primary reason is that we incorporate the content of the question body and the answer parts into the process of question retrieval, which should provide additional relevance information. [sent-188, score-0.33]

86 [Table residue: dataset from Baidu Zhidao] Compared with the translation-based methods, our method can capture the mapping relations in three parts (question, content and answer) simultaneously. [sent-191, score-0.075]

87 It is worth noting that the problem of data sparsity is more crucial for LSTI since the size of a tensor in LSTI is larger than a term-document matrix in LSI. [sent-192, score-0.712]

88 Therefore, more CQA triples may result in better performance for our method. [sent-194, score-0.079]

89 6 Conclusion In this paper, we proposed a novel retrieval approach for community-based QA, called LSTI, which analyzes the CQA triples with a natural tensor representation. [sent-195, score-0.738]

90 LSTI is a unified model and effectively resolves the problem of lexical chasm for question retrieval. [sent-196, score-0.275]

91 Question answering from frequently asked question files: Experiences with the faq finder system. [sent-219, score-0.363]

92 Searching questions by identifying question topic and question focus. [sent-239, score-0.491]

93 Phrase-based translation model for question retrieval in community question answer archives. [sent-264, score-0.565]

94 Finding similar questions in large question and answer archives. [sent-274, score-0.39]

95 Retrieving answers from frequently asked questions pages on the web. [sent-280, score-0.256]

96 Statistical machine translation for query expansion in answer retrieval. [sent-299, score-0.077]

97 A syntactic tree matching approach to finding similar questions in community-based QA services. [sent-318, score-0.135]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tensor', 0.527), ('cqa', 0.511), ('lsti', 0.209), ('question', 0.178), ('lsi', 0.16), ('questions', 0.135), ('matrix', 0.134), ('retrieval', 0.132), ('jeon', 0.114), ('svd', 0.112), ('flattened', 0.104), ('indexing', 0.101), ('singular', 0.092), ('answers', 0.09), ('matrices', 0.086), ('okapi', 0.08), ('triples', 0.079), ('mcontent', 0.078), ('tensors', 0.078), ('uterm', 0.078), ('answer', 0.077), ('chasm', 0.069), ('faq', 0.069), ('xue', 0.068), ('robertson', 0.064), ('triple', 0.063), ('latent', 0.062), ('apache', 0.061), ('yahoo', 0.061), ('dq', 0.06), ('frobenius', 0.057), ('archive', 0.055), ('cai', 0.053), ('baidu', 0.053), ('blooma', 0.052), ('flattening', 0.052), ('jrxoutnermi', 0.052), ('manswer', 0.052), ('otlevksa', 0.052), ('seekers', 0.052), ('uentry', 0.052), ('upart', 0.052), ('inner', 0.052), ('qiu', 0.051), ('sigir', 0.05), ('qi', 0.049), ('title', 0.048), ('un', 0.048), ('qa', 0.047), ('finder', 0.046), ('mismatching', 0.046), ('fudannlp', 0.046), ('deerwester', 0.046), ('orthogonal', 0.045), ('xipeng', 0.043), ('jijkoun', 0.043), ('letters', 0.041), ('xuanjing', 0.04), ('zhidao', 0.04), ('answering', 0.039), ('parts', 0.039), ('burke', 0.038), ('content', 0.036), ('acm', 0.034), ('array', 0.032), ('asked', 0.031), ('decomposition', 0.031), ('decomposing', 0.03), ('croft', 0.03), ('approximations', 0.029), ('duan', 0.028), ('riezler', 0.028), ('diagonal', 0.028), ('retrieving', 0.028), ('unified', 0.028), ('noting', 0.028), ('dimension', 0.027), ('space', 0.026), ('ai', 0.026), ('mismatch', 0.026), ('traditional', 0.026), ('rank', 0.026), ('queries', 0.026), ('keyword', 0.026), ('lsa', 0.025), ('entries', 0.024), ('semantic', 0.024), ('illustrated', 0.023), ('returned', 0.023), ('zr', 0.023), ('niss', 0.023), ('sdo', 0.023), ('xiaofei', 0.023), ('wtoe', 0.023), ('termdocument', 0.023), ('yunbo', 0.023), ('awned', 0.023), ('fte', 0.023), ('fudan', 0.023), ('xjhuang', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

2 0.28784138 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

Author: Guangyou Zhou ; Fang Liu ; Yang Liu ; Shizhu He ; Jun Zhao

Abstract: Community question answering (CQA) has become an increasingly popular research topic. In this paper, we focus on the problem of question retrieval. Question retrieval in CQA can automatically find the most relevant and recent questions that have been solved by other users. However, the word ambiguity and word mismatch problems bring about new challenges for question retrieval in CQA. State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models. While useful, the effectiveness of these models is highly dependent on the availability of quality parallel monolingual corpora (e.g., question-answer pairs) in the absence of which they are troubled by noise issue. In this work, we propose an alternative way to address the word ambiguity and word mismatch problems by taking advantage of potentially rich semantic information drawn from other languages. Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via matrix factorization. Experiments conducted on a real CQA data show that our proposed approach is promising.

3 0.17617041 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak

Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms previous work that makes use of the dependency tree structure by a wide margin.

4 0.16586815 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang

Abstract: In this paper, we address the problem for predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.

5 0.1653229 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

Author: Oleg Rokhlenko ; Idan Szpektor

Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.

6 0.15300012 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval

7 0.1215205 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

8 0.11543189 292 acl-2013-Question Classification Transfer

9 0.10920163 290 acl-2013-Question Analysis for Polish Question Answering

10 0.099005789 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation

11 0.098263502 107 acl-2013-Deceptive Answer Prediction with User Preference Graph

12 0.097575359 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian

13 0.084403917 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering

14 0.077341244 296 acl-2013-Recognizing Identical Events with Graph Kernels

15 0.074879035 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions

16 0.068777457 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

17 0.061524805 217 acl-2013-Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information

18 0.056241766 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions

19 0.050578833 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models

20 0.050397221 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.136), (1, 0.051), (2, 0.024), (3, -0.115), (4, 0.073), (5, 0.024), (6, 0.008), (7, -0.302), (8, 0.084), (9, 0.019), (10, 0.086), (11, -0.038), (12, 0.091), (13, -0.037), (14, 0.051), (15, 0.039), (16, -0.033), (17, -0.078), (18, 0.052), (19, 0.062), (20, 0.059), (21, -0.001), (22, 0.017), (23, -0.122), (24, -0.049), (25, -0.059), (26, -0.013), (27, -0.011), (28, 0.02), (29, 0.075), (30, 0.003), (31, 0.012), (32, 0.03), (33, -0.025), (34, 0.026), (35, 0.011), (36, -0.013), (37, -0.041), (38, -0.028), (39, 0.019), (40, -0.054), (41, 0.032), (42, 0.026), (43, -0.064), (44, -0.081), (45, 0.036), (46, 0.062), (47, 0.083), (48, 0.05), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96519089 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

2 0.88208276 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

Author: Guangyou Zhou ; Fang Liu ; Yang Liu ; Shizhu He ; Jun Zhao

Abstract: Community question answering (CQA) has become an increasingly popular research topic. In this paper, we focus on the problem of question retrieval. Question retrieval in CQA can automatically find the most relevant and recent questions that have been solved by other users. However, the word ambiguity and word mismatch problems bring about new challenges for question retrieval in CQA. State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models. While useful, the effectiveness of these models is highly dependent on the availability of quality parallel monolingual corpora (e.g., question-answer pairs) in the absence of which they are troubled by noise issue. In this work, we propose an alternative way to address the word ambiguity and word mismatch problems by taking advantage of potentially rich semantic information drawn from other languages. Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via matrix factorization. Experiments conducted on a real CQA data show that our proposed approach is promising.

3 0.81156033 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval

Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark

Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.

4 0.78571391 292 acl-2013-Question Classification Transfer

Author: Anne-Laure Ligozat

Abstract: Question answering systems have been developed for many languages, but most resources were created for English, which can be a problem when developing a system in another language such as French. In particular, for question classification, no labeled question corpus is available for French, so this paper studies the possibility to use existing English corpora and transfer a classification by translating the question and their labels. By translating the training corpus, we obtain results close to a monolingual setting.

5 0.76862651 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering

Author: Nan Duan

Abstract: This paper presents two minimum Bayes risk (MBR) based Answer Re-ranking (MBRAR) approaches for the question answering (QA) task. The first approach re-ranks single QA system’s outputs by using a traditional MBR model, by measuring correlations between answer candidates; while the second approach reranks the combined outputs of multiple QA systems with heterogenous answer extraction components by using a mixture model-based MBR model. Evaluations are performed on factoid questions selected from two different domains: Jeopardy! and Web, and significant improvements are achieved on all data sets.

6 0.75139952 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions

7 0.74985373 290 acl-2013-Question Analysis for Polish Question Answering

8 0.72121835 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

9 0.71569455 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

10 0.61085147 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

11 0.57046467 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

12 0.56968379 107 acl-2013-Deceptive Answer Prediction with User Preference Graph

13 0.46399084 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

14 0.45206884 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations

15 0.44746378 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

16 0.3875708 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

17 0.36922169 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections

18 0.36852747 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation

19 0.36424378 250 acl-2013-Models of Translation Competitions

20 0.36185375 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.037), (6, 0.017), (11, 0.052), (24, 0.032), (26, 0.019), (34, 0.01), (35, 0.086), (42, 0.027), (48, 0.04), (70, 0.499), (88, 0.016), (90, 0.019), (95, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9859336 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

Author: Elia Bruni ; Marco Baroni

Abstract: unkown-abstract

2 0.94426638 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

Abstract: Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.

3 0.94404286 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

Author: James O. Ryan ; Serguei Pakhomov ; Susan Marino ; Charles Bernick ; Sarah Banks

Abstract: We present a system for automated phonetic clustering analysis of cognitive tests of phonemic verbal fluency, on which one must name words starting with a specific letter (e.g., ‘F’) for one minute. Test responses are typically subjected to manual phonetic clustering analysis that is labor-intensive and subject to inter-rater variability. Our system provides an automated alternative. In a pilot study, we applied this system to tests of 55 novice and experienced professional fighters (boxers and mixed martial artists) and found that experienced fighters produced significantly longer chains of phonetically similar words, while no differences were found in the total number of words produced. These findings are preliminary, but strongly suggest that our system can be used to detect subtle signs of brain damage due to repetitive head trauma in individuals that are otherwise unimpaired.

4 0.93155468 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

Author: Shay B. Cohen ; Mark Johnson

Abstract: Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.

same-paper 5 0.92397887 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

6 0.90042603 220 acl-2013-Learning Latent Personas of Film Characters

7 0.8987186 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

8 0.82936031 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

9 0.68453032 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

10 0.65554142 249 acl-2013-Models of Semantic Representation with Visual Attributes

11 0.64314818 380 acl-2013-VSEM: An open library for visual semantics representation

12 0.64038777 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

13 0.6013335 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

14 0.59471035 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

15 0.58307713 80 acl-2013-Chinese Parsing Exploiting Characters

16 0.57206136 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

17 0.56963962 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

18 0.56202388 339 acl-2013-Temporal Signals Help Label Temporal Relations

19 0.55380732 165 acl-2013-General binarization for parsing and translation

20 0.54993069 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions