
290 acl-2013-Question Analysis for Polish Question Answering


Source: pdf

Author: Piotr Przybyla

Abstract: This study is devoted to the problem of question analysis for a Polish question answering system. The goal of the question analysis is to determine its general structure, type of an expected answer and create a search query for finding relevant documents in a textual knowledge base. The paper contains an overview of available solutions of these problems, description of their implementation and presents an evaluation based on a set of 1137 questions from a Polish quiz TV show. The results help to understand how an environment of a Slavonic language affects the performance of methods created for English.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This study is devoted to the problem of question analysis for a Polish question answering system. [sent-4, score-0.714]

2 The goal of the question analysis is to determine its general structure, type of an expected answer and create a search query for finding relevant documents in a textual knowledge base. [sent-5, score-0.707]

3 The paper contains an overview of available solutions of these problems, description of their implementation and presents an evaluation based on a set of 1137 questions from a Polish quiz TV show. [sent-6, score-0.179]

4 It accepts the question as an input and returns a data structure containing relevant information, herein called question model. [sent-13, score-0.65]

5 It consists of two elements: a question type and a search query. [sent-14, score-0.414]

6 The question type classifies a question to one of the categories based on its structure. [sent-15, score-0.674]

7 A general question type takes one of the following values: verification (Czy Lee Oswald zabił Johna Kennedy’ego? [sent-16, score-0.376]

8 ), option choosing (Który z nich zabił Johna Kennedy’ego: Lance Oswald czy Lee Oswald? [sent-19, score-0.06]

9 ), unnamed entity (Czego użył Lee Oswald, żeby zabić Johna Kennedy’ego? [sent-28, score-0.204]

10 ), other name for a given named entity (Jakiego pseudonimu używał John Kennedy w trakcie służby wojskowej? [sent-31, score-0.224]

11 ) and multiple entities (Którzy prezydenci Stanów Zjednoczonych zostali zabici w trakcie kadencji? [sent-34, score-0.06]

12 There are many others possible, such as definition or explanation questions, but they require specific techniques for answer finding and remain beyond the scope of this work. [sent-40, score-0.057]

13 In case of named entity questions, it is also useful to find its named entity type, corresponding to a type of an entity which could be provided as an answer. [sent-48, score-0.489]

14 A list of possible options, suited to questions about general knowledge, is given in Table 1. [sent-49, score-0.094]

15 The need for a search query is motivated by performance reasons. [sent-53, score-0.238]

16 A linguistic analysis applied to a source text to find the expected answer is usually resource-consuming, so it cannot be performed on the whole corpus (in case of this experiment 839,269 articles). [sent-54, score-0.093]

17 To avoid it, we transform the question into the search query, which is subsequently used in a search engine, incorporating a full-text index of the corpus. [sent-55, score-0.374]

18 Although the query generation plays an auxiliary role, failure at this stage may lead both to too long processing times (in case of excessive number of returned documents) and lack of a final answer (in [sent-57, score-0.411]

[Page footer: Proceedings of the ACL Student Research Workshop, pages 96–102, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics]

19 [Table 1 header: Question type, Occurrences] named entity types and numbers of their occurrences in the test set. [sent-59, score-0.164]

20 2 Related work The problem of determination of the general question type is not frequent in existing QA solutions, as most of the public evaluation tasks, such as the TREC question answering track (Dang et al. [sent-61, score-0.848]

21 However, when it comes to named entity type determination, a proper classification is indispensable for finding an answer of a desired type. [sent-63, score-0.343]

22 Some of the interrogative pronouns, such as gdzie (Eng. [sent-64, score-0.059]

23 In questions created with them, such as Który znany malarz twierdził, że obciął sobie ucho? [sent-74, score-0.456]

24 Which famous painter claimed to have cut his ear? [sent-76, score-0.053]

25 famous painter), following the pronoun, should be analysed, as its type corresponds to a named entity type (a PERSON in this case). [sent-78, score-0.32]

26 The presented problem of question classification for Polish question answering is studied in a paper by Przybyła (2013). [sent-86, score-0.714]

27 The type determination part presented here is based on that solution, but includes several improvements. [sent-87, score-0.134]

28 To find relevant documents, existing QA solutions usually employ one of the widely available general-purpose search engines, such as Lucene. [sent-88, score-0.079]

29 Words of the question are interpreted as keywords and form a boolean query, where all the constituents are considered required. [sent-89, score-0.404]

30 When working with smaller corpora, one needs to take into account different formulations of the desired information. [sent-93, score-0.083]

31 Therefore, an initial query is subject to some modifications. [sent-94, score-0.2]

32 First, some of the keywords may be dropped from the query; Moldovan et al. [sent-95, score-0.058]

33 (2000) present 8 different heuristics of selecting them, based on quotation marks, parts of speech, detected named entities and other features, whereas Katz et al. [sent-96, score-0.081]

34 Čeh and Ojsteršek (2009) start term removal from the end of the sentence. [sent-98, score-0.075]

35 Apart from simplifying the query, its expansion is also possible. [sent-99, score-0.043]

36 3 Question analysis For the purpose of building an open-domain corpus-based Polish question answering system, a question analysis module, based on some of the solutions presented above, has been implemented. [sent-103, score-0.755]

37 The module accepts a single question in Polish and outputs a data structure, called a question model. [sent-104, score-0.65]

38 It includes a general question type, a set of named entity types (if the general type equals NAMED_ENTITY) and a Lucene search query. [sent-105, score-0.578]

39 A set of named entity types, instead of a single one, is possible as some of the question constructions are ambiguous, e. [sent-106, score-0.462]

40 ) question may be answered by a PERSON, COUNTRY, BAND, etc. [sent-111, score-0.298]
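The question model described above (a general question type, a set of named entity types and a search query) can be pictured as a small data structure. The following is only a minimal sketch: the class and field names are hypothetical and do not come from the paper, and the query is kept as a plain string.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class GeneralType(Enum):
    """General question types listed in the summary above."""
    VERIFICATION = "verification"
    OPTION = "option choosing"
    NAMED_ENTITY = "named entity"
    UNNAMED_ENTITY = "unnamed entity"
    OTHER_NAME = "other name for a given named entity"
    MULTIPLE = "multiple entities"


@dataclass(frozen=True)
class QuestionModel:
    """Hypothetical container for the output of question analysis."""
    general_type: GeneralType
    query: str
    # A set rather than a single value, because some question
    # constructions are ambiguous (e.g. PERSON, COUNTRY, BAND).
    entity_types: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Named entity types only make sense for NAMED_ENTITY questions.
        if self.entity_types and self.general_type is not GeneralType.NAMED_ENTITY:
            raise ValueError("entity types apply only to NAMED_ENTITY questions")


model = QuestionModel(
    general_type=GeneralType.NAMED_ENTITY,
    query="znany malarz ucho",
    entity_types=frozenset({"PERSON"}),
)
```

Keeping the entity types as a frozen set makes the model hashable and mirrors the point that one question may admit several answer types.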

41 1 Question type classification For the question type classification all the techniques presented above are implemented. [sent-113, score-0.454]

42 The pattern matching stage is based on a list of 176 regular expressions and sets of corresponding question types. [sent-114, score-0.38]

43 If any of the expressions matches the question, its corresponding set of types may be immediately returned at this stage. [sent-115, score-0.054]
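The pattern matching stage can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the three patterns and the type labels PLACE and DATE are invented for illustration (the real list has 176 expressions that are not reproduced here), and only the first matching pattern is used.

```python
import re
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical (pattern, entity types) pairs; not the paper's actual list.
PATTERNS: List[Tuple[re.Pattern, FrozenSet[str]]] = [
    (re.compile(r"^gdzie\b", re.IGNORECASE), frozenset({"PLACE"})),
    (re.compile(r"^kiedy\b", re.IGNORECASE), frozenset({"DATE"})),
    # An ambiguous pronoun maps to several possible entity types.
    (re.compile(r"^kto\b", re.IGNORECASE), frozenset({"PERSON", "COUNTRY", "BAND"})),
]


def match_patterns(question: str) -> Optional[FrozenSet[str]]:
    """Return the type set of the first matching pattern, or None."""
    text = question.strip()
    for pattern, entity_types in PATTERNS:
        if pattern.search(text):
            return entity_types
    # No pattern matched: a later stage (e.g. focus analysis) has to decide.
    return None
```

Returning None rather than an empty set keeps "no pattern matched" distinct from "matched, but no type assigned".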

44 relatively free word order and rich nominal inflection (Przepiórkowski, 2007). [sent-119, score-0.093]

45 However, in case of ambiguous interrogative pronouns, such as jaki (Eng. [sent-169, score-0.119]

46 which), a further analysis gets necessary to determine a question focus type. [sent-171, score-0.335]

47 The question is annotated using the morphological analyser Morfeusz (Woliński, 2006), the tagger PANTERA (Acedański, 2010) and the shallow parser Spejd (Przepiórkowski, 2008). [sent-172, score-0.343]

48 The first nominal group after the pronoun is assumed to be a question focus. [sent-173, score-0.298]

49 Then, we check whether its direct and indirect parents (i. [sent-178, score-0.07]

50 synsets connected via hypernymy relations) include one of the predefined synsets, corresponding to the available named entity types. [sent-180, score-0.208]
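The hypernymy check can be sketched as a breadth-first walk over parent links. The tiny graph below is a made-up stand-in for a real WordNet: the synset names, the links between them and the mapping to PERSON are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List, Set

# Toy hypernymy graph: synset -> direct parents (hypothetical data).
HYPERNYMS: Dict[str, List[str]] = {
    "malarz": ["artysta"],
    "artysta": ["człowiek"],
    "człowiek": ["istota"],
    "statek": ["pojazd"],
}

# Predefined synsets that stand for named entity types (hypothetical).
ENTITY_SYNSETS: Dict[str, str] = {"człowiek": "PERSON"}


def entity_types_for(focus_synset: str) -> Set[str]:
    """Collect entity types found among direct and indirect parents."""
    found: Set[str] = set()
    queue = deque(HYPERNYMS.get(focus_synset, []))
    seen = set(queue)
    while queue:
        synset = queue.popleft()
        if synset in ENTITY_SYNSETS:
            found.add(ENTITY_SYNSETS[synset])
        for parent in HYPERNYMS.get(synset, []):
            if parent not in seen:  # guard against cycles and repeats
                seen.add(parent)
                queue.append(parent)
    return found
```

With this toy data, `entity_types_for("malarz")` reaches "człowiek" through "artysta", while `entity_types_for("statek")` finds no predefined synset and returns an empty set.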

51 In this way, each sentence is described with a set of boolean features (420 for the evaluation set described in next section), denoting the appearance of a particular root form. [sent-194, score-0.048]

52 Additionally, morphological interpretations of the first five words in the question are also extracted as features. [sent-195, score-0.343]
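The boolean root-form features used by the machine learning classifiers might be built roughly as follows. This sketch assumes the questions are already lemmatised (the function names and the sample lemmas are hypothetical) and it leaves out the morphological interpretations of the first five words.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(lemmatised_questions: Iterable[Sequence[str]]) -> List[str]:
    """Sorted list of all root forms seen in the question set."""
    return sorted({lemma for question in lemmatised_questions for lemma in question})


def to_boolean_features(lemmas: Sequence[str], vocabulary: Sequence[str]) -> List[bool]:
    """One boolean per vocabulary entry: does the root form appear?"""
    present = set(lemmas)
    return [lemma in present for lemma in vocabulary]


# Hypothetical, already lemmatised questions.
questions = [
    ["który", "malarz", "obciąć", "ucho"],
    ["kto", "zabić", "prezydent"],
]
vocabulary = build_vocabulary(questions)
features = to_boolean_features(questions[0], vocabulary)
```

The feature vector has one position per distinct root form, so its length grows with the question set (420 features for the evaluation set mentioned above).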

53 2 Query formation The basic procedure for creating a query treats each segment from the question (apart from the words included in a matched regular expression) as a keyword of an OR boolean query. [sent-198, score-0.706]

54 No term weighting or stop-words removal is implemented as Lucene uses TF/IDF statistic, which penalizes omnipresent tokens. [sent-199, score-0.075]

55 First, we start with a restrictive AND query and fall back into OR only in case it provides no results. [sent-204, score-0.244]
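The AND-then-OR strategy can be sketched independently of any particular search engine. In the example below the `search` argument is a hypothetical callable taking a query string and returning a ranked list of document titles. `toy_search` is an in-memory stand-in, not Lucene, and escaping of query-syntax special characters is not handled.

```python
from typing import Callable, Dict, List, Sequence, Set


def boolean_query(keywords: Sequence[str], operator: str) -> str:
    """Join keywords into a flat boolean query string."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(keywords)


def search_with_fallback(
    keywords: Sequence[str], search: Callable[[str], List[str]]
) -> List[str]:
    """Try a restrictive AND query first; fall back to OR if it is empty."""
    results = search(boolean_query(keywords, "AND"))
    if results:
        return results
    return search(boolean_query(keywords, "OR"))


# Toy index: document title -> set of words it contains (made-up data).
DOCS: Dict[str, Set[str]] = {
    "doc_a": {"malarz", "ucho", "obciąć"},
    "doc_b": {"zabić", "prezydent"},
}


def toy_search(query: str) -> List[str]:
    """Evaluate a flat AND/OR query against DOCS (illustration only)."""
    separator = " AND " if " AND " in query else " OR "
    terms = query.split(separator)
    combine = all if separator == " AND " else any
    return [title for title, words in DOCS.items()
            if combine(term in words for term in terms)]
```

Here `search_with_fallback(["malarz", "ucho"], toy_search)` is satisfied by the AND query alone, whereas `["malarz", "prezydent"]` matches nothing under AND and therefore falls back to OR.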

56 A question focus removal (applied by Moldovan et al. [sent-205, score-0.373]

57 For example, let us consider again the question Który znany malarz twierdził, że obciął sobie ucho? [sent-207, score-0.66]

58 The words of the question focus znany malarz are not absolutely necessary in a source document, but their appearance may be a helpful clue. [sent-209, score-0.577]

59 The query could also be expanded by replacing each keyword by a nested OR query, containing synonyms of the keyword, extracted from plWordNet. [sent-210, score-0.318]

60 Both the focus removal and synonym expansion have been implemented as options of the presented query formation mechanism. [sent-211, score-0.318]
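Focus removal and synonym expansion can be shown as two switches on a query builder. This is a sketch only: the function names, the parameters and the synonym dictionary are hypothetical, and the synonym entry is a toy example rather than data taken from plWordNet.

```python
from typing import Collection, Dict, List, Optional, Sequence


def expand_keyword(keyword: str, synonyms: Dict[str, List[str]]) -> str:
    """Replace a keyword by a nested OR group of itself and its synonyms."""
    alternatives = [keyword] + [s for s in synonyms.get(keyword, []) if s != keyword]
    if len(alternatives) == 1:
        return keyword
    return "(" + " OR ".join(alternatives) + ")"


def build_query(
    keywords: Sequence[str],
    focus_words: Collection[str] = (),
    synonyms: Optional[Dict[str, List[str]]] = None,
    remove_focus: bool = False,
    expand: bool = False,
    operator: str = "OR",
) -> str:
    """Build a boolean query with optional focus removal and expansion."""
    terms = [k for k in keywords if not (remove_focus and k in focus_words)]
    if expand and synonyms:
        terms = [expand_keyword(k, synonyms) for k in terms]
    return f" {operator} ".join(terms)


keywords = ["znany", "malarz", "twierdzić", "ucho"]
focus = {"znany", "malarz"}
toy_synonyms = {"ucho": ["małżowina"]}  # invented entry for illustration

query = build_query(keywords, focus, toy_synonyms,
                    remove_focus=True, expand=True, operator="AND")
```

With both options on, the focus words "znany malarz" are dropped and "ucho" becomes a nested OR group. With both off, the builder reduces to the basic query over all keywords.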

61 Finally, one needs to remember about an important feature of Polish, typical for a Slavonic language, namely rich nominal inflection (Przepiórkowski, 2007). [sent-212, score-0.171]

62 We could either ignore this fact and look for exact matches between words in the question and a document or allow some modifications. [sent-214, score-0.298]

63 4 Evaluation For the purpose of evaluation, a set of 1137 questions from a Polish quiz TV show "Jeden z dziesięciu", published in (Karzewski, 1997), has been manually reviewed and updated. [sent-221, score-0.138]

64 A general question type and a named entity type have been assigned to each of the questions. [sent-222, score-0.618]

65 Table 1 presents the numbers of occurrences of question types in the test set. [sent-223, score-0.298]

66 To evaluate query generation an article name has been assigned to those questions (1057), for which a single article in Wikipedia containing an answer exists. [sent-225, score-0.501]

67 Outputs of type classifiers have been gathered [sent-226, score-0.124]

68 [Table 2 columns: Classifier, Classified, Precision, Overall] Table 2: Accuracy of the four question type classifiers: numbers of questions classified, percentages of correct answers and products of these two. [sent-229, score-0.47]

69 The machine learning classifiers have been evaluated using 100-fold cross-validation. [sent-231, score-0.046]

70 Four of the presented improvements of query generation tested here include: basic OR query, AND query with fallback to OR, focus segments removal and expansion with synonyms. [sent-232, score-0.518]

71 The recorded results include recall (percentage of result lists including the desired article among the first 100) and average position of the article in the list. [sent-234, score-0.194]
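The two recorded measures can be computed as in the sketch below. Two details are assumptions, since the summary does not state them: positions are counted from 1, and the average position is taken only over questions whose expected article appears within the cutoff. The function name and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def evaluate(
    result_lists: Dict[str, List[str]],
    expected: Dict[str, str],
    cutoff: int = 100,
) -> Tuple[float, Optional[float]]:
    """Return (recall within cutoff, average 1-based position of hits)."""
    positions: List[int] = []
    for question_id, article in expected.items():
        ranked = result_lists.get(question_id, [])[:cutoff]
        if article in ranked:
            positions.append(ranked.index(article) + 1)
    recall = len(positions) / len(expected) if expected else 0.0
    # Assumption: questions without a hit do not contribute to the average.
    average = sum(positions) / len(positions) if positions else None
    return recall, average


# Made-up result lists for three questions.
results = {"q1": ["a", "b"], "q2": ["c"], "q3": []}
gold = {"q1": "b", "q2": "c", "q3": "z"}
recall, average_position = evaluate(results, gold)
```

In this toy run two of the three expected articles are found, at positions 2 and 1.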

72 5 Results The result of evaluation of classifiers is presented in Table 2. [sent-235, score-0.046]

73 The pattern matching stage behaves as expected: it accepts only a small part of the questions, but yields a high precision. [sent-236, score-0.136]

74 The WordNet-aided focus analysis is able to handle almost all questions with an acceptable precision. [sent-237, score-0.094]

75 Unfortunately, the accuracy of ML classifiers is not satisfactory, which could be easily explained using Table 1: there are many categories represented by very few cases. [sent-238, score-0.046]

76 An expansion of training set or dropping the least frequent categories (depending on a particular application) is necessary for better classification. [sent-239, score-0.08]

77 Results of considered query generation techniques are shown in Table 3. [sent-240, score-0.2]

78 Starting with an AND query and using OR only in case of a failure leads to an improvement of the expected article ranking position but the recall ratio drops significantly, which means that quite often the results of a restrictive query do not include the relevant article. [sent-242, score-0.614]

79 The removal of the question focus from the list of keywords also has a negative impact on performance. [sent-243, score-0.431]

80 results are those of expanding a query with synonyms - the number of matching articles grows abruptly and Lucene ranking mechanism does not lead to satisfying selection of the best 100. [sent-252, score-0.241]

81 One needs to remember that only one article has been selected for each test question, whereas probably there are many relevant Wikipedia entries in most cases. [sent-253, score-0.153]

82 Figure 2: Impact of the fuzziness of queries on the recall using three types of fuzzy queries (relative, absolute, fixed prefix). [sent-255, score-0.35]

83 To show the relative and absolute fuzziness on one plot, a word-length of 10 letters is assumed. [sent-256, score-0.137]

84 As expected, taking into account inflection is necessary (cf. [sent-259, score-0.13]

85 results of exact matching), but fuzzy queries provide more accurate results, although they use no linguistic knowledge. [sent-260, score-0.11]

86 As the fuzzy queries yield the best results, an additional experiment becomes necessary to find an optimal fuzziness, i. [sent-261, score-0.147]

87 This parameter needs tuning for particular language of implementation (in this case Polish) as it reflects a mutability of its words, caused by inflection and derivation. [sent-264, score-0.132]

88 Three strategies for specifying the distance have been used: relative (with distance being a fraction of a keyword’s length), absolute (the same distance for all keywords) and with prefix (same as absolute, but with changes limited to the end of a keyword; with fixed prefix). [sent-265, score-0.08]

89 In Figure 2 the results are shown - it seems that allowing 3 changes at the end of the keyword is enough. [sent-266, score-0.118]

90 This option reflects the Polish inflection schemes and is also very fast thanks to the fixedness of the prefix. [sent-267, score-0.093]
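The three ways of specifying the allowed distance can be compared with a plain Levenshtein implementation. This is a sketch, not the search engine's own fuzzy matching: the function names are hypothetical, and the "prefix" mode encodes one possible reading, namely that all but the last `amount` characters of the keyword must match exactly.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(keyword: str, word: str, mode: str, amount: float) -> bool:
    """Check whether `word` matches `keyword` under a fuzziness strategy."""
    if mode == "relative":
        # Allowed distance is a fraction of the keyword's length.
        return edit_distance(keyword, word) <= int(amount * len(keyword))
    if mode == "absolute":
        # The same allowed distance for every keyword.
        return edit_distance(keyword, word) <= int(amount)
    if mode == "prefix":
        # Assumed reading: a fixed prefix, with changes only at the end.
        limit = int(amount)
        prefix_length = max(len(keyword) - limit, 0)
        if not word.startswith(keyword[:prefix_length]):
            return False
        return edit_distance(keyword[prefix_length:], word[prefix_length:]) <= limit
    raise ValueError("mode must be 'relative', 'absolute' or 'prefix'")
```

For the keyword "malarz", the inflected form "malarza" is accepted by all three modes with suitable settings. The unrelated word "lekarz" is accepted by the absolute mode at distance 3, but the prefix mode rejects it because the prefix "mal" does not match.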

91 6 Conclusion In this paper a set of techniques used to build a question model has been presented. [sent-268, score-0.298]

92 They have been implemented as a question analysis module for the Polish question answering task. [sent-269, score-0.714]

93 Several experiments using Polish questions and knowledge base have been performed to evaluate their performance in the environment of the Slavonic language. [sent-270, score-0.133]

94 They have led to the following conclusions: firstly, the best technique to find a correct question type is to combine pattern matching with the WordNet-aided focus analysis. [sent-271, score-0.417]

95 Secondly, it does not suffice to process the first 100 articles returned by the search engine using the default ranking procedure, as they may not contain the desired information. [sent-272, score-0.136]

96 This study is part of an effort to build an open-domain corpus-based question answering system for Polish. [sent-274, score-0.465]

97 The obvious next step is to create a sentence similarity measure to select the best answer in the source document. [sent-275, score-0.057]

98 The role of lexico-semantic feedback in open-domain textual question-answering. [sent-305, score-0.049]

99 The structure and performance of an open-domain question answering system. [sent-334, score-0.416]

100 Developing a question answering system for the Slovene language. [sent-358, score-0.46]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('polish', 0.381), ('question', 0.298), ('oswald', 0.213), ('kennedy', 0.203), ('query', 0.2), ('przepi', 0.133), ('slavonic', 0.133), ('lucene', 0.123), ('malarz', 0.121), ('przyby', 0.121), ('zabi', 0.121), ('znany', 0.121), ('keyword', 0.118), ('answering', 0.118), ('kt', 0.111), ('johna', 0.107), ('fuzziness', 0.099), ('ego', 0.099), ('questions', 0.094), ('inflection', 0.093), ('ojster', 0.091), ('ry', 0.086), ('entity', 0.083), ('trec', 0.082), ('moldovan', 0.081), ('named', 0.081), ('rkowski', 0.08), ('type', 0.078), ('removal', 0.075), ('article', 0.075), ('fuzzy', 0.07), ('parents', 0.07), ('qa', 0.069), ('pronouns', 0.066), ('brill', 0.063), ('aceda', 0.06), ('czy', 0.06), ('czyj', 0.06), ('dziesi', 0.06), ('eciu', 0.06), ('jaki', 0.06), ('jeden', 0.06), ('kto', 0.06), ('obci', 0.06), ('plwordnet', 0.06), ('rju', 0.06), ('sobie', 0.06), ('statek', 0.06), ('trakcie', 0.06), ('twierdzi', 0.06), ('ucho', 0.06), ('failure', 0.059), ('interrogative', 0.059), ('keywords', 0.058), ('harabagiu', 0.057), ('answer', 0.057), ('lee', 0.056), ('eh', 0.056), ('determination', 0.056), ('accepts', 0.054), ('returned', 0.054), ('piotr', 0.053), ('maziarz', 0.053), ('morfeusz', 0.053), ('nski', 0.053), ('woli', 0.053), ('painter', 0.053), ('katz', 0.049), ('vasile', 0.049), ('opendomain', 0.049), ('boolean', 0.048), ('hovy', 0.047), ('classifiers', 0.046), ('morphological', 0.045), ('desired', 0.044), ('restrictive', 0.044), ('slovene', 0.044), ('quiz', 0.044), ('synsets', 0.044), ('expansion', 0.043), ('roxana', 0.042), ('marek', 0.042), ('clef', 0.042), ('procedure', 0.042), ('prefix', 0.042), ('solutions', 0.041), ('stage', 0.041), ('matching', 0.041), ('apart', 0.041), ('queries', 0.04), ('killed', 0.039), ('sanda', 0.039), ('remember', 0.039), ('needs', 0.039), ('environment', 0.039), ('absolute', 0.038), ('search', 0.038), ('necessary', 0.037), ('lance', 0.037), ('expected', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 290 acl-2013-Question Analysis for Polish Question Answering

2 0.20962633 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak

Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms previous work that makes use of the dependency tree structure by a wide margin.

3 0.19141066 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval

Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark

Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.

4 0.18306974 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni

Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.

5 0.17812884 292 acl-2013-Question Classification Transfer

Author: Anne-Laure Ligozat

Abstract: Question answering systems have been developed for many languages, but most resources were created for English, which can be a problem when developing a system in another language such as French. In particular, for question classification, no labeled question corpus is available for French, so this paper studies the possibility to use existing English corpora and transfer a classification by translating the question and their labels. By translating the training corpus, we obtain results close to a monolingual setting.

6 0.16293277 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

7 0.14983439 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

8 0.1353156 227 acl-2013-Learning to lemmatise Polish noun phrases

9 0.10920163 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

10 0.10344613 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering

11 0.093748048 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

12 0.093090758 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

13 0.0925401 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

14 0.089788035 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks

15 0.087602288 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

16 0.084680982 303 acl-2013-Robust multilingual statistical morphological generation models

17 0.082808085 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions

18 0.08089406 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

19 0.080475934 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

20 0.078579292 126 acl-2013-Diverse Keyword Extraction from Conversations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.188), (1, 0.078), (2, 0.009), (3, -0.152), (4, 0.089), (5, 0.042), (6, -0.013), (7, -0.279), (8, 0.137), (9, -0.004), (10, 0.061), (11, -0.014), (12, -0.012), (13, -0.005), (14, -0.011), (15, 0.036), (16, 0.019), (17, -0.048), (18, 0.008), (19, 0.088), (20, -0.063), (21, 0.045), (22, 0.022), (23, 0.004), (24, 0.031), (25, -0.065), (26, -0.02), (27, -0.056), (28, 0.016), (29, 0.062), (30, 0.044), (31, -0.034), (32, -0.046), (33, -0.023), (34, 0.008), (35, -0.119), (36, 0.061), (37, 0.002), (38, 0.047), (39, -0.031), (40, -0.055), (41, -0.01), (42, 0.044), (43, -0.044), (44, -0.023), (45, 0.031), (46, 0.092), (47, -0.009), (48, -0.015), (49, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97086805 290 acl-2013-Question Analysis for Polish Question Answering

2 0.85976428 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni

Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.

3 0.82901925 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval

Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark

Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.

4 0.79482716 292 acl-2013-Question Classification Transfer

Author: Anne-Laure Ligozat

Abstract: Question answering systems have been developed for many languages, but most resources were created for English, which can be a problem when developing a system in another language such as French. In particular, for question classification, no labeled question corpus is available for French, so this paper studies the possibility to use existing English corpora and transfer a classification by translating the question and their labels. By translating the training corpus, we obtain results close to a monolingual setting.

5 0.78210163 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering (CQA). In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce the lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

6 0.74029118 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

7 0.71609211 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions

8 0.71580511 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

9 0.69667876 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering

10 0.69584131 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

11 0.62055176 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

12 0.59939933 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

13 0.55132043 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections

14 0.52275997 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

15 0.49329314 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

16 0.46416825 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

17 0.46007288 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

18 0.4490093 227 acl-2013-Learning to lemmatise Polish noun phrases

19 0.42319378 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision

20 0.4230935 107 acl-2013-Deceptive Answer Prediction with User Preference Graph


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.05), (6, 0.031), (10, 0.281), (11, 0.046), (14, 0.01), (15, 0.011), (24, 0.058), (26, 0.034), (35, 0.096), (42, 0.049), (48, 0.046), (70, 0.078), (88, 0.045), (90, 0.022), (95, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78015292 290 acl-2013-Question Analysis for Polish Question Answering

2 0.53773016 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

3 0.53432089 172 acl-2013-Graph-based Local Coherence Modeling

Author: Camille Guinaudeau ; Michael Strube

Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.

4 0.53423816 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni

Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.

5 0.53377807 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

Author: Shafiq Joty ; Giuseppe Carenini ; Raymond Ng ; Yashar Mehdad

Abstract: We propose a novel approach for developing a two-stage document-level discourse parser. Our parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combine these two stages of discourse parsing effectively. A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state-of-the-art, often by a wide margin.

6 0.53306651 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

7 0.53249127 224 acl-2013-Learning to Extract International Relations from Political Context

8 0.53080338 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

9 0.52989405 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

10 0.52788997 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

11 0.52699292 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

12 0.52685565 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

13 0.52650714 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

14 0.5255338 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

15 0.52497238 275 acl-2013-Parsing with Compositional Vector Grammars

16 0.52495879 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

17 0.52488708 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

18 0.52449769 175 acl-2013-Grounded Language Learning from Video Described with Sentences

19 0.52425569 318 acl-2013-Sentiment Relevance

20 0.52318406 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension