acl acl2013 acl2013-272 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. [sent-3, score-0.932]
2 Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. [sent-5, score-0.329]
3 1 Introduction Open-domain question answering (QA) is a longstanding, unsolved problem. [sent-7, score-0.302]
4 The central challenge is to automate every step of QA system construction, including gathering large databases and answering questions against these databases. [sent-8, score-0.454]
5 The problem of answering questions with the noisy knowledge bases that IE systems produce has received less attention. [sent-12, score-0.423]
6 In this paper, we present an approach for learning to map questions to formal queries over a large, open-domain database of extracted facts (Fader et al. [sent-13, score-0.676]
7 We focus on answering open-domain questions that can be answered with single-relation queries. [sent-22, score-0.428]
8 The algorithm answers such questions by mapping them to executable queries over a tuple store containing relations such as authored(milne, winnie-the-pooh) and treat(bloody-mary, hangover-symptoms). [sent-27, score-0.509]
9 Crucially, the approach does not require any explicit labeling of the questions in our paraphrase corpus. [sent-40, score-0.497]
10 Instead, we use 16 seed question templates and string-matching to find high-quality queries for a small subset of the questions. [sent-41, score-0.472]
11 We then learn a linear ranking model to filter the learned lexical equivalences, keeping only those that are likely to answer questions well in practice. [sent-43, score-0.49]
12 In sum, we make the following contributions: • We introduce PARALEX, an end-to-end opendWoem iantinro question answering system. [sent-48, score-0.302]
13 • We describe scalable learning algorithms that induce general question templates and lexical variants of entities and relations. [sent-49, score-0.322]
14 • We evaluate PARALEX on the end-task of answering questions from WikiAnswers using a database of web extractions, and show that it outperforms baseline systems. [sent-51, score-0.594]
15 While much progress has been made in converting text into structured knowledge, there has been little work on answering natural language questions over these databases. [sent-66, score-0.384]
16 More recently, researchers have created systems that use machine learning techniques to automatically construct question answering systems from data (Zelle and Mooney, 1996; Popescu et al. [sent-72, score-0.302]
17 These systems have the ability to handle questions with complex semantics on small domain-specific databases like GeoQuery (Tang and Mooney, 2001) or subsets of Freebase (Cai and Yates, 2013), but have yet to scale to the task of general, open-domain question answering. [sent-76, score-0.566]
18 In contrast, our system answers questions with more limited semantics, but does so at a very large scale in an open-domain manner. [sent-77, score-0.347]
19 However, we use a paraphrase corpus for extracting lexical items relating natural language patterns to database concepts, as opposed to relationships between pairs of natural language utterances. [sent-84, score-0.565]
20 Problem: Our goal is to learn a function that will map a natural language question x to a query z over a database D. [sent-86, score-0.574]
21 The database D is a collection of assertions in the form r(e1, e2), where r is a binary relation from a vocabulary R, and e1 and e2 are entities from a vocabulary E. [sent-87, score-0.309]
22 The database is equipped with a simple interface that accepts queries in the form r(? [sent-90, score-0.341]
23 Thus, our task is to find the query z that best captures the semantics of the question x. [sent-94, score-0.364]
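To make this setup concrete, here is a minimal Python sketch of such a tuple store and its single-relation query interface; the class, the indexing scheme, and the example facts are illustrative assumptions, not the data structures used in the paper.

    # Minimal sketch of an extraction database D of binary assertions r(e1, e2),
    # with a query interface that leaves one argument of the relation open.
    from collections import defaultdict

    class TupleStore:
        def __init__(self, assertions):
            # Index facts by (relation, known argument) for fast lookup.
            self.by_first = defaultdict(set)   # (r, e1) -> set of e2
            self.by_second = defaultdict(set)  # (r, e2) -> set of e1
            for r, e1, e2 in assertions:
                self.by_first[(r, e1)].add(e2)
                self.by_second[(r, e2)].add(e1)

        def query(self, r, e, open_slot="first"):
            """Answer queries with one unknown argument of relation r."""
            if open_slot == "first":
                return self.by_second[(r, e)]   # e is the known second argument
            return self.by_first[(r, e)]        # e is the known first argument

    db = TupleStore([
        ("authored", "milne", "winnie-the-pooh"),
        ("population", "new-york", "8-million"),   # hypothetical value
    ])
    print(db.query("authored", "winnie-the-pooh"))  # -> {'milne'}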
24 Model: The question answering model includes a lexicon and a linear ranking function. [sent-95, score-0.484]
25 The lexicon L associates natural language patterns to database concepts, thereby defining the space of queries that can be derived from the input question (see Table 2). [sent-96, score-0.767]
26 Lexical entries can pair strings with database entities (nyc and new-york), strings with database relations (big and population), or question patterns with templated database queries (how r is e ?). [sent-97, score-1.249]
27 We learn L by bootstrapping from an initial seed lexicon L0 over a corpus of question paraphrases C = {(x, x0) : x0 is a paraphrase of x}, like the examples in Table 1. [sent-102, score-0.808]
28 We estimate θ by using the initial lexicon to automatically label queries in the paraphrase corpus, as described in Section 5. [sent-103, score-0.339]
29 Evaluation: In Section 8, we evaluate our system against various baselines on the end-task of question answering against a large database of facts extracted from the web. [sent-106, score-0.351]
30 We use held-out known-answerable questions from WikiAnswers as a test set. [sent-107, score-0.289]
31 1 Lexicon and Derivations To define the space of possible queries, PARALEX uses a lexicon L that encodes mappings from natural language to database concepts (entities, relations, and queries). [sent-110, score-0.394]
32 Entity patterns match a contiguous string of words and are associated with some database entity e ∈ E. [sent-117, score-0.385]
33 Question patterns match an entire question string, with gaps that recursively match an entity or relation pattern. [sent-121, score-0.417]
34 Question patterns are associated with a templated database query, where the values of the variables are determined by the matched entity and relation patterns. [sent-122, score-0.462]
35 A question pattern may be 1-Argument, with a variable for an entity pattern, or 2-Argument, with variables for an entity pattern and a relation pattern. [sent-123, score-0.421]
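The three entry types described above can be pictured as simple typed records. The sketch below is one possible encoding, assuming plain string patterns with r/e placeholders; the field names and the exact template notation are my own, not the paper's.

    # Illustrative lexicon entry types: each pairs a natural-language pattern
    # with a database concept (entity, relation, or templated query).
    from dataclasses import dataclass

    @dataclass
    class EntityEntry:
        pattern: str        # contiguous word string, e.g. "nyc"
        entity: str         # database entity, e.g. "new-york"

    @dataclass
    class RelationEntry:
        pattern: str        # e.g. "big"
        relation: str       # e.g. "population"

    @dataclass
    class QuestionEntry:
        pattern: str        # whole-question template with gaps, e.g. "how r is e ?"
        num_args: int       # 1 (entity gap only) or 2 (entity and relation gaps)
        # The associated templated query has its variables filled by the
        # entity/relation entries matched in the gaps.

    seed = [
        EntityEntry("nyc", "new-york"),
        RelationEntry("big", "population"),
        QuestionEntry("how r is e ?", num_args=2),
    ]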
36 The lexicon is used to generate a derivation y from an input question x to a database query z. [sent-128, score-0.822]
37 For example, the entries in Table 2 can be used to make the following derivation from the question How big is nyc? [sent-129, score-0.443]
38 The derived query is population(?, new-york). This derivation proceeds in two steps: first matching a question form like How r is e ? [sent-131, score-0.306]
39 and then mapping big to population and nyc to new-york. [sent-132, score-0.281]
40 Factoring the derivation this way allows the lexical entries for big and nyc to be reused in semantically equivalent variants like nyc how big is it? [sent-133, score-0.673]
41 This factorization helps the system generalize to novel questions that do not appear in the training set. [sent-135, score-0.343]
42 We model a derivation as a set of (pi, di) pairs, where each pi matches a substring of x, the substrings cover all words in x, and the database concepts di compose to form z. [sent-136, score-0.344]
43 Derivations are rooted at either a 1-argument or 2-argument question entry and have entity or relation entries as leaves. [sent-137, score-0.455]
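Putting the pieces together, the derivation for How big is nyc? can be traced with a small matching routine. This is a simplified sketch that matches the 2-argument template with a regular expression and looks the strings up in toy dictionaries; the real system enumerates matches against a much larger lexicon.

    import re

    # Toy lexicon fragments from the running example.
    relation_lex = {"big": "population"}
    entity_lex = {"nyc": "new-york"}

    def derive(question):
        # 2-argument question pattern "how r is e ?" with gaps for a
        # relation string r and an entity string e.
        m = re.match(r"how (?P<r>[\w ]+?) is (?P<e>[\w ]+) \?$", question)
        if not m:
            return None
        r_str, e_str = m.group("r"), m.group("e")
        if r_str in relation_lex and e_str in entity_lex:
            # Compose the matched database concepts into a query.
            return (relation_lex[r_str], entity_lex[e_str])
        return None

    print(derive("how big is nyc ?"))  # -> ('population', 'new-york')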
44 2 Linear Ranking Function In general, multiple queries may be derived from a single input question x using a lexicon L. [sent-139, score-0.487]
45 Given a question x, we consider all derivations y and score them with θ · φ(x, y), where φ(x, y) is an n-dimensional feature representation and θ is an n-dimensional parameter vector. [sent-141, score-0.329]
46 The best derivation y∗(x) according to the model (θ, L) is given by y∗(x) = arg max_{y ∈ GEN(x; L)} θ · φ(x, y). The best query z∗(x) can be computed directly from the derivation y∗(x). [sent-143, score-0.355]
47 Computing the set GEN(x; L) involves finding all 1-Argument and 2-Argument question patterns that match x, and then enumerating all possible database concepts that match entity and relation strings. [sent-144, score-0.662]
48 When the database and lexicon are large, this becomes intractable. [sent-145, score-0.359]
49 We score an answer a with the highest score of all derivations that generate a query with answer a. [sent-148, score-0.389]
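The ranking step is a dot product between a sparse feature vector φ(x, y) and the parameter vector θ, maximized over the candidate derivations. A minimal sketch, with invented feature names standing in for the paper's actual feature set:

    # Score a derivation by the sparse dot product theta . phi(x, y) and pick the best.
    def dot(theta, phi):
        return sum(theta.get(f, 0.0) * v for f, v in phi.items())

    def best_derivation(theta, candidates):
        # candidates: (derivation, phi) pairs enumerated by GEN(x; L)
        return max(candidates, key=lambda cand: dot(theta, cand[1]))

    theta = {"lex:big=population": 1.2, "lex:big=size-of": -0.3}   # toy weights
    candidates = [
        ("population(?, new-york)", {"lex:big=population": 1.0}),
        ("size-of(?, new-york)",    {"lex:big=size-of": 1.0}),
    ]
    print(best_derivation(theta, candidates)[0])  # -> population(?, new-york)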
50 This assumption allows the algorithm to generalize from the initial seed lexicon L0, greatly increasing the lexical coverage. [sent-156, score-0.381]
51 Suppose x can be mapped to a query under L0 using the following derivation y: the question pattern what is the r of e ? maps to the query r(?, e), population maps to the relation population, and new york maps to the entity new-york. [sent-159, score-0.419]
52 We call this procedure InduceLex(x, x0, y, A), which takes a paraphrase pair (x, x0), a derivation y of x, and a word alignment A, and returns a new set of lexical entries. [sent-162, score-0.406]
53 We will now define InduceLex(x, x0, y, A) for the case where the derivation y consists of a 2-argument question entry (pq, dq), a relation entry (pr, dr), and an entity entry (pe, de), as shown in the example above. [sent-177, score-0.614]
54 In practice, for a given paraphrase pair (x, x0) and alignment A, InduceLex will generate multiple sets of new lexical entries, resulting in a lexicon with millions of entries. [sent-185, score-0.456]
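The heart of InduceLex, reading new entries off the paraphrase by projecting the derivation's spans through the word alignment, can be sketched roughly as follows. The span and alignment representations, the contiguity check, and the example data are simplifying assumptions standing in for the paper's more detailed constraints on valid entries.

    # Rough sketch of lexical induction from a paraphrase pair.
    def induce_lex(x, x2, derivation, alignment):
        """x, x2: the labeled question and its paraphrase (whitespace-tokenized).
        derivation: list of ((start, end), concept) word spans over x.
        alignment: set of (i, j) pairs aligning word x[i] to word x2[j].
        Returns new (pattern, concept) lexical entries read off x2."""
        x2_words = x2.split()
        new_entries = []
        for (start, end), concept in derivation:
            # Collect the x2 positions aligned to this span of x.
            aligned = sorted({j for i, j in alignment if start <= i < end})
            if not aligned:
                continue
            # Keep only projections that form a contiguous span in x2.
            if aligned[-1] - aligned[0] + 1 != len(aligned):
                continue
            pattern = " ".join(x2_words[aligned[0]:aligned[-1] + 1])
            new_entries.append((pattern, concept))
        return new_entries

    # Illustrative example data (spans and alignment invented for the sketch).
    x = "what is the population of new york"
    x2 = "how big is nyc"
    derivation = [((3, 4), "population"), ((5, 7), "new-york")]
    alignment = {(3, 1), (5, 3), (6, 3)}
    print(induce_lex(x, x2, derivation, alignment))
    # -> [('big', 'population'), ('nyc', 'new-york')]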
55 2 Parameter Learning Parameter learning is necessary for filtering out derivations that use incorrect lexical entries like new mexico = mexico, which arise from noise in the paraphrases and noise in the word alignment. [sent-189, score-0.329]
56 InduceLex has similar behavior for the other type of derivation, which consists of a 1-argument question entry (pq, dq) and an entity entry (pe, de). [sent-190, score-0.337]
57 First, we use the initial lexicon L0 to automatically generate (question x, query z) training examples from the paraphrase corpus C. [sent-196, score-0.566]
58 We use a hidden variable version of the perceptron algorithm (Collins, 2002), where the model parameters are updated using the highest scoring derivation y∗ that will generate the correct query z using the learned lexicon L. [sent-211, score-0.518]
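A hidden-variable perceptron update of this form fits in a few lines: the features of the best derivation that yields the correct query are promoted, and the features of the model's current best derivation are demoted. This sketch omits the paper's parameter averaging and parallelization, and uses simple sparse-dict features.

    # Hidden-variable perceptron sketch (in the style of Collins, 2002).
    def perceptron_update(theta, candidates, gold_query, lr=1.0):
        """candidates: (query, phi) pairs for all derivations of one question;
        gold_query: the query labeled via the seed lexicon; phi: sparse feature dict."""
        def dot(phi):
            return sum(theta.get(f, 0.0) * v for f, v in phi.items())

        predicted = max(candidates, key=lambda c: dot(c[1]))
        correct = [c for c in candidates if c[0] == gold_query]
        if not correct or predicted[0] == gold_query:
            return theta  # no derivation reaches the gold query, or already correct
        target = max(correct, key=lambda c: dot(c[1]))  # best derivation of the gold query
        for f, v in target[1].items():
            theta[f] = theta.get(f, 0.0) + lr * v
        for f, v in predicted[1].items():
            theta[f] = theta.get(f, 0.0) - lr * v
        return theta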
59 The database consists of a set of triples r(e1, e2) over a vocabulary of approximately 600K relations and 2M entities, extracted from the ClueWeb09 corpus. [sent-214, score-0.294]
60 WikiAnswers users can tag pairs of questions as alternate wordings of each other. [sent-229, score-0.289]
61 Most of the incorrect paraphrases were questions that were related, but not paraphrases. [sent-234, score-0.473]
62 A system may correctly map a test question to a valid query, only to return 0 results when executed against the incomplete database. [sent-247, score-0.285]
63 We factor out this source of error by semi-automatically constructing a sample of questions that are known to be answerable using the REVERB database, which allows for a meaningful comparison on the task of question understanding. [sent-248, score-0.533]
64 To create the evaluation set, we identified questions x in a held-out portion of the WikiAnswers corpus such that (1) x can be mapped to some query z using an initial lexicon (described in Section 7). [sent-249, score-0.677]
65 For example, the question What is the language of Hong-Kong satisfies these requirements, so we added these questions to the evaluation set: What is the language of Hong-Kong? [sent-252, score-0.496]
66 This methodology allows us to evaluate the systems’ ability to handle syntactic and lexical variations of questions that should have the same answers. [sent-258, score-0.333]
67 We removed all of these questions and their paraphrases from the training set. [sent-260, score-0.399]
68 We then created a gold-standard set of (x, a, l) triples, where x is a question, a is an answer, and l is a label. (Table 3: The question patterns used in the initial lexicon L0.) [sent-262, score-0.478]
69 To create the gold standard, we first ran each system on the evaluation questions to generate (x, a) pairs. [sent-264, score-0.289]
70 If (x, a) was tagged with label l and x0 is a paraphrase of x, we automatically added the labeling (x0, a, l), since questions in the same cluster should have the same answer sets. [sent-267, score-0.604]
71 Precision is the fraction of predicted answers that are correct and recall is the fraction of questions where a correct answer was predicted. [sent-272, score-0.462]
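Under these definitions the two metrics follow directly from per-question answer labels; a small sketch, assuming the prediction map contains every evaluation question (with an empty list where nothing was returned):

    # Precision: correct predicted answers / all predicted answers.
    # Recall: questions with at least one correct predicted answer / all questions.
    def precision_recall(predictions, is_correct):
        predicted = [(q, a) for q, answers in predictions.items() for a in answers]
        correct = [(q, a) for q, a in predicted if is_correct(q, a)]
        precision = len(correct) / len(predicted) if predicted else 0.0
        recall = len({q for q, _ in correct}) / len(predictions) if predictions else 0.0
        return precision, recall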
72 Table 4: Performance on WikiAnswers questions known to be answerable using REVERB. [sent-281, score-0.326]
73 We limit each system to return the top 100 database queries for each test sentence. [sent-290, score-0.372]
74 4 Initial Lexicon Both the lexical learning and parameter learning algorithms rely on an initial seed lexicon L0. [sent-293, score-0.363]
75 The initial lexicon allows the learning algorithms to bootstrap from the paraphrase corpus. [sent-294, score-0.409]
76 We construct L0 from a set of 16 hand-written 2-argument question patterns and the output of the identity transformation on the entity and relation strings in the database. [sent-295, score-0.459]
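The identity part of that seed lexicon can be pictured as follows; the surface normalization (splitting hyphenated database strings into words) is an assumption made for illustration, and the 16 hand-written question patterns would be added alongside these entries.

    # Sketch of seed identity entries: each database entity/relation string
    # maps to itself under a simple surface normalization.
    def identity_entries(db_strings):
        entries = {}
        for s in db_strings:
            surface = s.replace("-", " ")  # assumed normalization: "new-york" -> "new york"
            entries[surface] = s
        return entries

    entity_seed = identity_entries(["new-york", "winnie-the-pooh"])
    relation_seed = identity_entries(["population", "authored"])
    print(entity_seed["new york"])  # -> new-york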
77 The parameter-learning algorithm also results in a large gain in both precision and recall: InduceLex generates a noisy set of patterns, so selecting the best query for a question is more challenging. [sent-301, score-0.437]
78 For each row, we removed the learned lexical items from each of the types described in Section 4, keeping only the initial seed lexical items. [sent-303, score-0.306]
79 The learned 2-argument question templates significantly increase the recall of the system. [sent-304, score-0.352]
80 The 2-argument question templates help PARALEX generalize over different variations of the same question, like the test questions shown in Table 7. [sent-310, score-0.602]
81 For each question, PARALEX combines a 2-argument question template (shown below the questions) with the rules celebrate = holiday-of and christians = christians to derive a full query. [sent-311, score-0.291]
82 9 Error Analysis To understand how close we are to the goal of open-domain QA, we ran PARALEX on an unrestricted sample of questions from WikiAnswers. [sent-314, score-0.335]
83 We found that PARALEX performs significantly worse on this dataset, with recall maxing out at approximately 6% of the questions answered at precision 0. [sent-316, score-0.319]
84 (Table 7: Questions from the test set with 2-argument question patterns that PARALEX used to derive a correct query.) [sent-317, score-0.367]
85 This is not surprising, since the test questions are not restricted to topics covered by the REVERB database, and may be too complex to be answered by any database of relational triples. [sent-319, score-0.585]
86 We performed an error analysis on a sample of 100 questions that were either incorrectly answered or unanswered. [sent-320, score-0.333]
87 We examined the candidate queries that PARALEX generated for each question and tagged each query as correct (would return a valid answer given a correct and complete database) or incorrect. [sent-321, score-0.633]
88 Because the input questions are unrestricted, we also judged whether the questions could be faithfully represented as a r(? [sent-322, score-0.578]
89 The largest source of error (36%) was complex questions that could not be represented as a query for various reasons. [sent-326, score-0.446]
90 The largest group (14%) were questions that need n-ary or higher-order database relations, for example How long does it take to drive from Sacramento to Cancun? [sent-328, score-0.499]
91 Approximately 13% of the questions were how-to questions like How do you make axes in minecraft? [sent-330, score-0.578]
92 Lastly, 9% of the questions require database operators like joins, for example When were Bobby Orr’s children born? [sent-332, score-0.499]
93 The second largest source of error (32%) was questions that could be represented as a query, but where PARALEX was unable to derive any correct queries. [sent-333, score-0.289]
94 was not mapped to any queries, even though the REVERB database contains the relation grown-in and the entity nigeria. [sent-335, score-0.38]
95 We found that 13% of the incorrect questions were cases where the entity was not recognized, 12% were cases where the relation was not recognized, and 6% were cases where both the entity and relation were not recognized. [sent-336, score-0.606]
96 Finally, approximately 4% of the questions included typos or were judged to be inscrutable, for example Barovier hiriacy of evidence based for pressure sore ? [sent-341, score-0.33]
97 Discussion: Our experiments show that the learning algorithms described in Section 5 allow PARALEX to generalize beyond an initial lexicon and answer questions with significantly higher accuracy. [sent-342, score-0.617]
98 Our error analysis on an unrestricted set of WikiAnswers questions shows that PARALEX is still far from the goal of truly high-recall, opendomain QA. [sent-343, score-0.335]
99 We found that many questions asked on WikiAnswers are either too complex to be mapped to a simple relational query, or are not covered by the REVERB database. [sent-344, score-0.361]
100 Table 8: Error distribution of PARALEX on an unrestricted sample of questions from the WikiAnswers dataset. [sent-346, score-0.335]
wordName wordTfidf (topN-words)
[('paralex', 0.505), ('questions', 0.289), ('wikianswers', 0.252), ('database', 0.21), ('paraphrase', 0.208), ('question', 0.207), ('query', 0.157), ('nyc', 0.154), ('lexicon', 0.149), ('inducelex', 0.147), ('reverb', 0.138), ('queries', 0.131), ('paraphrases', 0.11), ('derivation', 0.099), ('answering', 0.095), ('derivations', 0.086), ('big', 0.085), ('seed', 0.082), ('qa', 0.076), ('entity', 0.074), ('answer', 0.073), ('patterns', 0.07), ('databases', 0.07), ('population', 0.07), ('hoffmann', 0.069), ('fader', 0.066), ('relation', 0.066), ('ofe', 0.063), ('pooh', 0.063), ('perceptron', 0.062), ('answers', 0.058), ('extractions', 0.057), ('pq', 0.056), ('entry', 0.056), ('alignment', 0.055), ('generalize', 0.054), ('initial', 0.052), ('entries', 0.052), ('templates', 0.052), ('gen', 0.052), ('learned', 0.051), ('dq', 0.048), ('executed', 0.047), ('facts', 0.046), ('induces', 0.046), ('unrestricted', 0.046), ('lexical', 0.044), ('answered', 0.044), ('zettlemoyer', 0.044), ('oren', 0.044), ('triples', 0.043), ('strings', 0.042), ('christians', 0.042), ('dietician', 0.042), ('mall', 0.042), ('nlidb', 0.042), ('populat', 0.042), ('templated', 0.042), ('winnie', 0.042), ('relational', 0.042), ('recall', 0.042), ('riedel', 0.041), ('approximately', 0.041), ('banko', 0.039), ('noisy', 0.039), ('incorrect', 0.037), ('answerable', 0.037), ('yahya', 0.037), ('equivalences', 0.037), ('kwok', 0.037), ('paraphrased', 0.037), ('unger', 0.037), ('kong', 0.037), ('parameter', 0.036), ('marton', 0.036), ('barzilay', 0.035), ('hong', 0.035), ('partition', 0.035), ('concepts', 0.035), ('yao', 0.034), ('grosz', 0.034), ('shingt', 0.034), ('tagged', 0.034), ('precision', 0.034), ('luke', 0.034), ('items', 0.033), ('ranking', 0.033), ('entities', 0.033), ('zelle', 0.032), ('ie', 0.032), ('contiguous', 0.031), ('return', 0.031), ('factoring', 0.031), ('bannard', 0.031), ('authored', 0.031), ('congle', 0.031), ('mapped', 0.03), ('scalable', 0.03), ('etzioni', 0.029), ('brill', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
2 0.22855891 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
Author: Oleg Rokhlenko ; Idan Szpektor
Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
3 0.19896819 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking
Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang
Abstract: Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on paraphrases of original queries. Ex- periments performed on the large scale query-document data set show that, the search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on dev and test sets respectively.
4 0.19620982 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
5 0.18306974 290 acl-2013-Question Analysis for Polish Question Answering
Author: Piotr Przybyla
Abstract: This study is devoted to the problem of question analysis for a Polish question answering system. The goal of the question analysis is to determine its general structure, type of an expected answer and create a search query for finding relevant documents in a textual knowledge base. The paper contains an overview of available solutions of these problems, description of their implementation and presents an evaluation based on a set of 1137 questions from a Polish quiz TV show. The results help to understand how an environment of a Slavonic language affects the performance of methods created for English.
6 0.18008736 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
7 0.17052706 292 acl-2013-Question Classification Transfer
9 0.16678755 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections
10 0.13847066 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
11 0.1215205 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
12 0.11872172 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
13 0.11788942 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
14 0.11559853 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
15 0.10285346 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
16 0.10109579 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
17 0.086879723 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
18 0.086529061 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
19 0.085329995 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
20 0.081649721 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
topicId topicWeight
[(0, 0.219), (1, 0.054), (2, 0.025), (3, -0.137), (4, 0.05), (5, 0.101), (6, -0.015), (7, -0.342), (8, 0.136), (9, 0.006), (10, 0.082), (11, -0.071), (12, 0.037), (13, -0.031), (14, 0.057), (15, 0.023), (16, 0.066), (17, -0.034), (18, -0.003), (19, 0.054), (20, -0.033), (21, 0.025), (22, 0.008), (23, 0.071), (24, 0.051), (25, -0.045), (26, -0.023), (27, 0.09), (28, -0.053), (29, 0.093), (30, 0.01), (31, -0.005), (32, -0.111), (33, -0.034), (34, -0.036), (35, -0.13), (36, 0.088), (37, -0.039), (38, -0.009), (39, 0.062), (40, -0.001), (41, 0.036), (42, 0.046), (43, -0.051), (44, 0.009), (45, 0.013), (46, 0.032), (47, -0.0), (48, 0.041), (49, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.96294868 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
2 0.85476887 290 acl-2013-Question Analysis for Polish Question Answering
Author: Piotr Przybyla
Abstract: This study is devoted to the problem of question analysis for a Polish question answering system. The goal of the question analysis is to determine its general structure, type of an expected answer and create a search query for finding relevant documents in a textual knowledge base. The paper contains an overview of available solutions of these problems, description of their implementation and presents an evaluation based on a set of 1137 questions from a Polish quiz TV show. The results help to understand how an environment of a Slavonic language affects the performance of methods created for English.
3 0.75889754 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
Author: Xuchen Yao ; Benjamin Van Durme ; Peter Clark
Abstract: Information Retrieval (IR) and Answer Extraction are often designed as isolated or loosely connected components in Question Answering (QA), with repeated overengineering on IR, and not necessarily performance gain for QA. We propose to tightly integrate them by coupling automatically learned features for answer extraction to a shallow-structured IR model. Our method is very quick to implement, and significantly improves IR for QA (measured in Mean Average Precision and Mean Reciprocal Rank) by 10%-20% against an uncoupled retrieval baseline in both document and passage retrieval, which further leads to a downstream 20% improvement in QA F1.
4 0.7398454 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang
Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.
5 0.73635203 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
Author: Oleg Rokhlenko ; Idan Szpektor
Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
6 0.70186651 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
7 0.70151836 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking
8 0.68788576 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections
9 0.67960835 292 acl-2013-Question Classification Transfer
10 0.6782074 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
12 0.65889418 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
13 0.626378 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
14 0.60980392 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
15 0.59398264 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
16 0.53802097 176 acl-2013-Grounded Unsupervised Semantic Parsing
17 0.46144384 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
18 0.45641571 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
19 0.43467614 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
20 0.43329144 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
topicId topicWeight
[(0, 0.046), (4, 0.011), (6, 0.042), (11, 0.084), (21, 0.02), (24, 0.08), (26, 0.046), (35, 0.096), (42, 0.041), (48, 0.047), (57, 0.177), (64, 0.015), (70, 0.084), (88, 0.026), (90, 0.034), (95, 0.065)]
simIndex simValue paperId paperTitle
1 0.89581877 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems
Author: Svitlana Volkova ; Pallavi Choudhury ; Chris Quirk ; Bill Dolan ; Luke Zettlemoyer
Abstract: Procedural dialog systems can help users achieve a wide range of goals. However, such systems are challenging to build, currently requiring manual engineering of substantial domain-specific task knowledge and dialog management strategies. In this paper, we demonstrate that it is possible to learn procedural dialog systems given only light supervision, of the type that can be provided by non-experts. We consider domains where the required task knowledge exists in textual form (e.g., instructional web pages) and where system builders have access to statements of user intent (e.g., search query logs or dialog interactions). To learn from such textual resources, we describe a novel approach that first automatically extracts task knowledge from instructions, then learns a dialog manager over this task knowledge to provide assistance. Evaluation in a Microsoft Office domain shows that the individual components are highly accurate and can be integrated into a dialog system that provides effective help to users.
2 0.86596113 325 acl-2013-Smoothed marginal distribution constraints for language modeling
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.
3 0.84338707 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
Author: Rahul Jha ; Amjad Abu-Jbara ; Dragomir Radev
Abstract: In this paper, we investigate the problem of automatic generation of scientific surveys starting from keywords provided by a user. We present a system that can take a topic query as input and generate a survey of the topic by first selecting a set of relevant documents, and then selecting relevant sentences from those documents. We discuss the issues of robust evaluation of such systems and describe an evaluation corpus we generated by manually extracting factoids, or information units, from 47 gold standard documents (surveys and tutorials) on seven topics in Natural Language Processing. We have manually annotated 2,625 sentences with these factoids (around 375 sentences per topic) to build an evaluation corpus for this task. We present evaluation results for the performance of our system using this annotated data.
same-paper 4 0.84078145 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
5 0.7450549 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
Author: Oleg Rokhlenko ; Idan Szpektor
Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
6 0.73000234 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
7 0.72785681 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
8 0.72099084 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
10 0.71936357 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
11 0.71903074 318 acl-2013-Sentiment Relevance
13 0.71832561 175 acl-2013-Grounded Language Learning from Video Described with Sentences
14 0.71810699 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
15 0.71792048 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
16 0.71762574 224 acl-2013-Learning to Extract International Relations from Political Context
17 0.71596545 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions
18 0.71557331 249 acl-2013-Models of Semantic Representation with Visual Attributes
19 0.7149114 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
20 0.71344745 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction