acl acl2013 acl2013-215 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qingqing Cai ; Alexander Yates
Abstract: Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. [sent-3, score-0.411]
2 We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. [sent-4, score-0.461]
3 Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm. [sent-5, score-0.281]
4 There has been recent interest in producing such semantic parsers for large, heterogeneous databases like Freebase (Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013) and Yago2 (Yahya et al., 2012). [sent-12, score-0.172]
5 Previous purely-supervised approaches have been limited to smaller domains and databases, such as the GeoQuery database, in part because of the cost of labeling enough samples to cover all of the logical constants involved in a domain. [sent-14, score-0.181]
6 This paper investigates a reduction of the problem of building a semantic parser to three standard problems in semantics and machine learning: supervised training of a semantic parser, schema matching, and pattern learning. [sent-15, score-0.506]
7 We apply an existing supervised training algorithm for semantic parsing to a labeled data set. [sent-17, score-0.131]
8 We apply schema matching techniques to the problem of finding correspondences between English words w and ontological symbols s. [sent-20, score-0.365]
9 And we apply pattern learning techniques to incorporate new (w, s) pairs into the lexicon of the trained semantic parser. [sent-21, score-0.174]
10 On a dataset of 917 questions taken from 81 domains of the Freebase database, a standard learning algorithm for semantic parsing yields a parser with an F1 of 0. [sent-23, score-0.275]
11 Our techniques can extend this parser to new logical symbols through schema matching, and yield a semantic parser with an F1 of 0. [sent-25, score-0.592]
12 On a more challenging task where training and test data are divided so that all logical constants in test are never observed during training, [sent-27, score-0.159]
13 our approach yields a semantic parser with an F1 of 0. [sent-29, score-0.132]
14 These results indicate that it is possible to automatically extend semantic parsers to symbols for which little or no training data has been observed. [sent-31, score-0.177]
15 Section 3 describes our MATCHER algorithm for performing schema matching between a knowledge base and text. [sent-34, score-0.259]
16 Section 4 explains how we use MATCHER’s schema matching to extend a standard semantic parser to logical symbols for which it has seen no labeled training data. [sent-35, score-0.556]
17 2 Previous Work Two existing systems translate between natural language questions and database queries over large-scale databases. [sent-38, score-0.222]
18 Our techniques automate the process of identifying matches between textual phrases and database relation symbols, in order to scale up to databases with more relations, like Freebase. [sent-44, score-0.394]
19 Krishnamurthy and Mitchell (2012) also create a semantic parser for Freebase covering 77 of Freebase’s over 2000 relations. [sent-50, score-0.132]
20 In comparison, we fully automate the process of constructing CCG lexical entries for the semantic parser by making it a prediction task. [sent-53, score-0.224]
21 Finally, we test our results on a dataset of 917 questions covering over 600 Freebase relations, a more extensive test than the 50 questions used by Krishnamurthy and Mitchell. [sent-55, score-0.15]
22 For instance, the DIRT system (Lin and Pantel, 2001) uses the mutual information between the (X, Y ) argument pairs for two binary relations to measure the similarity between them, and clusters relations accordingly. [sent-57, score-0.159]
23 Our techniques for comparing relations fit into this line of work, but they are novel in their application of these techniques to the task of comparing database relations and relations extracted from text. [sent-59, score-0.395]
24 Schema matching (Doan et al., 2005) is a task from the database and knowledge representation community in which systems attempt to identify a “common schema” that covers the relations defined in a set of databases or ontologies, and the mapping between each individual database and the common schema. [sent-62, score-0.31]
25 Owing to the complexity of the general case, researchers have resorted to defining standard similarity metrics between relations and attributes, as well as machine learning algorithms for learning and predicting matches between relations (Doan et al.). [sent-63, score-0.269]
26 These techniques consider only matches between relational databases, whereas we apply these ideas to matches between Freebase and extracted relations. [sent-67, score-0.31]
27 Schema matching in the database sense often considers complex matches between relations (Dhamanka et al., 2004), [sent-68, score-0.378]
28 whereas our techniques are currently restricted to matches involving one database relation and one relation extracted from text. [sent-69, score-0.347]
29 1 Problem Formulation The textual schema matching task is to identify natural language words and phrases that correspond with each relation and entity in a fixed schema for a relational database. [sent-71, score-0.564]
30 A schema S = (E, R, C, I) consists of a set of entities E, a set of relations R, a set of categories C, and a set of instances I. [sent-73, score-0.291]
31 Instances are known tuples of entities that make a relation or category true, such as film(Titanic) or directed_by(Titanic, James Cameron). [sent-79, score-0.225]
32 For a given r ∈ R (or c ∈ C), IS(r) indicates the set of known instances of r in schema S (and likewise for IS(c)). [sent-80, score-0.197]
33 We say a schema is a textual schema if it has been extracted from free text, such as the Nell schema (Carlson et al., 2010). [sent-84, score-0.435]
34 Given a textual schema T and a database schema D, the textual schema matching task is to identify an alignment or matching M ⊂ RT × RD such that (rT, rD) ∈ M if and only if rT can be used to refer to rD in normal language usage. [sent-87, score-0.926]
35 Our MATCHER algorithm for textual schema matching handles this by producing a confidence score for every possible (rT, rD) pair, which downstream applications can then use to reason about the possible alignments. [sent-90, score-0.3]
36 Even worse than the ambiguities in alignment, some textual relations do not correspond with any database relation exactly, but instead they correspond with a projection of a relation, or a join between multiple relations, or another complex view of a database schema. [sent-91, score-0.367]
37 As a simple example, “actress” corresponds to a subset of the Freebase film_actor relation that intersects with the set {x : gender(x, female)}. [sent-92, score-0.17]
38 MATCHER can only decide whether “actress” aligns with film_actor or not; it cannot produce an alignment between “actress” and a join of film_actor and gender. [sent-94, score-0.245]
39 These more complex alignments are an important consideration for future work, but as our experiments will show, quite useful alignments can be produced without handling these more complex cases. [sent-95, score-0.15]
40 2 Identifying candidate matches MATCHER uses a generate-and-test architecture for determining M. [sent-97, score-0.162]
41 It uses a Web search engine to issue queries for a database relation rD consisting of all the entities in a tuple t ∈ ID(rD). [sent-98, score-0.24]
42 The top 500 nonstopword word types are chosen as candidates for matches with rD. [sent-103, score-0.15]
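As a concrete illustration of this candidate-generation step, here is a minimal Python sketch; the snippet source, the toy stopword list, and the cutoff parameter are stand-ins, not MATCHER's actual implementation:

```python
from collections import Counter

# Toy stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "was", "by"}

def candidate_words(snippets, top_k=500):
    """Count non-stopword word types across search-result snippets and
    return the top_k most frequent types as match candidates for rD."""
    counts = Counter()
    for snippet in snippets:
        for word in snippet.lower().split():
            if word.isalpha() and word not in STOPWORDS:
                counts[word] += 1
    return [w for w, _ in counts.most_common(top_k)]

# Snippets one might retrieve for the tuple directed_by(Titanic, James Cameron):
snippets = ["Titanic was directed by James Cameron",
            "James Cameron directed Titanic in 1997"]
print(candidate_words(snippets, top_k=5))
```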
43 The first type of evidence we consider for identifying true matches from C(rD) consists of pattern-matching. [sent-121, score-0.152]
44 Let c(p, rD, rT) indicate the sum of all the counts for a particular pattern p, database relation, and textual relation: $f_p(r_T, r_D) = \frac{c(p, r_D, r_T)}{\sum_{r'_D} c(p, r'_D, r_T)} \cdot \frac{c(p, r_D, r_T)}{\sum_{r'_T} c(p, r_D, r'_T)}$. For the sum over all r'_D, we use all r'_D in Freebase for which rT was extracted as a candidate. [sent-131, score-0.185]
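A small sketch of computing this pattern statistic from a count table; the counts and relation names below are purely illustrative:

```python
from collections import defaultdict

# c[(p, rD, rT)]: count of pattern p connecting textual relation rT to
# database relation rD (toy numbers, purely illustrative).
c = defaultdict(int)
c[("X r Y", "directed_by", "directed")] = 40
c[("X r Y", "directed_by", "filmed")] = 5
c[("X r Y", "produced_by", "directed")] = 8

def f_p(pattern, rT, rD, counts):
    """Normalized pattern statistic from the formula above: the joint count
    divided by the marginals over competing rD' and over competing rT'."""
    joint = counts[(pattern, rD, rT)]
    sum_over_rD = sum(v for (p, d, t), v in counts.items() if p == pattern and t == rT)
    sum_over_rT = sum(v for (p, d, t), v in counts.items() if p == pattern and d == rD)
    if joint == 0 or sum_over_rD == 0 or sum_over_rT == 0:
        return 0.0
    return (joint / sum_over_rD) * (joint / sum_over_rT)

print(f_p("X r Y", "directed", "directed_by", c))  # high for a good match
```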
45 Using more patterns and more entities per pattern is desirable for accumulating more evidence about candidate matches, but there is a trade-off with the time required to issue the necessary queries. [sent-137, score-0.155]
46 4 Comparing database relations with extracted relations Open Information Extraction (Open IE) systems (Banko et al., 2007) [sent-139, score-0.232]
47 In its simplest form, MATCHER computes: $\mathrm{PMI}(r_T, r_D) = \frac{|I_D(r_D) \cap I_T(r_T)|}{|I_D(r_D)| \cdot |I_T(r_T)|}$ (1) While this PMI statistic is already quite useful, we have found that in practice there are many cases where an exact match between tuples in ID(rD) and tuples in IT(rT) is too strict a criterion. [sent-142, score-0.197]
48 MATCHER uses a variety of approximate matches to compute variations of this statistic. [sent-143, score-0.131]
49 Considered as predictors for the true matches in M, these variations of the PMI statistic have a lower precision, in that they are more likely to have high values for incorrect matches. [sent-144, score-0.153]
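A minimal sketch of the PMI statistic of Eq. (1) together with one possible relaxed variant; the particular relaxation shown (matching on shared arguments rather than exact tuples) is our own guess at the flavor of MATCHER's variations:

```python
def pmi(instances_d, instances_t):
    """Eq. (1): tuple overlap normalized by both set sizes."""
    overlap = len(instances_d & instances_t)
    return overlap / (len(instances_d) * len(instances_t))

def relaxed_pmi(instances_d, instances_t):
    """A relaxed variant: count a tuple from I_D(rD) as matched if it shares
    at least one argument with some tuple in I_T(rT), instead of requiring
    an exact tuple match (this particular relaxation is an assumption)."""
    overlap = sum(1 for d in instances_d
                  if any(set(d) & set(t) for t in instances_t))
    return overlap / (len(instances_d) * len(instances_t))

I_D = {("Titanic", "James Cameron"), ("Avatar", "James Cameron")}
I_T = {("Titanic", "James Cameron"), ("Titanic", "Cameron")}
print(pmi(I_D, I_T), relaxed_pmi(I_D, I_T))
```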
50 The API returns all matching triples; types must match exactly, but relation or argument strings in the query will match any relation or argument that contains the query string as a substring. [sent-157, score-0.242]
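The substring-matching behavior described here can be mimicked with a simple in-memory filter; this is a generic sketch, not the actual extraction-lookup API:

```python
triples = [
    ("Titanic", "was directed by", "James Cameron"),
    ("Titanic", "was filmed in", "Mexico"),
]

def lookup(arg1=None, rel=None, arg2=None, store=triples):
    """Return stored triples whose fields contain the (lowercased) query
    strings as substrings; None matches anything."""
    def ok(query, field):
        return query is None or query.lower() in field.lower()
    return [t for t in store
            if ok(arg1, t[0]) and ok(rel, t[1]) and ok(arg2, t[2])]

print(lookup(rel="directed"))  # -> [("Titanic", "was directed by", "James Cameron")]
```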
51 5 Regression models for scoring candidates Pattern statistics, the ReVerb statistics from Table 2, and the count of rT during the candidate identification step all provide evidence for correct matches between rD and rT. [sent-176, score-0.202]
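The section does not spell the regression model out in code; a hedged sketch of combining these three evidence sources into a match score with an off-the-shelf linear regression follows (all feature values and labels are fabricated for illustration):

```python
from sklearn.linear_model import LinearRegression

# One row per (rT, rD) candidate: [pattern statistic, ReVerb/PMI statistic,
# candidate-step frequency of rT]. Labels: 1.0 for gold matches, else 0.0.
X = [[0.8, 0.05, 120],
     [0.1, 0.01, 300],
     [0.6, 0.04, 80],
     [0.0, 0.00, 500]]
y = [1.0, 0.0, 1.0, 0.0]

model = LinearRegression().fit(X, y)
print(model.predict(X))  # confidence scores for ranking candidate matches
```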
52 Our experiments analyze MATCHER’s success by comparing its performance across a range of different values for the number of rT matches for each rD. [sent-182, score-0.15]
53 Here, we describe an application in which we build a question-answering system for Freebase by extending a standard learning technique for semantic parsing with schema alignment information. [sent-184, score-0.34]
54 We use the UBL algorithm of Kwiatkowski et al. (2010) to learn a semantic parser based on probabilistic Combinatory Categorial Grammar (PCCG). [sent-186, score-0.132]
55 Using a fixed CCG grammar and a procedure based on unification in second-order logic, UBL learns a lexicon Λ from the training data which includes entries like: New York City ⊢ NP : new_york and neighborhoods in ⊢ (S\NP)/NP : λxλy. [sent-190, score-0.135]
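Entries like these can be stored as simple records; a sketch of one possible lexicon representation (the field names and the neighborhoods semantics are our own assumptions, not UBL's internals):

```python
from dataclasses import dataclass

@dataclass
class LexicalEntry:
    phrase: str      # natural-language trigger, e.g. "neighborhoods in"
    syntax: str      # CCG syntactic category
    semantics: str   # lambda-calculus logical form, kept as a string here
    weight: float    # score contribution during parsing

lexicon = [
    LexicalEntry("New York City", "NP", "new_york", 1.2),
    LexicalEntry("neighborhoods in", r"(S\NP)/NP",
                 "lambda x. lambda y. neighborhoods(y, x)", 0.9),
]

def entries_for(phrase, lex=lexicon):
    """Look up all candidate entries for a phrase during parsing."""
    return [e for e in lex if e.phrase == phrase]

print(entries_for("neighborhoods in"))
```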
56 Our Freebase data covers 81 of the 86 core domains in Freebase, and 635 of its over 2000 relations, but we wish to develop a semantic parser that can scale to all of Freebase. [sent-195, score-0.154]
57 It can also learn lexical entries for relations rD that appear in the training data. [sent-197, score-0.161]
58 We use MATCHER’s learned alignment to extend the semantic parser that we get from UBL by automatically adding in lexical entries for Freebase relations. [sent-199, score-0.259]
59 However, this simple process is complicated by the fact that the semantic parser requires two additional types of information for each lexical entry: a syntactic category, and a weight. [sent-202, score-0.154]
60 A benefit of our reduction for extending a semantic parser is that we can automatically construct training examples for this prediction task from the other components in the reduction. [sent-206, score-0.132]
61 We use the output lexical entries learned by UBL as (potentially noisy) examples of true lexical entries for (rT, rD) pairs where rT matches the word in one of UBL’s lexical entries, and rD forms part of the semantics in the same lexical entry. [sent-207, score-0.382]
62 For (rT, rD) pairs in M where rD occurs in UBL’s lexical entries, but not paired with rT, we create dummy “negative” lexical entries with very low weights, one for each possible syntactic category observed in all lexical entries. [sent-208, score-0.136]
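A sketch of how such noisy positives and low-weight dummy negatives might be assembled; the tuple structure and the low-weight constant are illustrative assumptions:

```python
def build_training_examples(ubl_entries, matches, low_weight=-10.0):
    """ubl_entries: (phrase, syntax, semantics, weight) tuples learned by UBL.
    matches: set of (rT, rD) pairs proposed by MATCHER.
    Positives: UBL entries whose phrase is rT and whose semantics mention rD.
    Negatives: for matched pairs whose rD appears in UBL but never with rT,
    one low-weight dummy entry per syntactic category seen in the lexicon."""
    syntaxes = {e[1] for e in ubl_entries}
    examples = []
    for (rT, rD) in matches:
        paired = [e for e in ubl_entries if e[0] == rT and rD in e[2]]
        if paired:
            examples.extend(("pos", e) for e in paired)
        elif any(rD in e[2] for e in ubl_entries):
            for syn in syntaxes:
                examples.append(("neg", (rT, syn, rD, low_weight)))
    return examples

ubl = [("directed", r"(S\NP)/NP", "lambda x. lambda y. directed_by(x, y)", 1.0)]
print(build_training_examples(ubl, {("directed", "directed_by"),
                                    ("helmed", "directed_by")}))
```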
63 book_author(x, y), the event space would include the single expression in which the relations film_actor and book_author were replaced by a new variable: λpλxλy. [sent-215, score-0.192]
64 For W, we use a linear regression model whose features are the score from MATCHER, the probabilities from the Syn and Sem NBC models, and the average weight of all lexical entries in UBL with matching syntax and semantics. [sent-221, score-0.186]
65 Using the predictions from these models, LEXTENDER extends UBL’s learned lexicon with all possible lexical entries with their predicted weights, although typically only a few lexical entries have high enough weight to make a difference during parsing. [sent-222, score-0.218]
66 5 Experiments We conducted experiments to test the ability of MATCHER and LEXTENDER to produce a semantic parser for Freebase. [sent-224, score-0.132]
67 We first analyze MATCHER on the task of finding matches between Freebase relations and textual relations. [sent-225, score-0.241]
68 We then compare the performance of the semantic parser learned by UBL with its extension provided by LEXTENDER on a dataset of English questions posed to Freebase. [sent-226, score-0.207]
69 The full schema and contents are available for download. [sent-232, score-0.197]
70 As a reference point, the GeoQuery database is a standard benchmark database for semantic parsing. [sent-234, score-0.296]
71 Figure 2 (example questions with their logical forms) includes logical forms such as award_honor(y) ∧ award_winner(y, x) ∧ award(y, peabody_award). [sent-244, score-0.271]
72 The logical forms make use of Freebase symbols as logical constants, as well as a few additional symbols such as count and argmin, to allow for aggregation queries. [sent-245, score-0.33]
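As a toy illustration of how a logical form with an aggregation symbol like count can be evaluated against a database (the data and helper function are hypothetical):

```python
# A toy database and a hand-evaluated aggregation query, showing how a
# logical form using count can be interpreted.
neighborhood_of = {("Harlem", "new_york"), ("SoHo", "new_york"),
                   ("Venice", "los_angeles")}

def neighborhoods_in(city):
    return {n for (n, c) in neighborhood_of if c == city}

# Logical form: count(λx. neighborhoods(x, new_york))
print(len(neighborhoods_in("new_york")))  # -> 2
```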
73 No restrictions were placed on the type of questions they should produce, except that they should produce questions for multiple domains. [sent-254, score-0.15]
74 By inspection, a large majority of the questions appear to be answerable from Freebase, although no instructions were given to restrict questions to this sort. [sent-255, score-0.15]
75 We also created a dataset of alignments from these annotated questions by creating an alignment for each Freebase relation mentioned in the logical form for a question, paired with a manually-selected word from the question. [sent-256, score-0.307]
76 Let M be the set of (rT, rD) matches produced by the system, and G the set of matches in the gold-standard manual data. [sent-259, score-0.282]
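Precision, recall, and F1 over these two sets follow the standard definitions; a minimal sketch:

```python
def prf1(M, G):
    """Precision/recall/F1 of predicted matches M against gold matches G."""
    tp = len(M & G)
    p = tp / len(M) if M else 0.0
    r = tp / len(G) if G else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

M = {("directed", "directed_by"), ("starred", "film_actor")}
G = {("directed", "directed_by"), ("acted in", "film_actor")}
print(prf1(M, G))
```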
77 We now compare our alignments on a semantic parsing task for Freebase. [sent-271, score-0.151]
78 In a second test, we focus on the hard case where all questions from the test set contain logical constants that have never been seen before during training. [sent-273, score-0.234]
79 We split the data into 3 folds, making sure that no Freebase domain has symbols appearing in questions in more than one fold. [sent-274, score-0.133]
80 We varied the number of matches that the alignment model (MATCHER, Pattern, Extractions, or Frequency) could make for each Freebase relation, and measured semantic parsing performance as a function of the number of matches. [sent-276, score-0.274]
81 Figure 4 shows the F1 scores for these semantic parsers, judged by exact match between the top-scoring logical form from the parser and the manually-produced logical form. [sent-277, score-0.368]
82 Exact-match tests are overly strict, in the sense that the system may be judged incorrect even when the logical form that is produced is logically equivalent to the correct logical form. [sent-278, score-0.252]
83 The semantic parsers produced by MATCHER+LEXTENDER and the other alignment techniques significantly outperform the baseline semantic parser learned by UBL, which achieves an overall F1 of 0. [sent-280, score-0.334]
84 Purely-supervised approaches to this data are severely limited, since they have almost no chance of correctly parsing questions that refer to logical symbols that never appeared during training. [sent-282, score-0.286]
85 The best semantic parser we tested, which was produced by UBL, MATCHER, and LEXTENDER with 9 matches per Freebase relation, had a precision of 0. [sent-284, score-0.283]
86 MATCHER drops in F1 with more matches, as additional matches tend to be low-quality and low-probability, whereas Pattern [sent-288, score-0.262]
87 To place these results in context, many different semantic parsers for databases like GeoQuery and ATIS (including parsers produced by UBL) have achieved F1 scores of 0. [sent-297, score-0.249]
88 However, in all such tests, the test questions refer to logical constants that also appeared during training, allowing supervised techniques for learning semantic parsers to achieve strong accuracy. [sent-299, score-0.404]
89 An unsupervised semantic parser for GeoQuery has achieved an F1 score of 0. [sent-301, score-0.132]
90 However, this parser was given questions which it knew a priori to contain words that refer to the logical constants in the database. [sent-304, score-0.304]
91 Our MATCHER and LEXTENDER systems address a different challenge: how to learn a semantic parser for Freebase given the Web and a set of initial labeled questions. [sent-305, score-0.132]
92 6 Conclusion Scaling semantic parsing to large databases requires not only an engineering effort to handle large datasets, but also novel algorithms to extend semantic parsing models to test examples that look significantly different from labeled training data. [sent-306, score-0.269]
93 The MATCHER and LEXTENDER algorithms represent an initial investigation into such techniques, with early results indicating that semantic parsers can handle Freebase questions on a large variety of domains with an F1 of 0. [sent-307, score-0.216]
94 In particular, more research is needed to handle more complex matches between database and textual relations, and to handle more complex natural language queries. [sent-310, score-0.31]
95 As mentioned in Section 3.1, words like “actress” cannot be addressed by the current methodology, since MATCHER assumes that a word maps to a single Freebase relation, but the closest Freebase equivalent to the meaning of “actress” involves the two relations film_actor and gender. [sent-312, score-0.156]
96 Another limitation is that our current methodology focuses on finding matches for nouns and verbs. [sent-313, score-0.131]
97 While significant challenges remain, the reduction of large-scale semantic parsing to a combination of schema matching and supervised learning offers a new path toward building high-coverage semantic parsers. [sent-315, score-0.471]
98 Database schema matching using machine learning with feature selection. [sent-332, score-0.259]
99 Information retrieval and machine learning for probabilistic schema matching. [sent-450, score-0.197]
100 A unified approach for schema matching, coreference and canonicalization. [sent-487, score-0.197]
wordName wordTfidf (topN-words)
[('matcher', 0.65), ('freebase', 0.306), ('ubl', 0.276), ('rd', 0.261), ('rt', 0.222), ('lextender', 0.2), ('schema', 0.197), ('matches', 0.131), ('logical', 0.107), ('database', 0.094), ('film', 0.087), ('questions', 0.075), ('parser', 0.07), ('entries', 0.07), ('relations', 0.069), ('extractions', 0.067), ('tuples', 0.066), ('reverb', 0.064), ('matching', 0.062), ('semantic', 0.062), ('symbols', 0.058), ('parsers', 0.057), ('syn', 0.055), ('databases', 0.053), ('queries', 0.053), ('constants', 0.052), ('sem', 0.052), ('pattern', 0.05), ('relation', 0.047), ('parsing', 0.046), ('yahya', 0.044), ('alignments', 0.043), ('pmi', 0.041), ('textual', 0.041), ('actress', 0.041), ('ccg', 0.039), ('wick', 0.038), ('geoquery', 0.038), ('temple', 0.037), ('actor', 0.036), ('alignment', 0.035), ('krishnamurthy', 0.035), ('lexicon', 0.034), ('doan', 0.033), ('qingqing', 0.033), ('award', 0.032), ('regression', 0.032), ('yates', 0.032), ('neighborhoods', 0.031), ('candidate', 0.031), ('techniques', 0.028), ('patterns', 0.028), ('bollacker', 0.027), ('cai', 0.025), ('hoffart', 0.025), ('dhamanka', 0.025), ('ehrig', 0.025), ('giunchiglia', 0.025), ('nottelmann', 0.025), ('peabody', 0.025), ('rupee', 0.025), ('entities', 0.025), ('lambda', 0.024), ('goldwasser', 0.024), ('domingos', 0.024), ('fader', 0.024), ('poon', 0.024), ('semantics', 0.023), ('supervised', 0.023), ('lexical', 0.022), ('domains', 0.022), ('complex', 0.022), ('match', 0.022), ('nbc', 0.022), ('pccg', 0.022), ('rahm', 0.022), ('statistic', 0.022), ('folds', 0.021), ('tuple', 0.021), ('evidence', 0.021), ('argument', 0.021), ('strict', 0.021), ('rohanimanesh', 0.02), ('sparql', 0.02), ('clarke', 0.02), ('relational', 0.02), ('correspondences', 0.02), ('produced', 0.02), ('comparing', 0.019), ('candidates', 0.019), ('driven', 0.019), ('reduction', 0.019), ('tests', 0.018), ('sheer', 0.018), ('calculus', 0.018), ('dirt', 0.018), ('berberich', 0.018), ('artzi', 0.018), ('api', 0.018), ('ends', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
Author: Qingqing Cai ; Alexander Yates
Abstract: Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
2 0.13847066 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
3 0.10296196 312 acl-2013-Semantic Parsing as Machine Translation
Author: Jacob Andreas ; Andreas Vlachos ; Stephen Clark
Abstract: Semantic parsing is the problem of deriving a structured meaning representation from a natural language utterance. Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser. In experiments on the multilingual GeoQuery corpus we find that our parser is competitive with the state of the art, and in some cases achieves higher accuracy than recently proposed purpose-built systems. These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation.
4 0.093962073 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
Author: Xingxing Zhang ; Jianwen Zhang ; Junyu Zeng ; Jun Yan ; Zheng Chen ; Zhifang Sui
Abstract: Distant supervision (DS) is an appealing learning method which learns from existing relational facts to extract more from a text corpus. However, the accuracy is still not satisfying. In this paper, we point out and analyze some critical factors in DS which have great impact on accuracy, including valid entity type detection, negative training examples construction and ensembles. We propose an approach to handle these factors. By experimenting on Wikipedia articles to extract the facts in Freebase (the top 92 relations), we show the impact of these three factors on the accuracy of DS and the remarkable improvement led by the proposed approach.
5 0.091290116 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
6 0.087030925 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
7 0.080932319 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
8 0.080466524 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
9 0.072084211 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
10 0.060983926 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
11 0.060703766 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
12 0.054282442 176 acl-2013-Grounded Unsupervised Semantic Parsing
13 0.052012697 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars
14 0.051955935 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
15 0.051502462 242 acl-2013-Mining Equivalent Relations from Linked Data
16 0.049912572 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits
17 0.048931256 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
18 0.04781951 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
19 0.046452772 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
20 0.046358272 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
topicId topicWeight
[(0, 0.147), (1, 0.015), (2, -0.041), (3, -0.075), (4, -0.032), (5, 0.07), (6, -0.017), (7, -0.078), (8, 0.046), (9, -0.004), (10, 0.021), (11, -0.036), (12, 0.014), (13, -0.021), (14, 0.026), (15, 0.02), (16, 0.025), (17, -0.009), (18, -0.042), (19, -0.031), (20, -0.026), (21, 0.018), (22, -0.054), (23, 0.078), (24, 0.062), (25, 0.047), (26, -0.011), (27, 0.035), (28, -0.069), (29, 0.061), (30, 0.018), (31, -0.001), (32, -0.011), (33, 0.003), (34, -0.009), (35, -0.028), (36, -0.007), (37, -0.024), (38, -0.024), (39, 0.134), (40, -0.021), (41, 0.052), (42, -0.025), (43, -0.037), (44, 0.0), (45, 0.045), (46, -0.036), (47, 0.003), (48, 0.007), (49, 0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.91546303 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
Author: Qingqing Cai ; Alexander Yates
Abstract: Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
2 0.6950931 176 acl-2013-Grounded Unsupervised Semantic Parsing
Author: Hoifung Poon
Abstract: We present the first unsupervised approach for semantic parsing that rivals the accuracy of supervised approaches in translating natural-language questions to database queries. Our GUSP system produces a semantic parse by annotating the dependency-tree nodes and edges with latent states, and learns a probabilistic grammar using EM. To compensate for the lack of example annotations or question-answer pairs, GUSP adopts a novel grounded-learning approach to leverage database for indirect supervision. On the challenging ATIS dataset, GUSP attained an accuracy of 84%, effectively tying with the best published results by supervised approaches.
3 0.69195241 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
4 0.67608327 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
Author: Dan Goldwasser ; Dan Roth
Abstract: Semantic parsing is a domain-dependent process by nature, as its output is defined over a set of domain symbols. Motivated by the observation that interpretation can be decomposed into domain-dependent and independent components, we suggest a novel interpretation model, which augments a domain dependent model with abstract information that can be shared by multiple domains. Our experiments show that this type of information is useful and can reduce the annotation effort significantly when moving between domains.
5 0.65955168 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
Author: Wei Xu ; Raphael Hoffmann ; Le Zhao ; Ralph Grishman
Abstract: Distant supervision has attracted recent interest for training information extraction systems because it does not require any human annotation but rather employs existing knowledge bases to heuristically label a training corpus. However, previous work has failed to address the problem of false negative training examples mislabeled due to the incompleteness of knowledge bases. To tackle this problem, we propose a simple yet novel framework that combines a passage retrieval model using coarse features into a state-of-the-art relation extractor using multi-instance learning with fine features. We adapt the information retrieval technique of pseudo- relevance feedback to expand knowledge bases, assuming entity pairs in top-ranked passages are more likely to express a relation. Our proposed technique significantly improves the quality of distantly supervised relation extraction, boosting recall from 47.7% to 61.2% with a consistently high level of precision of around 93% in the experiments.
6 0.65446538 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars
7 0.62525243 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
8 0.59748209 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
9 0.58139265 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits
10 0.57682866 242 acl-2013-Mining Equivalent Relations from Linked Data
11 0.56944567 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
12 0.56541097 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
13 0.56384569 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
14 0.54413748 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
15 0.54082739 61 acl-2013-Automatic Interpretation of the English Possessive
16 0.53604627 163 acl-2013-From Natural Language Specifications to Program Input Parsers
17 0.5352931 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
18 0.53515083 271 acl-2013-ParaQuery: Making Sense of Paraphrase Collections
19 0.52230889 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
20 0.51649946 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
topicId topicWeight
[(0, 0.054), (6, 0.039), (11, 0.079), (15, 0.012), (24, 0.047), (26, 0.033), (35, 0.111), (42, 0.056), (48, 0.049), (57, 0.013), (64, 0.02), (70, 0.053), (71, 0.017), (73, 0.191), (88, 0.022), (90, 0.028), (95, 0.074)]
simIndex simValue paperId paperTitle
1 0.95390576 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
Author: Kapila Ponnamperuma ; Advaith Siddharthan ; Cheng Zeng ; Chris Mellish ; Rene van der Wal
Abstract: The aim of the Tag2Blog system is to bring satellite tagged wild animals “to life” through narratives that place their movements in an ecological context. Our motivation is to use such automatically generated texts to enhance public engagement with a specific species reintroduction programme, although the protocols developed here can be applied to any animal or other movement study that involves signal data from tags. We are working with one of the largest nature conservation charities in Europe in this regard, focusing on a single species, the red kite. We describe a system that interprets a sequence of locational fixes obtained from a satellite tagged individual, and constructs a story around its use of the landscape.
same-paper 2 0.82206589 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
Author: Qingqing Cai ; Alexander Yates
Abstract: Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
3 0.71138793 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni
Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.
4 0.70596415 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
Author: Wei Xu ; Raphael Hoffmann ; Le Zhao ; Ralph Grishman
Abstract: Distant supervision has attracted recent interest for training information extraction systems because it does not require any human annotation but rather employs existing knowledge bases to heuristically label a training corpus. However, previous work has failed to address the problem of false negative training examples mislabeled due to the incompleteness of knowledge bases. To tackle this problem, we propose a simple yet novel framework that combines a passage retrieval model using coarse features into a state-of-the-art relation extractor using multi-instance learning with fine features. We adapt the information retrieval technique of pseudo- relevance feedback to expand knowledge bases, assuming entity pairs in top-ranked passages are more likely to express a relation. Our proposed technique significantly improves the quality of distantly supervised relation extraction, boosting recall from 47.7% to 61.2% with a consistently high level of precision of around 93% in the experiments.
5 0.70520365 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
6 0.70251238 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
7 0.70177734 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
8 0.7009508 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
9 0.70067501 172 acl-2013-Graph-based Local Coherence Modeling
10 0.70045662 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
11 0.7000289 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
12 0.69983351 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
13 0.6991142 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
14 0.69908297 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
15 0.69761938 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
16 0.69599199 325 acl-2013-Smoothed marginal distribution constraints for language modeling
17 0.69590771 265 acl-2013-Outsourcing FrameNet to the Crowd
18 0.69561845 275 acl-2013-Parsing with Compositional Vector Grammars
19 0.69520754 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
20 0.69519901 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners