emnlp emnlp2013 emnlp2013-166 knowledge-graph by maker-knowledge-mining

166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs


Source: pdf

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improve over a natural baseline.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. [sent-3, score-0.401]

2 The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. [sent-4, score-0.74]

3 We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. [sent-5, score-0.442]

4 Second, we use a bridging operation to generate additional predicates based on neighboring predicates. [sent-6, score-0.749]

5 On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. [sent-7, score-0.429]

6 1 Introduction We focus on the problem of semantic parsing natural language utterances into logical forms that can be executed to produce denotations. [sent-9, score-0.547]

7 , 2010) have two limitations: (i) they require annotated logical forms as supervision, and (ii) they operate in limited domains with a small number of logical predicates. [sent-11, score-0.877]

8 , 2011; Artzi and Zettlemoyer, 2011) or by increasing the number of logical predicates. [sent-15, score-0.401]

9 Figure 1 (labels): BarackObama, bridging, Education, alignment — “Which college did Obama go to?” [sent-20, score-0.525]

10 Figure 1: Our task is to map questions to answers via latent logical forms. [sent-21, score-0.582]

11 To narrow down the space of logical predicates, we use a (i) coarse alignment based on Freebase and a text corpus and (ii) a bridging operation that generates predicates compatible with neighboring predicates. [sent-22, score-1.277]

12 The goal of this paper is to do both: learn a semantic parser without annotated logical forms that scales to the large number of predicates on Freebase. [sent-24, score-0.909]

13 While limited-domain semantic parsers are able to learn the lexicon from per-example supervision (Kwiatkowski et al. [sent-30, score-0.211]

14 , “go ”) and prepositions is not very informative due to polysemy, and rare predicates (e. [sent-42, score-0.339]

15 To improve coverage, we propose a new bridging operation that generates predicates based on adjacent predicates rather than on words. [sent-45, score-1.054]

16 At the compositional level, a semantic parser must combine the predicates into a coherent logical form. [sent-46, score-0.834]

17 Previous work based on CCG requires manually specifying combination rules (Krishnamurthy and Mitchell, 2012) or inducing the rules from annotated logical forms (Kwiatkowski et al. [sent-47, score-0.476]

18 In particular, we use POS tag features and features on the denotations of the predicted logical forms. [sent-50, score-0.51]

19 First, on the dataset of Cai and Yates (2013), we showed that our system outperforms their state-of-the-art system 62% to 59%, despite using no annotated logical forms. [sent-52, score-0.429]

20 Second, we collected a new realistic dataset of questions by performing a breadth-first search using the Google Suggest API; these questions are then answered by Amazon Mechanical Turk workers. [sent-53, score-0.242]

21 2 Setup Problem Statement Our task is as follows: Given (i) a knowledge base K, and (ii) a training set of question-answer pairs {(xi, yi)}i=1..n, output a semantic parser that maps new questions x to answers y via latent logical forms z and the knowledge base K. [sent-60, score-0.8]
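This problem statement can be sketched with a few minimal data structures; the names below (`Example`, `reward`) are illustrative, not from the paper's system:

```python
from dataclasses import dataclass

# Supervision is a question-answer pair (x_i, y_i); the logical form z is latent.
@dataclass
class Example:
    question: str        # x, e.g. "Where was Obama born?"
    answer: frozenset    # y, the denotation that executing z should produce

train = [Example("where was obama born?", frozenset({"Honolulu"}))]

def reward(predicted_answer, ex):
    # The learning signal compares the executed denotation to the gold answer,
    # so no annotated logical forms are ever needed.
    return 1.0 if frozenset(predicted_answer) == ex.answer else 0.0

assert reward({"Honolulu"}, train[0]) == 1.0
assert reward({"Chicago"}, train[0]) == 0.0
```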

22 2 Logical forms To query the knowledge base, we use a logical language called Lambda Dependency-Based Compositional Semantics (λ-DCS)—see Liang (2013) for details. [sent-72, score-0.476]

23 The chief motivation of λ-DCS is to produce logical forms that are simpler than lambda calculus forms. [sent-74, score-0.589]

24 Each logical form in simple λ-DCS is either a unary (which denotes a subset of E) or a binary (which denotes a subset of E × E). [sent-86, score-0.6]

25 We now define the basic λ-DCS logical forms z and their denotations. [sent-87, score-0.401]

26 Unary base case: If z ∈ E is an entity (e.g., Seattle), then z is a unary logical form. [sent-90, score-0.544]

27 Binary base case: If p is a property (e.g., PlaceOfBirth), then p is a binary logical form. [sent-96, score-0.457]

28 Then simple λDCS unary logical forms are tree-like graph patterns which pick out a subset of the nodes. [sent-129, score-0.619]
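As a rough illustration of how such unaries denote entity sets, the toy KB and tuple encoding below are assumptions for this sketch, not the paper's implementation:

```python
# Toy KB of (subject, predicate, object) triples; Type is modeled as a predicate.
KB = {
    ("BarackObama", "Type", "Person"),
    ("BarackObama", "PlaceOfBirth", "Honolulu"),
    ("Honolulu", "Type", "Location"),
}

def denote(z):
    """Denotation of a simple λ-DCS unary: a subset of the entity set E."""
    if z[0] == "entity":                 # base case: an entity denotes itself
        return {z[1]}
    if z[0] == "join":                   # p.z — subjects p-related to [[z]]
        _, p, sub = z
        objs = denote(sub)
        return {s for (s, pred, o) in KB if pred == p and o in objs}
    if z[0] == "and":                    # z1 ⊓ z2 — intersection of unaries
        return denote(z[1]) & denote(z[2])
    raise ValueError(f"unknown form: {z}")

# Type.Person ⊓ PlaceOfBirth.Honolulu picks out BarackObama.
form = ("and",
        ("join", "Type", ("entity", "Person")),
        ("join", "PlaceOfBirth", ("entity", "Honolulu")))
assert denote(form) == {"BarackObama"}
```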

29 3 Framework Given an utterance x, our semantic parser constructs a distribution over possible derivations D(x). [sent-131, score-0.276]

30 Each derivation d ∈ D(x) is a tree specifying the application of a set of combination rules that culminates in the logical form d. [sent-132, score-0.446]

31 Composition Derivations are constructed recursively based on (i) a lexicon mapping natural language phrases to knowledge base predicates, and (ii) a small set of composition rules. [sent-134, score-0.24]

32 We first use the lexicon to generate single-predicate derivations for any matching span (e. [sent-136, score-0.235]

33 Then, given any logical form z1 that has been constructed over the span [i1 : j1] and z2 over a non-overlapping span [i2 : j2], we generate the following logical forms over the enclosing span [min(i1, i2) : max(j1, j2)]: intersection z1 u z2 and join z1.z2. [sent-139, score-0.976]
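The span arithmetic described here can be sketched in a few lines (the derivation encoding is hypothetical):

```python
def combine(d1, d2):
    """Combine derivations over non-overlapping spans into the enclosing span
    [min(i1, i2) : max(j1, j2)], producing intersection and join candidates."""
    (i1, j1), z1 = d1
    (i2, j2), z2 = d2
    assert j1 <= i2 or j2 <= i1, "spans must not overlap"
    span = (min(i1, i2), max(j1, j2))
    return [(span, ("and", z1, z2)),   # intersection z1 u z2
            (span, ("join", z1, z2))]  # join z1.z2

cands = combine(((0, 1), "Type.University"), ((3, 4), "BarackObama"))
assert all(span == (0, 4) for span, _ in cands)
```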

34 Note that the construction of derivations D(x) allows us to skip any words; we also discard logical forms that are incompatible according to the Freebase types. [sent-146, score-0.626]

35 Figure 2: An example of a derivation d of the utterance “Where was Obama born? [sent-156, score-0.514]

36 ” and its sub-derivations, each labeled with composition rule (in blue) and logical form (in red). [sent-157, score-0.453]

37 3 Approach Our knowledge base has more than 19,000 properties, so a major challenge is generating a manageable set of predicates for an utterance. [sent-171, score-0.392]

38 1), we construct a lexicon that maps natural language phrases to logical predicates by aligning a large text corpus to Freebase, reminiscent of Cai and Yates (2013). [sent-174, score-0.904]

39 Second, we generate logical predicates compatible with neighboring predicates using the bridg- ing operation (Section 3. [sent-175, score-1.183]

40 The derivations are produced by combining these predicates. [sent-178, score-0.489]

41 1 Alignment We now discuss the construction of a lexicon L, which is a mapping from natural language phrases to logical predicates, accompanied by a set of features. [sent-185, score-0.875]

42 We perform a similar procedure that uses a Hearst-like pattern (Hearst, 1992) to map phrases to unary predicates. [sent-215, score-0.238]

43 From the initial 15M triples, we extracted 55,081 typed binary phrases (9,456 untyped) and 6,299 unary phrases. [sent-219, score-0.249]

44 Logical predicates Binary logical predicates contain (i) all KB properties and (ii) concatenations of two properties p1.p2. [sent-220, score-1.107]

45 For unary predicates, we consider all logical forms of the form Type.t. [sent-225, score-0.619]

46 The types of logical predicates considered during alignment are restricted in this paper, but automatic induction of more compositional logical predicates is an interesting direction. [sent-233, score-1.572]

47 Finally, we define the extension of a logical predicate r ∈ R2 to be its denotation, that is, the corresponding set of entities or entity pairs. [sent-234, score-0.605]

48 Lexicon construction Given typed phrases R1, logical predicates R2, and their extensions F, we now generate the lexicon. [sent-235, score-0.79]

49 For the alignment and text similarity, r1 is a phrase, r2 is a predicate with Freebase name s2, and b is a binary predicate with type signature (t1, t2) . [sent-241, score-0.445]

50 Our final graph contains 109K edges for binary predicates and 294K edges for unary predicates. [sent-244, score-0.538]

51 Lexicalized features are standard conjunctions of the phrase w and the logical form z. [sent-251, score-0.439]

52 , “born ”) to the Freebase name of the logical predicate (e. [sent-254, score-0.505]

53 , “People born here ”): Given the phrase r1 and the Freebase name s2 of the predicate r2, we compute string similarity features such as whether r1 and s2 are equal. Each Freebase property has a designated type signature, which can be extended to composite predicates, e. [sent-256, score-0.349]
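Such string-similarity features between a phrase and a predicate's Freebase name can be sketched as follows (the exact feature set here is illustrative, not the paper's):

```python
def string_features(phrase, freebase_name):
    """Illustrative string-similarity features between a phrase (e.g. "born")
    and a predicate's Freebase name (e.g. "People born here")."""
    p = phrase.lower().split()
    n = freebase_name.lower().split()
    return {
        "equal": float(p == n),                    # exact string match
        "phrase_in_name": float(all(w in n for w in p)),
        "overlap": len(set(p) & set(n)) / max(len(set(p) | set(n)), 1),
    }

f = string_features("born", "People born here")
assert f["equal"] == 0.0 and f["phrase_in_name"] == 1.0
```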

54 2 Bridging While alignment can cover many predicates, it is unreliable for cases where the predicates are expressed weakly or implicitly. [sent-262, score-0.431]

55 Recall that at this point our main goal is to generate a manageable set of candidate logical forms to be scored by the log-linear model. [sent-268, score-0.476]

56 The two predicates impose strong type constraints on that binary, so we can afford to generate all the binary predicates that type check (see Table 2). [sent-271, score-0.834]

57 More formally, given two unaries z1 and z2 with types t1 and t2, we generate a logical form z1 u b.z2. [sent-272, score-0.454]
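This type-checked generation of binaries can be sketched as follows; the binary inventory and its type signatures below are hypothetical stand-ins for the KB schema:

```python
# Hypothetical binary inventory with type signatures (t1, t2).
BINARIES = {
    "Education":    ("Person", "University"),
    "PlaceOfBirth": ("Person", "Location"),
    "Marriage":     ("Person", "Person"),
}

def bridge(z1, t1, z2, t2):
    """Generate z1 u b.z2 for every binary b whose signature type-checks,
    mirroring the bridging operation described in the text."""
    return [f"{z1} AND {b}.{z2}"
            for b, (a1, a2) in BINARIES.items()
            if a1 == t1 and a2 == t2]

forms = bridge("Type.Person", "Person", "StanfordUniversity", "University")
assert forms == ["Type.Person AND Education.StanfordUniversity"]
```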

58 Figure 1 visualizes bridging of the unaries Type . [sent-274, score-0.394]

59 To handle this, we allow bridging to generate a binary based on a single unary; in this case, based on the unary X-Men (Table 2), we generate several binaries including ComicBookCoverPrice. [sent-278, score-0.647]

60 Generically, given a unary z with type t, we construct a logical form b.z. [sent-279, score-0.594]

61 To handle this, we apply bridging to a unary and the intermediate event (see Table 2). [sent-290, score-0.484]

62 Given a logical form p2.z0 where p2 has type (t1, ∗), and a unary z with type t, bridging injects z, generating (b. [sent-293, score-0.584]

63 z) for each logical predicate b with type (t1, t). [sent-307, score-0.555]

64 In each of the three examples, bridging generates a binary predicate based on neighboring logical predicates rather than on explicit lexical material. [sent-308, score-1.275]

65 In a way, our bridging operation shares with bridging anaphora (Clark, 1975) the idea of establishing a novel relation between distinct parts of a sentence. [sent-309, score-0.717]

66 Naturally, we need features to distinguish between the generated predicates, or decide whether bridging is even appropriate at all. [sent-310, score-0.341]

67 Rule features Each derivation d is the result of applying some number of intersection, join, and bridging operations. [sent-316, score-0.386]

68 Specifically, when we combine logical forms z1 and z2 via a join or bridging, we include a feature on the POS tag of (the first word spanned by) z1 conjoined with the POS tag corresponding to z2. [sent-324, score-0.595]

69 Denotation features While it is clear that learning from denotations rather than logical forms is a drawback since it provides less information, it is less obvious that working with denotations actually gives us additional information. [sent-335, score-0.636]

70 Specifically, we include four features indicating whether the denotation of the predicted logical form has size 0, 1, 2, or at least 3. [sent-336, score-0.538]

71 This allows us to favor logical forms with non-empty denotations. [sent-338, score-0.476]
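The four denotation-size indicators described above are simple to write down (feature names here are illustrative):

```python
def denotation_features(denotation):
    """Indicator features on the size of the predicted denotation:
    size 0, 1, 2, or at least 3, as described in the text."""
    n = len(denotation)
    return {
        "den_size=0": float(n == 0),
        "den_size=1": float(n == 1),
        "den_size=2": float(n == 2),
        "den_size>=3": float(n >= 3),
    }

assert denotation_features(set())["den_size=0"] == 1.0
assert denotation_features({"a", "b", "c", "d"})["den_size>=3"] == 1.0
```

A negative weight on the size-0 feature is what lets the model penalize logical forms with empty denotations.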

72 Setup We implemented a standard beam-based bottom-up parser which stores the k-best derivations for each span. [sent-344, score-0.203]
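The chart structure of such a beam parser can be sketched as follows; `lexicon_derivs`, `combine`, and `score` are assumed callables, not the paper's actual components:

```python
import heapq

def beam_parse(n_tokens, lexicon_derivs, combine, score, k=5):
    """Beam-based bottom-up parser sketch: the chart keeps only the k-best
    derivations for each span [i, j)."""
    chart = {}
    for length in range(1, n_tokens + 1):
        for i in range(n_tokens - length + 1):
            j = i + length
            cands = list(lexicon_derivs(i, j))   # single-predicate derivations
            for split in range(i + 1, j):        # combine adjacent sub-derivations
                for d1 in chart.get((i, split), []):
                    for d2 in chart.get((split, j), []):
                        cands.extend(combine(d1, d2))
            chart[(i, j)] = heapq.nlargest(k, cands, key=score)  # beam cutoff
    return chart.get((0, n_tokens), [])

# Toy run: derivations are strings, combination is concatenation, score is length.
derivs = beam_parse(2, lambda i, j: [f"[{i}:{j}]"], lambda a, b: [a + b], len, k=3)
assert derivs == ["[0:1][1:2]", "[0:2]"]
```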

73 In addition, we used 17 hand-written rules to map question words such as “where ” and “how many” to logical forms such as Type . [sent-360, score-0.568]

74 To compute denotations, we convert a logical form z into a SPARQL query and execute it on our copy of Freebase using the Virtuoso engine. [sent-365, score-0.401]
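A minimal sketch of such a conversion for a join-style unary is below; the `example.org` namespace is a placeholder, since the real system queries Freebase URIs through Virtuoso:

```python
def to_sparql(z):
    """Translate a ("join", property, entity) unary into a SPARQL query string.
    Namespace and encoding are illustrative assumptions."""
    _, prop, entity = z
    return ("SELECT DISTINCT ?x WHERE { "
            f"?x <http://example.org/{prop}> <http://example.org/{entity}> . "
            "}")

query = to_sparql(("join", "PlaceOfBirth", "Honolulu"))
assert query.startswith("SELECT DISTINCT ?x WHERE")
assert "<http://example.org/PlaceOfBirth>" in query
```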

75 1 FREE917 Cai and Yates (2013) created a dataset consisting of 917 questions involving 635 Freebase relations, annotated with lambda calculus forms. [sent-371, score-0.248]

76 To map phrases to Freebase entities we used the manually-created entity lexicon used by Cai and Yates (2013), which contains 1,100 entries. [sent-373, score-0.28]

77 Because entity disambiguation is a challenging problem in semantic parsing, the entity lexicon simplifies the problem. [sent-374, score-0.214]

78 Our main empirical result is that our system, which was trained only on question-answer pairs, obtained 62% accuracy on the test set, outperforming the 59% accuracy reported by Cai and Yates (2013), who trained on full logical forms. [sent-379, score-0.401]

79 2 WEBQUESTIONS Dataset collection Because FREE917 requires logical forms, it is difficult to scale up due to the required expertise of annotating logical forms. [sent-382, score-0.802]

80 One major difference in the datasets is the distribution of questions: FREE917 starts from Freebase properties and solicits questions about these properties; these questions tend to be tailored to the properties. [sent-400, score-0.242]

81 WEBQUESTIONS starts from questions completely independent of Freebase, and therefore the questions tend to be more natural and varied. [sent-401, score-0.214]

82 In ALIGNMENT, binaries are generated only via the alignment lexicon. [sent-419, score-0.199]

83 In BRIDGING, binaries are generated through the bridging operation only. [sent-420, score-0.483]

84 As a baseline, we omit from our system the main contributions presented in this paper—that is, we disallow bridging, and remove denotation and alignment features. [sent-422, score-0.229]

85 Note that the number of possible derivations for questions in WEBQUESTIONS is quite large. [sent-426, score-0.257]

86 The phrase “United States” maps to 231 entities in our lexicon, the verb “have” maps to 203 binaries, and the phrases “kind”, “system”, and “government” all map to many different unary and binary predicates. [sent-428, score-0.446]

87 Parsing correctly involves skipping some words, mapping other words to predicates, while resolving many ambiguities in the way that the various predicates can combine. [sent-429, score-0.385]

88 Generation of binary predicates Recall that our system has two mechanisms for suggesting binaries: from the alignment lexicon or via the bridging operation. [sent-433, score-0.913]

89 Interestingly, alignment alone is better than bridging alone on WEBQUESTIONS, whereas for FREE917, it is the opposite. [sent-435, score-0.433]

90 Bridging suggests binaries that are compatible with the common types Person and Datetime, and the binary PlaceOfBirth is chosen. [sent-450, score-0.198]

91 Overall, running on WEBQUESTIONS, the parser constructs derivations that contain about 12,000 distinct binary predicates. [sent-455, score-0.259]

92 A significant loss is incurred without denotation features, largely due to the parser returning logical forms with empty denotations. [sent-460, score-0.666]

93 ” is answered with a logical form containing the property PeopleInvolved rather than SoccerMatchAttendance, resulting in an empty denotation. [sent-462, score-0.401]

94 We rely heavily on the k-best beam approximation in the parser keeping good derivations that lead to the correct answer. [sent-475, score-0.27]

95 In the initial stages of learning, θ is far from optimal, so good derivations are likely to fall below the k-best cutoff of internal parser beams. [sent-477, score-0.203]

96 Still, placing these few derivations on the beam allows the training procedure to bootstrap θ into a good solution. [sent-479, score-0.217]

97 Our system uses denotations rather than logical forms as a training signal, but also benefits from denotation features, which become possible in the grounded setting. [sent-506, score-0.724]

98 Learning these composite predicates would drastically increase the possible space of logical forms, but we believe that the methods proposed in this paper—alignment via distant supervision and bridging—can provide some traction on this problem. [sent-537, score-0.924]

99 Inducing probabilistic CCG grammars from logical form with higher-order unification. [sent-706, score-0.401]

100 Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. [sent-857, score-0.446]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('logical', 0.401), ('bridging', 0.341), ('predicates', 0.339), ('webquestions', 0.305), ('freebase', 0.235), ('yates', 0.169), ('derivations', 0.15), ('unary', 0.143), ('cai', 0.14), ('denotation', 0.137), ('born', 0.121), ('questions', 0.107), ('binaries', 0.107), ('predicate', 0.104), ('barackobama', 0.093), ('alignment', 0.092), ('zettlemoyer', 0.087), ('lexicon', 0.085), ('denotations', 0.08), ('kwiatkowski', 0.079), ('lambda', 0.079), ('placeofbirth', 0.076), ('forms', 0.075), ('branavan', 0.068), ('beam', 0.067), ('krishnamurthy', 0.064), ('peoplebornhere', 0.061), ('join', 0.061), ('binary', 0.056), ('entities', 0.056), ('supervision', 0.056), ('parser', 0.053), ('unaries', 0.053), ('base', 0.053), ('composition', 0.052), ('type', 0.05), ('phrases', 0.05), ('amt', 0.048), ('question', 0.047), ('fader', 0.047), ('chile', 0.046), ('cruise', 0.046), ('skipping', 0.046), ('tomcruise', 0.046), ('artzi', 0.045), ('advancement', 0.045), ('map', 0.045), ('derivation', 0.045), ('answer', 0.044), ('entity', 0.044), ('liang', 0.042), ('workers', 0.041), ('kb', 0.041), ('semantic', 0.041), ('kushman', 0.04), ('unger', 0.04), ('yahya', 0.04), ('spouse', 0.04), ('lexicalized', 0.039), ('signature', 0.039), ('intersection', 0.038), ('phrase', 0.038), ('goldwasser', 0.036), ('composite', 0.036), ('obama', 0.035), ('compatible', 0.035), ('operation', 0.035), ('honolulu', 0.035), ('neighboring', 0.034), ('calculus', 0.034), ('located', 0.033), ('pos', 0.032), ('clarke', 0.032), ('utterance', 0.032), ('grounded', 0.031), ('errs', 0.031), ('jzkk', 0.031), ('marry', 0.031), ('masaum', 0.031), ('sempre', 0.031), ('sutime', 0.031), ('walt', 0.031), ('youkilis', 0.031), ('ii', 0.03), ('parsing', 0.03), ('tag', 0.029), ('parsers', 0.029), ('answers', 0.029), ('maps', 0.029), ('pages', 0.028), ('ccg', 0.028), ('lei', 0.028), ('hoffmann', 0.028), ('dataset', 0.028), ('properties', 0.028), ('open', 0.027), ('api', 0.027), ('generically', 0.027), ('disney', 0.027), ('dcs', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improve over a natural baseline.

2 0.37722576 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

3 0.29616624 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set

Author: Yufang Hou ; Katja Markert ; Michael Strube

Abstract: Recognizing bridging anaphora is difficult due to the wide variation within the phenomenon, the resulting lack of easily identifiable surface markers and their relative rarity. We develop linguistically motivated discourse structure, lexico-semantic and genericity detection features and integrate these into a cascaded minority preference algorithm that models bridging recognition as a subtask of learning finegrained information status (IS). We substantially improve bridging recognition without impairing performance on other IS classes.

4 0.26043057 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.

5 0.23744793 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

6 0.18497062 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

7 0.15188947 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

8 0.145357 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

9 0.1379769 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

10 0.095253989 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

11 0.093949638 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

12 0.092253186 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

13 0.08910457 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

14 0.087354854 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction

15 0.074836679 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

16 0.073015697 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

17 0.071293503 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

18 0.071263105 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

19 0.069437496 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

20 0.068117581 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.247), (1, 0.081), (2, 0.041), (3, 0.082), (4, -0.062), (5, 0.373), (6, -0.171), (7, 0.03), (8, 0.342), (9, 0.204), (10, 0.097), (11, -0.088), (12, 0.257), (13, -0.094), (14, 0.172), (15, 0.061), (16, 0.101), (17, -0.103), (18, -0.137), (19, -0.0), (20, -0.057), (21, -0.095), (22, 0.003), (23, 0.085), (24, -0.024), (25, 0.019), (26, 0.03), (27, 0.017), (28, -0.006), (29, -0.087), (30, 0.03), (31, -0.001), (32, -0.043), (33, 0.1), (34, 0.03), (35, 0.054), (36, 0.037), (37, 0.028), (38, 0.048), (39, 0.054), (40, -0.008), (41, -0.046), (42, -0.093), (43, 0.024), (44, -0.001), (45, 0.086), (46, -0.023), (47, -0.062), (48, -0.025), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95476049 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset ofCai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improves over a natural baseline.

2 0.84511954 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

3 0.63468617 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set

Author: Yufang Hou ; Katja Markert ; Michael Strube

Abstract: Recognizing bridging anaphora is difficult due to the wide variation within the phenomenon, the resulting lack of easily identifiable surface markers and their relative rarity. We develop linguistically motivated discourse structure, lexico-semantic and genericity detection features and integrate these into a cascaded minority preference algorithm that models bridging recognition as a subtask of learning finegrained information status (IS). We substantially improve bridging recognition without impairing performance on other IS classes.

4 0.59617645 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.

5 0.5823099 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

6 0.42299971 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

7 0.39777949 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

8 0.39449662 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

9 0.33981013 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

10 0.33860403 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

11 0.29874316 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

12 0.28461537 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

13 0.2780928 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

14 0.27453634 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

15 0.27269667 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

16 0.26161167 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

17 0.23883796 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data

18 0.22708613 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

19 0.22166173 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

20 0.21821932 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.018), (9, 0.013), (18, 0.036), (22, 0.029), (26, 0.014), (30, 0.048), (50, 0.011), (51, 0.592), (66, 0.023), (71, 0.02), (75, 0.046), (77, 0.012), (90, 0.012), (96, 0.023), (97, 0.019)]
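The similarity scores in these lists are derived from per-paper topic distributions like the sparse `(topicId, topicWeight)` vector above. As a minimal sketch (assuming cosine similarity over sparse topic-weight dictionaries; the site's actual scoring pipeline is not specified and may differ):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse topic-weight vectors,
    each represented as {topic_id: weight}. Missing topics count as 0."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical topic distributions for two papers; topic 51 dominates both,
# mirroring the (51, 0.592) entry in the vector above.
paper_a = {3: 0.018, 51: 0.592, 75: 0.046}
paper_b = {30: 0.05, 51: 0.61}
score = cosine_similarity(paper_a, paper_b)
```

Two papers with nearly identical dominant topics score close to 1.0, while papers sharing no topics score 0.0, which matches the ordering behaviour of the `simValue` columns below.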

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99724758 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improve over a natural baseline.

2 0.9960683 24 emnlp-2013-Application of Localized Similarity for Web Documents

Author: Peter Rebersek ; Mateja Verlic

Abstract: In this paper we present a novel approach to automatic creation of anchor texts for hyperlinks in a document pointing to similar documents. Methods used in this approach rank parts of a document based on the similarity to a presumably related document. Ranks are then used to automatically construct the best anchor text for a link inside original document to the compared document. A number of different methods from information retrieval and natural language processing are adapted for this task. Automatically constructed anchor texts are manually evaluated in terms of relatedness to linked documents and compared to baseline consisting of originally inserted anchor texts. Additionally we use crowdsourcing for evaluation of original anchors and automatically constructed anchors. Results show that our best adapted methods rival the precision of the baseline method.

3 0.99566174 32 emnlp-2013-Automatic Idiom Identification in Wiktionary

Author: Grace Muzny ; Luke Zettlemoyer

Abstract: Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed compositionally. Experiments demonstrate that the learned classifier can provide high quality idiom labels, more than doubling the number of idiomatic entries from 7,764 to 18,155 at precision levels of over 65%. These gains also translate to idiom detection in sentences, by simply using known word sense disambiguation algorithms to match phrases to their definitions. In a set of Wiktionary definition example sentences, the more complete set of idioms boosts detection recall by over 28 percentage points.

4 0.99529123 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels

Author: Vikas Ganjigunte Ashok ; Song Feng ; Yejin Choi

Abstract: Predicting the success of literary works is a curious question among publishers and aspiring writers alike. We examine the quantitative connection, if any, between writing style and successful literature. Based on novels over several different genres, we probe the predictive power of statistical stylometry in discriminating successful literary works, and identify characteristic stylistic elements that are more prominent in successful writings. Our study reports for the first time that statistical stylometry can be surprisingly effective in discriminating highly successful literature from less successful counterparts, achieving accuracy up to 84%. Closer analyses lead to several new insights into characteristics of the writing style in successful literature, including findings that are contrary to the conventional wisdom with respect to good writing style and readability.

5 0.99512112 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game

Author: Anais Cadilhac ; Nicholas Asher ; Farah Benamara ; Alex Lascarides

Abstract: This paper describes a method that predicts which trades players execute during a win-lose game. Our method uses data collected from chat negotiations of the game The Settlers of Catan and exploits the conversation to construct dynamically a partial model of each player’s preferences. This in turn yields equilibrium trading moves via principles from game theory. We compare our method against four baselines and show that tracking how preferences evolve through the dialogue and reasoning about equilibrium moves are both crucial to success.

6 0.99452794 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora

7 0.99398685 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations

8 0.99365592 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

9 0.95274639 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution

10 0.94694942 60 emnlp-2013-Detecting Compositionality of Multi-Word Expressions using Nearest Neighbours in Vector Space Models

11 0.94614333 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

12 0.94334877 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

13 0.94031495 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

14 0.93947226 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

15 0.93922007 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

16 0.935018 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

17 0.93471444 27 emnlp-2013-Authorship Attribution of Micro-Messages

18 0.93225479 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

19 0.9316529 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

20 0.92953056 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts