emnlp emnlp2013 emnlp2013-164 knowledge-graph by maker-knowledge-mining

164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching


Source: pdf

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. [sent-5, score-0.223]

2 In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. [sent-6, score-0.302]

3 The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. [sent-7, score-0.832]

4 In each case, the parser uses a predefined set of logical constants, or an ontology, to construct meaning representations. [sent-14, score-0.502]

5 count(λx. person(x) ∧ live(x, Seattle)). A semantic parser might aim to construct MR1 for Q1 and MR2 for Q2; these pairings align constants (count, person, etc.). [sent-20, score-0.463]

6 Such ontological mismatches become increasingly common as domain and language complexity increases. [sent-25, score-0.209]

7 In this paper, we introduce a semantic parsing approach that supports scalable, open-domain ontological reasoning. [sent-26, score-0.302]

8 It then uses a learned ontology matching model to transform this representation for the target ontology. [sent-29, score-0.276]

9 Figure 1: Examples of sentences x, answers a, domain-independent underspecified logical forms l0, and fully specified logical forms y, drawn from the Freebase domain. [sent-46, score-1.388]

10 This two stage approach enables parsing without any domain-dependent lexicon that pairs words with logical constants. [sent-49, score-0.47]

11 Instead, word meaning is filled in on-the-fly through ontology matching, enabling the parser to infer the meaning of previously unseen words and more easily transfer across domains. [sent-50, score-0.37]

12 The first parsing stage uses a probabilistic combinatory categorial grammar (CCG) (Steedman, 2000; Clark and Curran, 2007) to map sentences to new, underspecified logical-form meaning representations containing generic logical constants that are not tied to any specific ontology. [sent-52, score-1.334]

13 It enables us to incorporate a number of cues, including the target ontology structure and lexical similarity between the names of the domain-independent and domain-dependent constants, to construct the final logical forms. [sent-55, score-0.629]

14 During learning, we estimate a linear model over derivations that include all of the CCG parsing decisions and the choices for ontology matching. [sent-56, score-0.323]

15 This approach aligns naturally with our two-stage parsing setup, where the final logical expression can be directly used to provide answers. [sent-60, score-0.471]

16 GeoQuery includes a geography database with a small ontology and questions with relatively complex, compositional structure. [sent-62, score-0.284]

17 2 Formal Overview Task Let an ontology O be a set of logical constants and a knowledge base K be a collection of logical statements constructed with constants from O. [sent-65, score-1.004]

18 Also, let y be a logical expression that can be executed against K to return an answer a = EXEC(y, K). [sent-70, score-0.452]
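Sentences 17 and 18 define the task interfaces: an ontology O of constants, a knowledge base K of statements over them, and execution a = EXEC(y, K). The following is a minimal, self-contained Python sketch of that setup; the facts, entity names, and function names are invented for illustration and are not the paper's implementation.

```python
# Toy ontology O of logical constants and knowledge base K of ground facts.
O = {"person", "live", "Seattle"}
K = {("person", "alice"), ("person", "bob"),
     ("live", "alice", "Seattle"), ("live", "bob", "Portland")}

def exec_query(y, kb):
    """Execute a logical form against the knowledge base. Here y is simply a
    Python predicate over candidate entities, standing in for a genuine
    lambda-calculus query."""
    entities = {fact[1] for fact in kb}
    return {e for e in entities if y(e, kb)}

# Roughly count(lam x. person(x) & live(x, Seattle)) from the running example:
def people_living_in_seattle(x, kb):
    return ("person", x) in kb and ("live", x, "Seattle") in kb

answer = len(exec_query(people_living_in_seattle, K))
print(answer)  # -> 1
```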

19 Our goal is to build a function y = PARSE(x, O) for mapping a natural language sentence x to a domain-dependent logical form y. [sent-73, score-0.434]

20 from Wiktionary to build domain-independent underspecified logical forms, which closely mirror the linguistic structure of the sentence but do not use constants from O. [sent-76, score-1.157]

21 For example, in Figure 1, l0 denotes the underspecified logical forms paired with each sentence x. [sent-77, score-0.811]

22 The parser then maps this intermediate representation to a logical form that uses constants from O, such as the y seen in Figure 1. [sent-78, score-0.817]

23 The lexicon is open domain, using no symbols from the ontology O for K. [sent-95, score-0.223]

24 The burden of learning word meaning is shifted to the second, ontology matching, stage of parsing (Section 5. [sent-98, score-0.311]

25 However, these techniques require training data with hand-labeled domain-specific logical expressions. [sent-109, score-0.394]

26 This approach was one of the first to scale to Freebase, but required labeled logical forms and did not jointly model semantic parsing and ontological reasoning. [sent-117, score-0.75]

27 However, we introduce the idea of learning an open-domain CCG semantic parser; all previous methods suffered, to various degrees, from the ontological mismatch problem that motivates our work. [sent-121, score-0.253]

28 The challenge of ontological mismatch has been previously recognized in many settings. [sent-122, score-0.209]

29 Hobbs (1985) describes the need for ontological promiscuity in general language understanding. [sent-123, score-0.209]

30 Fader et al. (2013) recently presented a scalable approach to learning an open domain QA system, where ontological mismatches are resolved with learned paraphrases. [sent-130, score-0.209]

31 4 Background Semantic Modeling We use the typed lambda calculus to build logical forms that represent the meanings of words, phrases and sentences. [sent-135, score-0.523]

32 These are then combined using the set of CCG combinators to build a logical form that captures the meaning of the entire sentence. [sent-155, score-0.525]

33 Each possible derivation d = ⟨Π, M⟩ builds a logical form y using constants from the ontology O. [sent-159, score-0.971]

34 Π is a CCG parse tree that maps x to an underspecified logical form l0. [sent-160, score-0.857]

35 M is an ontological match that maps l0 onto the fully specified logical form y. [sent-161, score-0.705]
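Sentences 33–35 describe each derivation as a pair d = ⟨Π, M⟩ of a CCG parse and an ontological match. A minimal sketch of that data structure, with assumed field names and a deliberately simplistic string-rewriting stand-in for YIELD, might look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CCGParse:
    """Pi: a CCG parse mapping sentence x to an underspecified logical
    form l0 (kept here as an opaque string for brevity)."""
    sentence: str
    l0: str

@dataclass
class MatchOp:
    """One ontology-matching operation: a structural change (collapse or
    expand) or the replacement of one underspecified constant."""
    kind: str     # "collapse" | "expand" | "replace"
    target: str   # subexpression or underspecified constant to rewrite
    result: str   # the new subexpression or ontology constant

@dataclass
class Derivation:
    """d = <Pi, M>: a parse plus a sequence of matching operations whose
    final output is the fully specified logical form y = YIELD(d)."""
    parse: CCGParse
    match: List[MatchOp]

    def yield_form(self) -> str:
        # Toy YIELD: apply each operation in order as a textual rewrite of l0.
        form = self.parse.l0
        for op in self.match:
            form = form.replace(op.target, op.result)
        return form

d = Derivation(
    CCGParse("what is the population of seattle", "?population(seattle, x)"),
    [MatchOp("replace", "?population", "fb:location.statistical_region.population")],
)
print(d.yield_form())  # -> fb:location.statistical_region.population(seattle, x)
```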

36 1 Domain Independent Parsing Domain-independent CCG parse trees Π are built using a predefined set of 56 underspecified lexical categories, 49 domain-independent lexical items, and the combinatory rules introduced in Section 4. [sent-164, score-0.538]

37 An underspecified CCG lexical category has a syntactic category and a logical form containing no constants from the domain ontology O. [sent-165, score-1.499]

38 Instead, the logical form includes underspecified constants. [sent-166, score-0.811]

39 These constants are typed placeholders which will later be replaced during ontology matching. [sent-167, score-0.227]

40 We manually define a set of POS tags for each underspecified lexical category, and use Wiktionary as a tag dictionary to define the possible POS tags for words and phrases. [sent-171, score-0.456]
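Sentence 40 describes how Wiktionary POS tags trigger underspecified lexical categories. A toy sketch of that lookup is shown below; the tag dictionary and the category strings are invented for illustration and are not the paper's 56 actual categories.

```python
# Assumed POS tag dictionary (in the paper this comes from Wiktionary).
POS_DICT = {"population": {"NN"}, "live": {"VB", "VBP"}, "in": {"IN"}}

# Assumed underspecified categories keyed by POS class; '?' marks
# underspecified placeholder constants.
CATEGORIES_BY_POS = {
    "NN": ["N : lam x. ?p(x)"],                    # nouns -> unary predicate
    "VB": ["S\\NP/NP : lam y. lam x. ?r(x, y)"],   # verbs -> binary relation
    "VBP": ["S\\NP/NP : lam y. lam x. ?r(x, y)"],
    "IN": ["PP/NP : lam x. x"],
}

def lexical_categories(word):
    """Assign every underspecified category licensed by the word's possible POS tags."""
    cats = []
    for pos in POS_DICT.get(word.lower(), set()):
        cats.extend(CATEGORIES_BY_POS.get(pos, []))
    return cats

print(lexical_categories("population"))  # -> ['N : lam x. ?p(x)']
```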

41 We accordingly assign it all underspecified categories for the classes, including: N : λx. [sent-174, score-0.417]

42 Figure 3a shows the lexical categories and combinator applications used to construct the underspecified logical form l0. [sent-188, score-0.885]

43 Underspecified constants in this figure have been labeled with the words that they are associated with for readability. [sent-189, score-0.346]

44 2 Ontological Matching The second, domain specific, step M maps the underspecified logical form l0 onto the fully specified logical form y. [sent-191, score-1.342]

45 The mapping from constants in l0 to constants in y is not one-to-one. [sent-192, score-0.692]

46 The ontological match is a sequence of matching operations M = ⟨o1, ... [sent-194, score-0.289]

47 ..., on⟩ that can transform the structure of the logical form or replace underspecified constants with constants from O. [sent-197, score-1.503]

48 (a) Underspecified CCG parse Π: Map words onto underspecified lexical categories as described in Section 5. [sent-198, score-0.533]

49 Use the CCG combinators to combine lexical categories to give the full underspecified logical form l0. [sent-200, score-0.498]

50 In each step, one of the operators is applied to a subexpression of the existing logical form to generate a modified logical form with a new underspecified constant marked in bold. [sent-220, score-1.472]

51 (c) Constant Matching Steps in M: Replace all underspecified constants in the transformed logical form with a similarly typed constant from O, as described in Section 5. [sent-228, score-1.324]

52 The underspecified constant to be replaced is marked in bold, and similarly typed constants from O are written in typewriter typeset. [sent-231, score-0.518]

53 ’ Underspecified constants are labelled with the words from the query that they are associated with for readability. [sent-237, score-0.346]

54 Collapses merge a subexpression from l0 to create a new underspecified constant, generating a logical form with fewer constants. [sent-281, score-0.89]

55 Expansions split a subexpression from l0 to generate a new logical form containing one extra constant. [sent-282, score-0.473]

56 Collapsing Operators The collapsing operator defined in Figure 4a merges all constants in a literal to generate a single constant of the same type. [sent-283, score-0.552]
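Sentences 54–56 describe the collapsing operator, which merges the constants of a literal into a single underspecified constant of the same type. A minimal sketch of the idea over a toy tuple representation (not the paper's typed lambda-calculus terms) is:

```python
def collapse_literal(literal):
    """Merge the underspecified constants appearing in one literal into a
    single new underspecified constant, keeping the remaining arguments.
    Toy representation: literals are tuples, underspecified constants are
    strings prefixed with '?'. This mirrors the idea, not the paper's code."""
    consts = [t for t in literal if isinstance(t, str) and t.startswith("?")]
    if len(consts) < 2:
        return literal                      # nothing to merge
    rest = [t for t in literal if t not in consts]
    merged = "?" + "_".join(c.lstrip("?") for c in consts)
    return (merged, *rest)

# 'how many people visit' merged with the non-adjacent adverb 'annually',
# producing one constant that a single Freebase relation can later replace:
print(collapse_literal(("?visit", "people", "?annual")))
# -> ('?visit_annual', 'people')
```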

57 Performing collapses on the underspecified logical form allows non-contiguous phrases to be represented in the collapsed form. [sent-293, score-0.911]

58 In this example, the logical form representing the phrase ‘how many people visit’ has been merged with the logical form representing the non-adjacent adverb ‘annually. [sent-294, score-0.858]

59 ’ This generates a new underspecified constant that can be mapped onto the Freebase relation public library system annual visits that relates to both phrases. [sent-295, score-0.589]

60 The collapsing operations preserve semantic type, ensuring that all logical forms generated by the derivation sequence are well typed. [sent-296, score-0.58]

61 The size of this set is limited by the number of constants in l0, since each collapse removes at least one constant. [sent-298, score-0.375]

62 At each step, the number of possible collapses is polynomial in the number of constants in l0 and exponential in the arity of the most complex type in O. [sent-299, score-0.411]

63 Expansion Operators The fully specified logical form y can contain constants relating to multiple words in x. [sent-302, score-0.811]

64 It can also use multiple constants to represent the meaning of a single word. [sent-303, score-0.412]

65 2 Constant Matching To build an executable logical form y, all underspecified constants must be replaced with constants from O. [sent-311, score-1.538]

66 This is done through a sequence of constant replacement operations, each of which replaces a single underspecified constant with a constant of the same type from O. [sent-312, score-0.662]

67 The output from the last replacement operation is a fully specified logical form. [sent-315, score-0.473]
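Sentences 65–67 describe constant matching as a sequence of replacements, each swapping one underspecified constant for a same-typed ontology constant. The sketch below is a greedy, similarity-driven stand-in for the paper's scored chart-based search; the Freebase-style constant names and the similarity function are assumptions.

```python
def match_constants(l0_constants, ontology, similarity):
    """Replace every underspecified constant with a same-typed constant from
    the ontology, greedily by a similarity score.  A toy stand-in for the
    paper's scored search over replacement sequences."""
    replacements = {}
    for cu, cu_type in l0_constants:                  # e.g. ("?population", "<e,i>")
        candidates = [c for c, t in ontology if t == cu_type]
        if not candidates:
            return None                               # no well-typed completion
        replacements[cu] = max(candidates, key=lambda c: similarity(cu, c))
    return replacements

ontology = [("fb:location.statistical_region.population", "<e,i>"),
            ("fb:people.person.date_of_birth", "<e,d>")]
# Crude similarity: overlap between the constant's words and the property name.
sim = lambda cu, c: len(set(cu.strip("?").split("_")) &
                        set(c.split(".")[-1].split("_")))
print(match_constants([("?population", "<e,i>")], ontology, sim))
# -> {'?population': 'fb:location.statistical_region.population'}
```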

68 However, each logical form (both final and interim) can be constructed with many different derivations, and we only need to find the highest scoring one. [sent-319, score-0.459]

69 We use a CKY style chart parser to calculate the k-best logical forms output by parses of x. [sent-321, score-0.49]

70 We then store each interim logical form generated by an operator in M once in a hyper-graph chart structure. [sent-322, score-0.52]

71 The branching factor of this hypergraph is polynomial in the number of constants in l0 and linear in the size of O. [sent-323, score-0.346]

72 Consequently, there are too many possible logical forms to enumerate explicitly; we prune as follows. [sent-324, score-0.448]

73 We allow the top N scoring ontological matches for each original subexpression in l0 and remove matches that differ in score from the maximum scoring match by more than a threshold τ. [sent-325, score-0.313]
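Sentence 73 describes the pruning rule: keep the top N ontological matches per original subexpression and drop matches scoring more than τ below the best. A direct sketch of that rule, with illustrative parameter values, is:

```python
def prune_matches(scored_matches, N=10, tau=2.0):
    """Keep at most the N best-scoring ontological matches for one original
    subexpression of l0, and drop any match whose score falls more than tau
    below the best one.  Parameter values here are illustrative."""
    scored_matches = sorted(scored_matches, key=lambda m: m[1], reverse=True)
    if not scored_matches:
        return []
    best = scored_matches[0][1]
    return [(m, s) for m, s in scored_matches[:N] if best - s <= tau]

print(prune_matches([("population", 4.2), ("number_of_staff", 1.9),
                     ("area_km2", 0.3)], N=10, tau=2.0))
# -> [('population', 4.2)]
```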

74 When building derivations, we apply constant matching operators as soon as they are applicable to new underspecified constants created by collapses and expansions. [sent-326, score-1.061]

75 Each derivation is associated with a fully specified logical form y = YIELD(d) that can be executed in K. [sent-342, score-0.465]

76 The first indicates the number of times each underspecified category is used. [sent-364, score-0.453]

77 For example, the parse in Figure 3a uses the underspecified category N : λx. [sent-365, score-0.499]

78 The collapsing and expansion operators in M generate new underspecified constants that define the types of constants in the output logical form y. [sent-374, score-1.538]

79 These operators are scored using features that indicate the type of each complex-typed constant present in y and the identity of domain-independent functional constants in y. [sent-375, score-0.499]

80 The logical form y generated in Figure 3 contains one complex-typed constant with type ⟨i, ⟨e, t⟩⟩ and no domain-independent functional constants. [sent-376, score-0.561]

81 Each constant replacement operation in M replaces an underspecified constant cu with a constant cO from O. [sent-382, score-0.709]

82 The underspecified constant cu is associated with the words from x that it represents. [sent-383, score-0.608]

83 We assume that each of the constants cO in O is associated with a string label w~ O. [sent-385, score-0.346]

84 The feature φnp(cu, cO) signals the replacement of an entity-typed constant cu with an entity cO that has a matching label. For the second example in Figure 1 this feature indicates the replacement of the underspecified constant associated with the word ‘mozart’ with the Freebase entity mozart. [sent-387, score-0.795]
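Sentence 84 describes the entity-replacement feature φnp. The sketch below implements a simple string-overlap version of such a feature; the real model also uses stems and other lexical-similarity cues, so treat this as an assumed simplification.

```python
def phi_np(cu_words, co_label):
    """Entity-replacement feature: fires when the words linked to the
    underspecified constant overlap with the label of the ontology entity.
    A string-overlap simplification of the paper's lexical features."""
    cu_tokens = set(w.lower() for w in cu_words)
    co_tokens = set(co_label.lower().replace("_", " ").split())
    return 1.0 if cu_tokens & co_tokens else 0.0

print(phi_np(["mozart"], "Wolfgang_Amadeus_Mozart"))   # -> 1.0
```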

85 Knowledge Base Features Guided by the observation that we generally want to create queries y which have answers in knowledge base K, we define features to signal whether each operation could build a logical form y with an answer in K. [sent-397, score-0.542]

86 Since the Freebase property date of birth does not take arguments of type location, φpp(y, K) will fire if y contains the logical form λxλy. [sent-409, score-0.455]
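Sentences 85–86 describe knowledge-base features that penalize queries the knowledge base could never answer, e.g., applying date of birth to a location. A minimal sketch of such a type-compatibility check, with an invented schema dictionary standing in for Freebase, is:

```python
def phi_pp(property_name, argument_type, kb_schema):
    """Knowledge-base feature: fires (value 1.0) when a logical form applies a
    property to an argument type the KB never pairs it with, signalling a
    query that cannot have an answer in K.  `kb_schema` maps each property to
    its allowed argument types; a toy stand-in for Freebase's schema."""
    allowed = kb_schema.get(property_name, set())
    return 1.0 if argument_type not in allowed else 0.0

kb_schema = {"people.person.date_of_birth": {"person"}}
print(phi_pp("people.person.date_of_birth", "location", kb_schema))  # -> 1.0
```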

87 FQ contains 917 questions labeled with logical form meaning representations for querying Freebase. [sent-415, score-0.556]

88 We gathered question answer labels by executing the logical forms against Freebase, and manually correcting any inconsistencies. [sent-416, score-0.478]

89 We report two different experiments on the FQ data: test results on the existing 642/275 train/test split and domain adaptation results where the data is split three ways, partitioning the topics so that the logical meaning expressions do not share any symbols across folds. [sent-421, score-0.46]

90 We initialize weights for φpp and φemp to -1 to favour logical forms that have an interpretation in the knowledge base K. [sent-425, score-0.448]

91 CY13 and FUBL report fully correct logical forms, which is a close proxy to our numbers. [sent-442, score-0.394]

92 Our approach outperforms the previous state of the art, achieving a nine point improvement in test recall, while not requiring labeled logical forms in training. [sent-451, score-0.448]

93 The learned ontology matching model is able to reason about previously unseen ontological subdomains as well as if it was provided explicit, in-domain training data. [sent-453, score-0.485]

94 The first and second examples show parse failures, where the underspecified CCG grammar did not have sufficient coverage. [sent-487, score-0.463]

95 The third shows a failed structural match, where all of the correct logical constants are selected, but the argument order is reversed for one of the literals. [sent-488, score-0.74]

96 We introduced a new approach for learning a two-stage semantic parser that enables scalable, on-the-fly ontological matching. [sent-494, score-0.295]

97 Large-scale semantic parsing via schema matching and lexicon extension. [sent-529, score-0.233]

98 Inducing probabilistic CCG grammars from logical form with higherorder unification. [sent-637, score-0.464]

99 Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. [sent-703, score-0.42]

100 Online learning of relaxed CCG grammars for parsing to logical form. [sent-708, score-0.478]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('underspecified', 0.417), ('logical', 0.394), ('constants', 0.346), ('ccg', 0.247), ('freebase', 0.223), ('ontological', 0.209), ('ontology', 0.196), ('fq', 0.157), ('zettlemoyer', 0.14), ('constant', 0.101), ('geoquery', 0.1), ('kwiatkowski', 0.1), ('cu', 0.09), ('yates', 0.089), ('matching', 0.08), ('derivations', 0.078), ('artzi', 0.078), ('exec', 0.075), ('mooney', 0.074), ('steedman', 0.073), ('cai', 0.073), ('qa', 0.068), ('co', 0.068), ('meaning', 0.066), ('collapses', 0.065), ('dcs', 0.065), ('operator', 0.061), ('stem', 0.058), ('gen', 0.057), ('database', 0.057), ('forms', 0.054), ('wiktionary', 0.052), ('krishnamurthy', 0.052), ('operators', 0.052), ('parsing', 0.049), ('parse', 0.046), ('euzenat', 0.045), ('fubl', 0.045), ('mozart', 0.045), ('ork', 0.045), ('parsers', 0.045), ('subexpression', 0.044), ('collapsing', 0.044), ('lambda', 0.044), ('semantic', 0.044), ('queries', 0.044), ('derivation', 0.044), ('replacement', 0.043), ('parser', 0.042), ('seattle', 0.041), ('library', 0.04), ('lf', 0.04), ('join', 0.04), ('answers', 0.039), ('lexical', 0.039), ('goldwater', 0.037), ('category', 0.036), ('benchmark', 0.036), ('combinatory', 0.036), ('goldwasser', 0.036), ('specified', 0.036), ('loc', 0.035), ('form', 0.035), ('grammars', 0.035), ('failures', 0.033), ('gl', 0.033), ('zelle', 0.033), ('schema', 0.033), ('ai', 0.032), ('typed', 0.031), ('pairings', 0.031), ('onto', 0.031), ('questions', 0.031), ('answer', 0.03), ('ccgs', 0.03), ('combinators', 0.03), ('emp', 0.03), ('interim', 0.03), ('ualy', 0.03), ('scoring', 0.03), ('visit', 0.03), ('representations', 0.03), ('collapse', 0.029), ('yield', 0.028), ('databases', 0.028), ('expression', 0.028), ('execution', 0.027), ('calculates', 0.027), ('fader', 0.027), ('collins', 0.027), ('domains', 0.027), ('xi', 0.027), ('facts', 0.027), ('lexicon', 0.027), ('categorial', 0.026), ('birth', 0.026), ('alshawi', 0.026), ('bollacker', 0.026), ('davidson', 0.026), ('hobbs', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

2 0.37722576 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset ofCai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improves over a natural baseline.

3 0.28293455 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.

4 0.16983914 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

5 0.11444028 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

Author: Hrushikesh Mohapatra ; Siddhanth Jain ; Soumen Chakrabarti

Abstract: Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth the number of entities as Freebase, and these are a highly biased sample of well-connected, frequently mentioned “head” entities. To bring hope to “tail” entities, we broaden our goal to a second task: assigning types to entities in Freebase but not Wikipedia. The two tasks are synergistic: knowing the types of unfamiliar entities helps disambiguate mentions, and words in mention contexts help assign types to entities. We present TMI, a bipartite graphical model for joint type-mention inference. TMI attempts no schema integration or entity resolution, but exploits the above-mentioned synergy. In experiments involving 780,000 people in Wikipedia, 2.3 million people in Freebase, 700 million Web pages, and over 20 professional editors, TMI shows considerable annotation accuracy improvement (e.g., 70%) compared to baselines (e.g., 46%), especially for “tail” and emerging entities. We also compare with Google’s recent annotations of the same corpus with Freebase entities, and report considerable improvements within the people domain.

6 0.10588652 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

7 0.073071972 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction

8 0.066232495 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

9 0.063646369 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

10 0.063399091 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology

11 0.059473798 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

12 0.058552746 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

13 0.054511886 32 emnlp-2013-Automatic Idiom Identification in Wiktionary

14 0.05254316 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

15 0.051957004 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

16 0.050613645 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

17 0.049228631 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English

18 0.048694618 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

19 0.047656074 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

20 0.047510415 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.202), (1, 0.051), (2, 0.019), (3, 0.062), (4, -0.09), (5, 0.31), (6, -0.135), (7, 0.034), (8, 0.255), (9, 0.123), (10, 0.034), (11, -0.087), (12, 0.177), (13, -0.068), (14, 0.158), (15, 0.004), (16, 0.054), (17, -0.107), (18, -0.143), (19, 0.006), (20, -0.054), (21, -0.165), (22, 0.06), (23, 0.127), (24, -0.008), (25, -0.048), (26, 0.022), (27, -0.09), (28, 0.013), (29, -0.006), (30, 0.094), (31, -0.086), (32, -0.036), (33, -0.048), (34, 0.003), (35, 0.04), (36, 0.031), (37, -0.048), (38, 0.163), (39, 0.0), (40, 0.012), (41, -0.123), (42, -0.038), (43, -0.014), (44, 0.04), (45, -0.001), (46, 0.053), (47, -0.052), (48, -0.01), (49, -0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95358735 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

2 0.84629649 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

Author: Jonathan Berant ; Andrew Chou ; Roy Frostig ; Percy Liang

Abstract: In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset ofCai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs and improves over a natural baseline.

3 0.74893361 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.

4 0.53929329 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Mike Lewis ; Mark Steedman

Abstract: Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.

5 0.37810719 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set

Author: Yufang Hou ; Katja Markert ; Michael Strube

Abstract: Recognizing bridging anaphora is difficult due to the wide variation within the phenomenon, the resulting lack of easily identifiable surface markers and their relative rarity. We develop linguistically motivated discourse structure, lexico-semantic and genericity detection features and integrate these into a cascaded minority preference algorithm that models bridging recognition as a subtask of learning finegrained information status (IS). We substantially improve bridging recognition without impairing performance on other IS classes.

6 0.32723346 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

7 0.30579183 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

8 0.2948826 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

9 0.29426616 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English

10 0.28807843 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

11 0.26806095 126 emnlp-2013-MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

12 0.26132491 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation

13 0.24297409 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization

14 0.24258922 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

15 0.23519173 32 emnlp-2013-Automatic Idiom Identification in Wiktionary

16 0.23382327 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

17 0.22980465 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

18 0.21998088 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students

19 0.21841589 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization

20 0.21641944 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.016), (3, 0.031), (9, 0.01), (18, 0.243), (22, 0.04), (30, 0.08), (50, 0.025), (51, 0.254), (66, 0.023), (71, 0.028), (75, 0.045), (77, 0.014), (90, 0.012), (96, 0.024), (97, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98207396 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.

2 0.97649652 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the loglikelihood of the documents from both language domains under a cross-lingual logbilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the pro- posed cross-lingual adaptation approach.

3 0.94375873 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) ob- tained phrase-based and CKY Hiero with the same translation model.

same-paper 4 0.93120873 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

5 0.91674942 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo

Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model us- ing a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.

6 0.87200862 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

7 0.86333275 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

8 0.83461177 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

9 0.82905549 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

10 0.82804275 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

11 0.8277142 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

12 0.82412112 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

13 0.82092929 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

14 0.82003903 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

15 0.8186276 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

16 0.81820333 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

17 0.81468046 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

18 0.81186604 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction

19 0.8110655 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

20 0.81081885 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging