emnlp emnlp2012 emnlp2012-97 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum
Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the question translation and the resulting query answering.
Reference: text
sentIndex sentText sentNum sentScore
1 Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. [sent-9, score-0.23]
2 To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. [sent-10, score-0.406]
3 Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. [sent-11, score-0.902]
4 in question translation and the resulting query answering. [sent-14, score-0.278]
5 For less initiated users the only option to query this rich data is by keyword search (e. [sent-26, score-0.272]
6 As an example, consider a quiz question like “Which female actor played in Casablanca and is married to a writer who was born in Rome?” [sent-33, score-0.446]
7 One can think of different formulations of the example question, such as “Which actress from Casablanca is married to a writer from Rome?” [sent-36, score-0.232]
8 A possible SPARQL formulation, assuming a user familiar with the schema of the underlying knowledge base(s), could consist of the following six triple patterns (joined by shared-variable bindings): isa actor, ? [sent-38, score-0.35]
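For concreteness, the running example could be rendered as a conjunctive SPARQL query along the following lines. This is a hedged sketch: the variable names (?x, ?w), the prefix, and the class/relation identifiers (actor, female, actedIn, marriedTo, writer, bornIn) are illustrative assumptions reconstructed from the example, not the authors' verbatim schema.

```python
# Hypothetical reconstruction of the six triple patterns for the quiz question.
# The identifiers do not correspond to an actual Yago2/DBpedia schema.
EXAMPLE_QUERY = """
SELECT ?x WHERE {
  ?x rdf:type   :actor .
  ?x :hasGender :female .
  ?x :actedIn   :Casablanca_film .
  ?x :marriedTo ?w .
  ?w rdf:type   :writer .
  ?w :bornIn    :Rome .
}
"""

if __name__ == "__main__":
    # In a full pipeline this string would be sent to a SPARQL endpoint over the
    # linked-data sources; here it only illustrates the shared-variable bindings ?x, ?w.
    print(EXAMPLE_QUERY)
```

The join structure comes entirely from the shared variables ?x and ?w, which is exactly what a flat keyword query fails to express.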
9 Our goal is to automatically create such structured queries by mapping the user’s question into this representation. [sent-50, score-0.444]
10 In the example, the obvious keyword query “female actress Casablanca married writer born Rome” lacks a clear specification of the relations among the different entities. [sent-53, score-0.601]
11 2 Problem Given a natural language question qNL and a knowledge base KB, our goal is to translate qNL into a formal query qFL that captures the information need expressed by qNL. [sent-55, score-0.319]
12 We focus on input questions that put the emphasis on entities, classes, and relations between them. [sent-56, score-0.225]
13 As a result, we generate structured queries of the form known as conjunctive queries or select-project-join queries in database terminology. [sent-59, score-0.586]
14 0, where the above focus leads to queries that consist of multiple triple patterns, that is, conjunctions of SPO search conditions. [sent-61, score-0.416]
15 We do not use any pre-existing query templates, but generate queries from scratch as they involve a variable number of joins with a priori unknown join structure. [sent-62, score-0.34]
16 For example, a phrase like “wrote score for” in a question about film music composers could map to the composer-film relation wroteSoundtrackForFilm, to the class of movieSoundtracks (a subclass of music pieces), or to an entity like the movie “The Score”. [sent-73, score-0.465]
17 Depending on the choice, we may arrive at a structurally good query (with triple patterns that can actually be joined) or at a meaningless and non-executable query (with disconnected triple patterns). [sent-74, score-0.885]
18 This generalized disambiguation problem is much more challenging than the more focused task of named entity disambiguation (NED). [sent-75, score-0.422]
19 3 Contribution In our approach, we introduce new elements towards making translation of questions into SPARQL triple patterns more expressive and robust. [sent-80, score-0.479]
20 The first three steps prepare the input for constructing a disambiguation graph for mapping the phrases in a question onto entities, classes, and relations, in a coherent manner. [sent-103, score-0.566]
21 The fourth step formulates this generalized disambiguation problem as an ILP with complex constraints and computes the best solution using an ILP solver. [sent-104, score-0.246]
22 Phrases are detected that potentially correspond to semantic items such as ‘Who’, ‘played in’, ‘movie’ and ‘Casablanca’. [sent-116, score-0.249]
23 This includes finding that the phrase ‘played in’ can either refer to the semantic relation actedIn or to playedForTeam and that the phrase ‘Casablanca’ can potentially refer to Casablanca (film) or Casablanca, Morocco. [sent-119, score-0.353]
24 Intuitively, a q-unit is a triple composed of phrases. [sent-124, score-0.242]
25 Here, we determine for our running example that ‘played in’ refers to the semantic relation actedIn and not to playedForTeam and the phrase ‘Casablanca’ refers to Casablanca (film) and not Casablanca, Morocco. [sent-129, score-0.276]
26 For example, we determine that the relation marriedTo connects the person referred to by ‘Who’ and writer to form the semantic triple person marriedTo writer. [sent-132, score-0.591]
27 For SPARQL queries, semantic triples such as person marriedTo writer have to be mapped to suitable triple patterns with appropriate join conditions expressed through common variables: ? [sent-136, score-0.555]
28 1 Phrase Detection A detected phrase p is a pair ⟨Toks, l⟩ where Toks is a phrase and l is a label, l ∈ {concept, relation}, indicating whether the phrase is a relation phrase or a concept phrase. [sent-142, score-0.412]
29 Pr is the set of all detected relation phrases and Pc is the set of all detected concept phrases. [sent-143, score-0.381]
30 One special type of detected relation phrase is the null phrase, where no relation is explicitly mentioned but one can be induced. [sent-144, score-0.408]
31 This dictionary was mostly constructed as part of the knowledge base, independently of the question-to-query translation task, in the form of instances of the means relation in Yago2, an example of which is shown in Figure 1. For relation detection, we experimented with various approaches. [sent-149, score-0.226]
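A minimal sketch of dictionary-based phrase detection over token spans follows; the dictionary contents, the span-length limit, and the labelling heuristic are illustrative assumptions, not the detectors actually used by DEANNA.

```python
# Illustrative phrase dictionary in the spirit of Yago2's "means" relation.
PHRASE_DICT = {
    "casablanca": ["Casablanca_(film)", "Casablanca,_Morocco"],
    "movie": ["wordnet_movie"],
    "played in": ["actedIn", "playedForTeam"],
}

def detect_phrases(tokens, max_len=4):
    """Return (token_span, label) pairs for every span found in the dictionary."""
    detected = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
            span = " ".join(tokens[i:j]).lower()
            if span in PHRASE_DICT:
                # Crude labelling for the sketch; DEANNA uses separate detection
                # strategies for concept phrases and relation phrases.
                label = "relation" if span == "played in" else "concept"
                detected.append((tokens[i:j], label))
    return detected

print(detect_phrases("Who played in the movie Casablanca ?".split()))
```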
32 2 Phrase Mapping After phrases are detected, each phrase is mapped to a set of semantic items. [sent-153, score-0.31]
33 The mapping of concept phrases also relies on the phrase-concept dictionary. [sent-154, score-0.236]
34 To map relation phrases, we rely on a corpus of mappings from textual patterns to relations, of the form: {‘play’, ‘star in’, ‘act’, ‘leading role’} → actedIn; {‘married’, ‘spouse’, ‘wife’} → marriedTo. Distinct phrase occurrences will map to different semantic item instances. [sent-155, score-0.505]
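A sketch of how such a pattern-to-relation corpus could be applied to one relation-phrase occurrence; the pattern table and the substring matching are assumptions made for illustration.

```python
# Illustrative corpus of textual-pattern-to-relation mappings of the form shown above.
RELATION_PATTERNS = {
    "play": "actedIn", "star in": "actedIn", "act": "actedIn", "leading role": "actedIn",
    "married": "marriedTo", "spouse": "marriedTo", "wife": "marriedTo",
}

def map_relation_phrase(phrase_tokens):
    """Return the candidate relations for one relation-phrase occurrence."""
    surface = " ".join(phrase_tokens).lower()
    return {rel for pattern, rel in RELATION_PATTERNS.items() if pattern in surface}

# Each occurrence gets its own candidate set, so two occurrences of the same
# string stay distinct when the disambiguation graph and query variables are built.
print(map_relation_phrase(["is", "married", "to"]))  # {'marriedTo'}
print(map_relation_phrase(["played", "in"]))         # {'actedIn'} via the 'play' pattern
```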
35 We explain why this is important when we describe the construction of the disambiguation graph and variable assignment in the structured query. [sent-156, score-0.323]
36 3 Dependency Parsing & Q-Unit Generation Dependency parsing identifies triples of tokens, or triploids, ⟨trel, targ1, targ2⟩, where trel, targ1, targ2 ∈ qNL are seeds for phrases, with the triploid acting as a seed for a potential SPARQL triple pattern. [sent-158, score-0.294]
37 A q-unit is a triple of sets of phrases, ⟨{prel ∈ Pr}, {parg1 ∈ Pc}, {parg2 ∈ Pc}⟩, where trel ∈ prel and similarly for arg1 and arg2. [sent-166, score-0.294]
38 Conceptually, one can view a q-unit as a placeholder node with three sets of edges, each connecting the same q-node to a phrase that corresponds to a relation or concept phrase in the same q-unit. [sent-167, score-0.36]
39 This notion of nodes and edges will be made more concrete when we present our disambiguation graph construction. [sent-168, score-0.295]
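The step from triploids to q-units can be sketched as follows, assuming the dependency parser has already produced seed token positions and that a seed is expanded to every detected phrase covering it (the expansion rule and the data layout are assumptions).

```python
from dataclasses import dataclass, field

@dataclass
class QUnit:
    """A q-unit: three sets of candidate phrases seeded by one triploid."""
    rel: set = field(default_factory=set)
    arg1: set = field(default_factory=set)
    arg2: set = field(default_factory=set)

def build_q_unit(triploid, detected_phrases):
    """triploid = (t_rel, t_arg1, t_arg2): token positions acting as seeds."""
    t_rel, t_arg1, t_arg2 = triploid
    qu = QUnit()
    for tokens, span, label in detected_phrases:   # span = (start, end) token positions
        covered = set(range(*span))
        if label == "relation" and t_rel in covered:
            qu.rel.add(" ".join(tokens))
        if label == "concept" and t_arg1 in covered:
            qu.arg1.add(" ".join(tokens))
        if label == "concept" and t_arg2 in covered:
            qu.arg2.add(" ".join(tokens))
    return qu

# 'Who(0) played(1) in(2) the(3) movie(4) Casablanca(5)': seeds 1 (rel), 0 and 5 (args).
phrases = [(["played", "in"], (1, 3), "relation"),
           (["Who"], (0, 1), "concept"),
           (["Casablanca"], (5, 6), "concept")]
print(build_q_unit((1, 0, 5), phrases))
```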
40 4 Disambiguation of Phrase Mappings The core contribution of this paper is a framework for disambiguating phrases into semantic items covering relations, classes, and entities in a unified manner. [sent-170, score-0.395]
41 This can be seen as a joint task combining named entity disambiguation for entities, word sense disambiguation for classes (common nouns), and relation extraction. [sent-171, score-0.604]
42 5 Query Generation Once phrases are mapped to unique semantic items, we proceed to generate queries in two steps. [sent-174, score-0.407]
43 The power of using a knowledge base is that we have a rich type system that allows us to tell if two semantic items are compatible or not. [sent-177, score-0.269]
44 Each relation has a type signature and we check whether the candidate items are compatible with the signature. [sent-178, score-0.29]
45 We did not assign subject/object roles in triploids and q-units because a natural language relation phrase might express the inverse of a semantic relation, e. [sent-179, score-0.363]
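A minimal sketch of the type-signature compatibility check, with a made-up signature table; because subject/object roles are left open, both orientations are tested, which also covers phrases that express the inverse of a semantic relation.

```python
# Hypothetical type signatures: relation -> (domain class, range class).
SIGNATURES = {
    "actedIn": ("person", "movie"),
    "marriedTo": ("person", "person"),
    "bornIn": ("person", "city"),
}

# Hypothetical class memberships of candidate semantic items.
TYPES = {
    "Casablanca_(film)": {"movie"},
    "who_variable": {"person"},
    "writer": {"person"},
}

def compatible(relation, item1, item2):
    """Check both orientations, since the phrase may express the inverse relation."""
    domain, rng = SIGNATURES[relation]
    forward = domain in TYPES[item1] and rng in TYPES[item2]
    backward = domain in TYPES[item2] and rng in TYPES[item1]
    return forward or backward

print(compatible("actedIn", "who_variable", "Casablanca_(film)"))  # True
print(compatible("bornIn", "who_variable", "Casablanca_(film)"))   # False: a film is not a city
```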
46 Once semantic items are grouped into triples, it is an easy task to expand them to SPARQL triple patterns. [sent-184, score-0.428]
47 4 Joint Disambiguation The goal of the disambiguation step is to compute a partial mapping of phrases onto semantic items, such that each phrase is assigned to at most one semantic item. [sent-189, score-0.655]
48 Since the result of disambiguating one phrase can influence the mapping of other phrases, we consider all phrases jointly in one big disambiguation task. [sent-191, score-0.483]
49 In the following, we construct a disambiguation graph that encodes all possible mappings. [sent-192, score-0.259]
50 Because of this complication and to capture our complex constraints, we do not employ graph algorithms, but model the general disambiguation problem as an ILP. [sent-201, score-0.259]
51 1 Disambiguation Graph Joint disambiguation takes place over a disambiguation graph DG = (V, E), where V = Vs ∪ Vp ∪ Vq and E = Esim ∪ Ecoh ∪ Eq, where Vs is the set of semantic items: vs ∈ Vs is an s-node. [sent-203, score-0.521]
52 We denote the set of p-nodes corresponding to relation phrases by Vrp and the set of p-nodes corresponding to concept phrases by Vrc. [sent-205, score-0.356]
53 Esim ⊆ Vp × Vs is a set of weighted similarity edges that capture the strength of the mapping of a phrase to a semantic item. [sent-208, score-0.293]
54 Ecoh ⊆ Vs × Vs is a set of weighted coherence edges that capture the semantic coherence between two semantic items. [sent-209, score-0.32]
55 Figure 3 shows the disambiguation graph for our running example (excluding coherence edges between s-nodes). [sent-214, score-0.351]
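A sketch of how the disambiguation graph could be materialized from the candidate mappings; q-nodes and q-edges (Vq, Eq) are omitted for brevity, and the similarity and coherence functions are placeholders for the measures described next.

```python
import itertools

def build_disambiguation_graph(phrase_candidates, sim, coh):
    """phrase_candidates: {p-node -> set of candidate s-nodes};
    sim(p, s) and coh(s1, s2) supply the Esim and Ecoh edge weights."""
    Vp = set(phrase_candidates)
    Vs = set().union(*phrase_candidates.values())
    # Similarity edges Esim, one per candidate mapping of a phrase to a semantic item.
    Esim = {(p, s): sim(p, s) for p, cands in phrase_candidates.items() for s in cands}
    # Coherence edges Ecoh between pairs of semantic items.
    Ecoh = {(s1, s2): coh(s1, s2) for s1, s2 in itertools.combinations(sorted(Vs), 2)}
    return Vp, Vs, Esim, Ecoh

candidates = {"played in": {"actedIn", "playedForTeam"},
              "Casablanca": {"Casablanca_(film)", "Casablanca,_Morocco"}}
Vp, Vs, Esim, Ecoh = build_disambiguation_graph(
    candidates, sim=lambda p, s: 1.0, coh=lambda a, b: 0.5)
print(len(Esim), len(Ecoh))  # 4 similarity edges, 6 coherence edges
```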
56 For Yago2, we characterize an entity e by its inlinks InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity. [sent-223, score-0.265]
57 To be able to compare semantic items of different semantic types (entities, relations, and classes), we need to extend this to classes and relations. [sent-224, score-0.341]
58 For a class c with entities e, its inlinks are defined as follows: InLinks(c) = ∪e∈c InLinks(e). For relations, we only consider those that map entities to entities (e.g. [sent-225, score-0.481]
59 actedIn, produced), for which we define the set of inlinks as follows: InLinks(r) = ∪(e1,e2)∈r (InLinks(e1) ∩ InLinks(e2)). The intuition behind this is that when the two arguments of an instance of the relation co-occur, then the relation is being expressed. [sent-227, score-0.383]
60 We define the semantic coherence (Cohsem) between two semantic items s1 and s2 as the Jaccard coefficient of their sets of inlinks. [sent-228, score-0.328]
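The inlink definitions and the Jaccard-based coherence translate directly into code; the inlink sets below are tiny stand-ins for the Wikipedia-derived sets in Yago2.

```python
def inlinks_class(entity_inlinks, members):
    """InLinks(c) = union of InLinks(e) over all entities e in class c."""
    out = set()
    for e in members:
        out |= entity_inlinks.get(e, set())
    return out

def inlinks_relation(entity_inlinks, instances):
    """InLinks(r) = union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    out = set()
    for e1, e2 in instances:
        out |= entity_inlinks.get(e1, set()) & entity_inlinks.get(e2, set())
    return out

def coh_sem(inlinks1, inlinks2):
    """Cohsem(s1, s2): Jaccard coefficient of the two inlink sets."""
    if not inlinks1 and not inlinks2:
        return 0.0
    return len(inlinks1 & inlinks2) / len(inlinks1 | inlinks2)

# Toy example: two semantic items sharing one of three distinct inlinks.
print(coh_sem({"pageA", "pageB"}, {"pageB", "pageC"}))  # 0.333...
```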
61 3 Disambiguation Graph Processing The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings. [sent-237, score-0.422]
62 Each semantic triple should include a relation: Er ≥ Qmn′d + Xn′ + Yn′r − 2, ∀m, n′, r, d = rel. [sent-267, score-0.375]
63 Each triple should have at least one class: Cc1 + Cc2 ≥ Qmn′′d1 + Xn′′ + Yn′′c1 + Qmn′′′d2 + Xn′′′ + Yn′′′c2 − 5, ∀m, n′′, n′′′, r, c1, c2, d1 = arg1, d2 = arg2. This is not invoked for existential questions that return Boolean answers and are translated to ASK queries in SPARQL. [sent-268, score-0.683]
64 Figure 4 shows the resulting subgraph for the disambiguation graph of Figure 3. [sent-276, score-0.259]
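A much-simplified sketch of the joint objective as an ILP, written with the PuLP modelling library; it keeps only the similarity/coherence objective and the "each phrase maps to at most one item" constraint, and omits the q-edge, relation, and class constraints above, so it illustrates the style of the encoding rather than DEANNA's actual program.

```python
from itertools import combinations
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum, value

def disambiguate(sim, coh, alpha=1.0, beta=1.0):
    """sim: {(phrase, item): weight}; coh: {(item, item): weight}; all toy values."""
    prob = LpProblem("joint_disambiguation", LpMaximize)
    y = {e: LpVariable(f"y_{i}", cat=LpBinary) for i, e in enumerate(sim)}
    # z[(e1, e2)] may be 1 only if both mapping edges e1 and e2 are selected.
    pairs = [(e1, e2) for e1, e2 in combinations(sim, 2)
             if (e1[1], e2[1]) in coh or (e2[1], e1[1]) in coh]
    z = {pair: LpVariable(f"z_{i}", cat=LpBinary) for i, pair in enumerate(pairs)}
    weight = lambda e1, e2: coh.get((e1[1], e2[1]), coh.get((e2[1], e1[1]), 0.0))
    prob += alpha * lpSum(sim[e] * y[e] for e in y) + \
            beta * lpSum(weight(e1, e2) * z[e1, e2] for e1, e2 in z)
    for phrase in {p for p, _ in sim}:              # each phrase maps to at most one item
        prob += lpSum(y[e] for e in y if e[0] == phrase) <= 1
    for e1, e2 in z:                                # a coherence edge needs both mappings
        prob += z[e1, e2] <= y[e1]
        prob += z[e1, e2] <= y[e2]
    prob.solve()
    return {e: int(value(y[e])) for e in y}

sim = {("played in", "actedIn"): 0.9, ("played in", "playedForTeam"): 0.8,
       ("Casablanca", "Casablanca_(film)"): 0.7, ("Casablanca", "Casablanca,_Morocco"): 0.7}
coh = {("actedIn", "Casablanca_(film)"): 0.6}
print(disambiguate(sim, coh))  # selects actedIn and Casablanca_(film)
```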
65 1 Datasets Our experiments are based on two collections of questions: the QALD-1 task for question answering over linked data (QAL, 2011) and a collection of questions used in (Elbassuoni et al. [sent-279, score-0.432]
66 , 2009) in the context of the NAGA project, for informative ranking of SPARQL query answers (Elbassuoni et al. [sent-281, score-0.265]
67 For both collections, some questions are out-of-scope for our setting, because they mention entities or relations that are not available in the underlying datasets, contain date or time comparisons, or involve aggregation such as counting. [sent-289, score-0.333]
68 2 Evaluation Metrics We evaluated the output of DEANNA at three stages in the processing pipeline: a) after the disambiguation of phrases, b) after the generation of the SPARQL query, and c) after obtaining answers from the underlying linked-data sources. [sent-293, score-0.31]
69 For the disambiguation stage, the judges looked at each q-node/s-node pair, in the context of the question and the underlying data schemas, and determined whether the mapping was correct or not and whether any expected mappings were missing. [sent-300, score-0.503]
70 For the query-generation stage, the judges looked at each triple pattern and determined whether the pattern was meaningful for the question or not and whether any expected triple pattern was missing. [sent-301, score-0.635]
71 Note that, because our approach does not use any query templates, the same question may generate semantically equivalent queries that differ widely in terms of their structure. [sent-302, score-0.452]
72 Hence, we rely on our evaluation metrics that are based on triple patterns, as there is no gold-standard query for a given question. [sent-303, score-0.408]
73 , q-node/s-node pairs or triple patterns) regardless of the questions to which they belong. [sent-308, score-0.41]
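Item-level precision and recall pooled over all questions could be computed as below; the judgement format (counts of correct, incorrect, and missing items per question) is an assumption based on the assessment procedure described above.

```python
def item_level_scores(judgements):
    """judgements: per question, counts of (correct, incorrect, missing) judged items,
    where items are q-node/s-node pairs or triple patterns pooled across questions."""
    correct = sum(c for c, _, _ in judgements)
    incorrect = sum(i for _, i, _ in judgements)
    missing = sum(m for _, _, m in judgements)
    precision = correct / (correct + incorrect) if correct + incorrect else 0.0
    recall = correct / (correct + missing) if correct + missing else 0.0
    return precision, recall

# Toy pooled judgements for three questions.
print(item_level_scores([(5, 1, 0), (4, 0, 2), (6, 2, 1)]))  # (0.833..., 0.833...)
```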
74 2 Query Generation Table 2 shows the same metrics for the generated triple patterns. [sent-319, score-0.242]
75 Missing or incorrect triple patterns can be attributed to (i) incorrect mappings in the disambiguation stage or (ii) incorrect detection of dependencies between phrases despite having the correct mappings. [sent-321, score-0.67]
76 Here, we attempt to generate answers to questions by executing the generated queries over the datasets. [sent-325, score-0.441]
77 The table shows the number of questions for which the system successfully generated SPARQL queries (#queries), and among those, how many resulted in satisfactory answers as judged by our evaluators (#satisfactory). [sent-326, score-0.441]
78 Queries that produced no answers, such as the third query in Table 4, were further relaxed using an incarnation of the techniques described in (Elbassuoni et al. [sent-344, score-0.233]
79 , 2009), by retaining the triple patterns expressing type constraints and relaxing all other triple patterns. [sent-345, score-0.668]
80 Relaxing a triple pattern was done by replacing all entities with variables and casting entity mentions into keywords that are attached to the relaxed triple pattern. [sent-346, score-0.659]
81 ?x bornIn Germany, which produced no answers when run over the Yago2 knowledge base, since the relation bornIn relates people to cities and not countries in Yago2. [sent-351, score-0.253]
82 This relaxed (and keyword-augmented) triple-pattern query was then processed the same way as triple-pattern queries without any keywords. [sent-357, score-0.407]
83 The results of such a query were then ranked based on how well they match the keyword conditions specified in the relaxed query, using the ranking model in (Elbassuoni et al. [sent-358, score-0.505]
84 Using this technique, the top-ranked results for the relaxed query were all actors born in German cities, as shown in Table 5. [sent-360, score-0.318]
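A sketch of the relaxation step on the ?x bornIn Germany example: type-constraint patterns are kept, entities in other patterns are replaced by fresh variables, and the replaced entities become keywords attached to the relaxed pattern. The triple representation and the entity test are simplifying assumptions.

```python
def relax_query(patterns, is_entity, type_relations=("isa", "rdf:type")):
    """patterns: list of (subject, relation, object) strings.
    Returns the relaxed patterns plus the keywords attached to each relaxed one."""
    relaxed, fresh = [], 0
    for s, r, o in patterns:
        if r in type_relations:               # keep type-constraint patterns as they are
            relaxed.append(((s, r, o), []))
            continue
        keywords, terms = [], []
        for term in (s, o):
            if is_entity(term):
                fresh += 1
                terms.append(f"?v{fresh}")    # replace the entity with a variable ...
                keywords.append(term)         # ... and keep it as a keyword hint
            else:
                terms.append(term)
        relaxed.append(((terms[0], r, terms[1]), keywords))
    return relaxed

query = [("?x", "isa", "actor"), ("?x", "bornIn", "Germany")]
print(relax_query(query, is_entity=lambda t: not t.startswith("?") and t[0].isupper()))
# [(('?x', 'isa', 'actor'), []), (('?x', 'bornIn', '?v1'), ['Germany'])]
```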
85 After relaxation, the judges again assessed the results of the relaxed queries and determined whether they were satisfactory or not. [sent-361, score-0.28]
86 The number of additional queries that obtained satisfactory answers after relaxation is shown under #relaxed in Table 3. [sent-362, score-0.273]
87 , 2007; Voorhees, 2003) cast the user’s question into a keyword query to a Web search engine (perhaps with phrases for location and person names or other proper nouns). [sent-372, score-0.523]
88 A key element in Watson’s approach is to decompose complex questions into several cues and sub-cues, with the aim of generating answers from matches for the various cues (tapping into the Web and Wikipedia). [sent-377, score-0.267]
89 , 2007)) are used for both answering parts of questions that can be translated to structured form (Chu-Carroll et al. [sent-381, score-0.317]
90 The recent QALD-1 initiative (QAL, 2011) proposed a benchmark task to translate questions into SPARQL queries over linked-data sources like DBpedia and MusicBrainz. [sent-384, score-0.342]
91 Earlier work on mapping questions into structured queries includes the work by Frank et al. [sent-393, score-0.5]
92 In Pythia, Unger and Cimiano (2011) relied on an ontology-driven grammar for the question language so that questions could be directly mapped onto the vocabulary of the underlying ontology. [sent-398, score-0.326]
93 Nalix is an attempt to bring question answering to XML data (Li, Yang, and Jagadish, 2007) by mapping questions to XQuery expressions, relying on human interaction to resolve possible ambiguity. [sent-400, score-0.459]
94 (2012) developed a template-based approach based on Pythia, where questions are automatically mapped to structured queries in a two-step process. [sent-402, score-0.452]
95 The latter is a compromise between full natural language and structured queries, where the user provides the structure and the system takes care of the disambiguation of keyword phrases. [sent-409, score-0.42]
96 In contrast to this prior work on related problems, our graph construction and constraints are more complex, as we address the joint mapping of arbitrary phrases onto entities, classes, or relations. [sent-413, score-0.278]
97 7 Conclusions and Future Work We presented a method for translating naturallanguage questions into structured queries. [sent-420, score-0.232]
98 Our experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers. [sent-424, score-0.278]
99 Second, queries sometimes return empty answers although they perfectly capture the original question, because the underlying data sources are incomplete or represent the relevant information in an unexpected manner. [sent-427, score-0.273]
100 We plan to extend our approach of combining structured data with textual descriptions, and generate queries that combine structured search predicates with keyword or phrase matching. [sent-428, score-0.485]
wordName wordTfidf (topN-words)
[('sparql', 0.349), ('casablanca', 0.297), ('triple', 0.242), ('disambiguation', 0.211), ('queries', 0.174), ('questions', 0.168), ('query', 0.166), ('inlinks', 0.157), ('elbassuoni', 0.14), ('marriedto', 0.122), ('relation', 0.113), ('question', 0.112), ('entities', 0.108), ('ilp', 0.108), ('keyword', 0.106), ('actedin', 0.105), ('phrases', 0.101), ('film', 0.1), ('items', 0.1), ('answers', 0.099), ('mapping', 0.094), ('married', 0.088), ('deanna', 0.087), ('qnl', 0.087), ('triploids', 0.087), ('unger', 0.087), ('semantic', 0.086), ('answering', 0.085), ('weikum', 0.082), ('dbpedia', 0.081), ('played', 0.078), ('phrase', 0.077), ('rome', 0.075), ('writer', 0.074), ('actress', 0.07), ('cimiano', 0.07), ('cohsem', 0.07), ('naga', 0.07), ('qmnd', 0.07), ('ramanath', 0.07), ('zkl', 0.07), ('classes', 0.069), ('patterns', 0.069), ('relaxed', 0.067), ('linked', 0.067), ('structured', 0.064), ('movie', 0.063), ('detected', 0.063), ('hoffart', 0.063), ('woody', 0.06), ('relations', 0.057), ('coherence', 0.056), ('actor', 0.054), ('gurobi', 0.054), ('suchanek', 0.053), ('placeholder', 0.052), ('pound', 0.052), ('pythia', 0.052), ('rdf', 0.052), ('simsem', 0.052), ('trel', 0.052), ('vq', 0.052), ('welty', 0.052), ('vs', 0.051), ('bornin', 0.05), ('graph', 0.048), ('vp', 0.047), ('mappings', 0.047), ('bases', 0.047), ('rel', 0.047), ('mapped', 0.046), ('yij', 0.045), ('actors', 0.045), ('web', 0.045), ('movies', 0.044), ('germany', 0.043), ('type', 0.042), ('base', 0.041), ('concept', 0.041), ('auer', 0.041), ('bizer', 0.041), ('ned', 0.041), ('born', 0.04), ('user', 0.039), ('judges', 0.039), ('person', 0.038), ('kalyanpur', 0.038), ('watson', 0.038), ('relaxing', 0.038), ('allen', 0.038), ('wikipedia', 0.036), ('edges', 0.036), ('constraints', 0.035), ('signature', 0.035), ('querying', 0.035), ('bollacker', 0.035), ('aggregations', 0.035), ('bhalotia', 0.035), ('freya', 0.035), ('harness', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 97 emnlp-2012-Natural Language Questions for the Web of Data
Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum
Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the . in question translation and the resulting query answering.
2 0.15341368 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Author: Thomas Lin ; Mausam ; Oren Etzioni
Abstract: Entity linking systems link noun-phrase mentions in text to their corresponding Wikipedia articles. However, NLP applications would gain from the ability to detect and type all entities mentioned in text, including the long tail of entities not prominent enough to have their own Wikipedia articles. In this paper we show that once the Wikipedia entities mentioned in a corpus of textual assertions are linked, this can further enable the detection and fine-grained typing of the unlinkable entities. Our proposed method for detecting unlinkable entities achieves 24% greater accuracy than a Named Entity Recognition baseline, and our method for fine-grained typing is able to propagate over 1,000 types from linked Wikipedia entities to unlinkable entities. Detection and typing of unlinkable entities can increase yield for NLP applications such as typed question answering.
3 0.14366268 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types
Author: Ndapandula Nakashole ; Gerhard Weikum ; Fabian Suchanek
Abstract: This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.
4 0.14326216 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
5 0.13472712 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
Author: Ni Lao ; Amarnag Subramanya ; Fernando Pereira ; William W. Cohen
Abstract: We study how to extend a large knowledge base (Freebase) by reading relational information from a large Web text corpus. Previous studies on extracting relational knowledge from text show the potential of syntactic patterns for extraction, but they do not exploit background knowledge of other relations in the knowledge base. We describe a distributed, Web-scale implementation of a path-constrained random walk model that learns syntactic-semantic inference rules for binary relations from a graph representation of the parsed text and the knowledge base. Experiments show significant accuracy improvements in binary relation prediction over methods that consider only text, or only the existing knowledge base.
6 0.12548713 84 emnlp-2012-Linking Named Entities to Any Database
7 0.12439799 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
8 0.11748607 41 emnlp-2012-Entity based QA Retrieval
9 0.11388045 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
10 0.11370666 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction
11 0.1035409 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation
12 0.099724203 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
13 0.098735042 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
14 0.098470174 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
15 0.087525643 69 emnlp-2012-Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
16 0.079624034 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text
17 0.077899203 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
18 0.073488697 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
19 0.071361557 19 emnlp-2012-An Entity-Topic Model for Entity Linking
20 0.070749573 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
topicId topicWeight
[(0, 0.254), (1, 0.183), (2, -0.018), (3, 0.036), (4, 0.02), (5, -0.144), (6, 0.16), (7, 0.281), (8, -0.15), (9, -0.011), (10, 0.094), (11, -0.064), (12, 0.003), (13, -0.041), (14, 0.055), (15, -0.073), (16, 0.008), (17, 0.083), (18, 0.03), (19, -0.063), (20, 0.125), (21, 0.112), (22, -0.065), (23, 0.028), (24, 0.149), (25, 0.043), (26, 0.051), (27, 0.0), (28, -0.023), (29, 0.063), (30, -0.012), (31, 0.065), (32, 0.047), (33, -0.037), (34, -0.025), (35, -0.004), (36, -0.049), (37, -0.008), (38, -0.066), (39, 0.087), (40, 0.109), (41, -0.022), (42, 0.097), (43, -0.035), (44, 0.045), (45, -0.044), (46, -0.081), (47, 0.018), (48, 0.067), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.96964478 97 emnlp-2012-Natural Language Questions for the Web of Data
Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum
Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the . in question translation and the resulting query answering.
2 0.68553442 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types
Author: Ndapandula Nakashole ; Gerhard Weikum ; Fabian Suchanek
Abstract: This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.
3 0.63109916 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Author: Thomas Lin ; Mausam ; Oren Etzioni
Abstract: Entity linking systems link noun-phrase mentions in text to their corresponding Wikipedia articles. However, NLP applications would gain from the ability to detect and type all entities mentioned in text, including the long tail of entities not prominent enough to have their own Wikipedia articles. In this paper we show that once the Wikipedia entities mentioned in a corpus of textual assertions are linked, this can further enable the detection and fine-grained typing of the unlinkable entities. Our proposed method for detecting unlinkable entities achieves 24% greater accuracy than a Named Entity Recognition baseline, and our method for fine-grained typing is able to propagate over 1,000 types from linked Wikipedia entities to unlinkable entities. Detection and typing of unlinkable entities can increase yield for NLP applications such as typed question answering.
4 0.63015145 41 emnlp-2012-Entity based QA Retrieval
Author: Amit Singh
Abstract: Bridging the lexical gap between the user’s question and the question-answer pairs in the Q&A; archives has been a major challenge for Q&A; retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependant on the availability of quality corpus in the absence of which they are troubled by noise issues. Moreover these models perform word based expansion in a context agnostic manner resulting in translation that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A; archives and a popular entity catalog. Experiments conducted on a large scale real data show that the proposed techniques are promising.
5 0.61419541 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
Author: Ni Lao ; Amarnag Subramanya ; Fernando Pereira ; William W. Cohen
Abstract: We study how to extend a large knowledge base (Freebase) by reading relational information from a large Web text corpus. Previous studies on extracting relational knowledge from text show the potential of syntactic patterns for extraction, but they do not exploit background knowledge of other relations in the knowledge base. We describe a distributed, Web-scale implementation of a path-constrained random walk model that learns syntactic-semantic inference rules for binary relations from a graph representation of the parsed text and the knowledge base. Experiments show significant accuracy improvements in binary relation prediction over methods that consider only text, or only the existing knowledge base.
6 0.5190376 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
7 0.49535665 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
8 0.45529774 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
9 0.45179728 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
10 0.44979969 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction
11 0.44256637 100 emnlp-2012-Open Language Learning for Information Extraction
12 0.42604995 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
13 0.39860907 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text
14 0.36914608 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation
15 0.345723 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
16 0.34211081 84 emnlp-2012-Linking Named Entities to Any Database
17 0.34042099 69 emnlp-2012-Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
18 0.33774191 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
19 0.33116269 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
20 0.32361385 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
topicId topicWeight
[(2, 0.038), (16, 0.034), (24, 0.086), (25, 0.02), (34, 0.058), (60, 0.104), (63, 0.309), (64, 0.016), (65, 0.043), (70, 0.016), (73, 0.02), (74, 0.021), (76, 0.033), (80, 0.019), (86, 0.013), (94, 0.028), (95, 0.046)]
simIndex simValue paperId paperTitle
1 0.94886768 100 emnlp-2012-Open Language Learning for Information Extraction
Author: Mausam ; Michael Schmitz ; Stephen Soderland ; Robert Bart ; Oren Etzioni
Abstract: Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, stateof-the-art Open IE systems such as REVERB and WOE share two important weaknesses (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under precision-yield curve (AUC) compared to REVERB and 1.9 times the AUC of WOEparse. –
2 0.94708854 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
Author: Kristian Woodsend ; Mirella Lapata
Abstract: Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.
3 0.94556749 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model
Author: Lan Du ; Wray Buntine ; Huidong Jin
Abstract: Topic models are increasingly being used for text analysis tasks, often times replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model with the ability to adapt topics from both the previous segment and the parent document. For this proposed model, a Gibbs sampler is developed for doing posterior inference. Experimental results show that with topic adaptation, our model significantly improves over existing approaches in terms of perplexity, and is able to uncover clear sequential structure on, for example, Herman Melville’s book “Moby Dick”.
4 0.93123716 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language
Author: Thomas Francois ; Cedrick Fairon
Abstract: This paper present a new readability formula for French as a foreign language (FFL), which relies on 46 textual features representative of the lexical, syntactic, and semantic levels as well as some of the specificities of the FFL context. We report comparisons between several techniques for feature selection and various learning algorithms. Our best model, based on support vector machines (SVM), significantly outperforms previous FFL formulas. We also found that semantic features behave poorly in our case, in contrast with some previous readability studies on English as a first language.
same-paper 5 0.93085837 97 emnlp-2012-Natural Language Questions for the Web of Data
Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum
Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the . in question translation and the resulting query answering.
6 0.78369731 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
7 0.75728595 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types
8 0.756235 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
9 0.7489866 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model
10 0.69528687 44 emnlp-2012-Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web
11 0.68707979 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
12 0.68395817 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
13 0.68314981 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
14 0.68063682 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
15 0.67761576 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
16 0.67746365 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
17 0.67551327 19 emnlp-2012-An Entity-Topic Model for Entity Linking
18 0.66966814 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
19 0.66945052 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
20 0.6684739 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media