acl acl2011 acl2011-191 knowledge-graph by maker-knowledge-mining

191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges


Source: pdf

Author: Heng Ji ; Ralph Grishman

Abstract: In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking – linking names in context to entities in the KB – and Slot Filling – adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes (“slots”) derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. [sent-5, score-0.238]

2 This is done through two tasks, Entity Linking – linking names in context to entities in the KB – and Slot Filling – adding information about an entity to the KB. [sent-6, score-0.586]

3 A large source collection of newswire and web documents is provided from which systems are to discover information. [sent-7, score-0.071]

4 In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges. [sent-9, score-0.089]

5 In practice, however, we may need to gather information about a person or organization that is scattered among the documents of a large collection. [sent-11, score-0.093]

6 On the other hand, traditional Question Answering (QA) evaluations made limited efforts at disambiguating entities in queries (e. [sent-17, score-0.274]

7 , 2006), and limited use of relation/event extraction in answer search (e. [sent-20, score-0.121]

8 The Knowledge Base Population (KBP) shared task, conducted as part of the NIST Text Analysis Conference, aims to address and evaluate these capabilities, and bridge the IE and QA communities to promote research in discovering facts about entities and expanding a knowledge base with these facts. [sent-24, score-0.238]

9 In this paper we aim to answer some of these questions based on our detailed analysis of evaluation results. [sent-33, score-0.086]

10 The overall goal of KBP is to automatically identify salient and novel entities, link them to corresponding Knowledge Base (KB) entries (if the linkage exists), then discover attributes about the entities, and finally expand the KB with any new attributes. [sent-37, score-0.12]

11 The background document, drawn from the KBP corpus, serves to disambiguate ambiguous name strings. [sent-39, score-0.094]

12 The goal of Slot Filling is to collect from the corpus information regarding certain attributes of an entity, which may be a person or some type of organization. [sent-43, score-0.122]

13 Attributes are excluded if they are already filled in the reference data base and can only take on a single value. [sent-45, score-0.11]

14 Along with each slot fill, the system must provide the ID of a document which supports the correctness of this fill. [sent-46, score-0.476]

15 If the corpus does not provide any information for a given attribute, the system should generate a NIL response (and no document ID). [sent-47, score-0.115]

16 KBP2010 defined 26 types of attributes for persons (such as the age, birthplace, spouse, children, job title, and employing organization) and 16 types of attributes for organizations (such as the top employees, the founder, the year founded, the headquarters location, and subsidiaries). [sent-48, score-0.178]

17 Some of these attributes are specified as only taking a single value (e. [sent-49, score-0.089]

18 The reference KB includes hundreds of thousands of entities based on articles from an October 2008 dump of English Wikipedia which includes 818,741 nodes. [sent-54, score-0.125]

19 The source collection includes 1,286,609 newswire documents, 490,596 web documents and hundreds of transcribed spoken documents. [sent-55, score-0.071]

20 To score Entity Linking, we take each query and check whether the KB node ID (or NIL) returned by a system is correct or not. [sent-56, score-0.177]
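The per-query check described above can be sketched in a few lines (a minimal illustration; the query IDs and KB node IDs below are invented):

```python
# Minimal sketch of the Entity Linking metric: a query is correct iff
# the returned KB node ID (or NIL) matches the reference answer, and
# accuracy is micro-averaged over all queries.
def micro_accuracy(system, gold):
    """system, gold: dicts mapping query ID -> KB node ID or 'NIL'."""
    correct = sum(1 for q, answer in gold.items() if system.get(q) == answer)
    return correct / len(gold)

gold = {"q1": "E0001", "q2": "NIL", "q3": "E0042"}
system = {"q1": "E0001", "q2": "E0099", "q3": "E0042"}
print(micro_accuracy(system, gold))  # 2 of 3 queries correct
```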

21 To score Slot Filling, we first pool all the system responses (as is done for information retrieval evaluations) together with a set of manually prepared slot fills. [sent-58, score-0.468]

22 Each system response is rated as correct, wrong, or redundant (a response which is equivalent to another response for the same slot or an entry already in the knowledge base). [sent-61, score-0.587]

23 When measured against a benchmark based on inter-annotator agreement, two systems’ performance approached and one system exceeded the benchmark on person entities. [sent-64, score-0.099]

24 1 A General Architecture – A typical entity linking system architecture is depicted in Figure 1. [sent-66, score-0.527]

25 Figure 1: General Entity Linking System Architecture. It includes three steps: (1) query expansion – expand the query into a richer set of forms using Wikipedia structure mining or coreference resolution in the background document. [sent-68, score-0.433]

26 (2) candidate generation – finding all possible KB entries that a query might link to; (3) candidate ranking – rank the probabilities of all candidates and the NIL answer. [sent-69, score-0.204]
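A minimal skeleton of these three steps might look as follows; the redirect table, KB entries, and context words are invented stand-ins for the Wikipedia-derived resources the systems actually mined:

```python
# Toy three-step entity linker: (1) query expansion via a redirect
# table, (2) candidate generation by name matching against the KB,
# (3) candidate ranking by context-word overlap, with a NIL fallback.
REDIRECTS = {"Big Blue": "IBM"}                       # expansion resource
KB = {"IBM": "E_IBM", "Ibm, Senegal": "E_IBM_SN"}     # name -> KB node ID
KB_CONTEXT = {"E_IBM": {"computer", "technology"},    # node -> context words
              "E_IBM_SN": {"senegal", "village"}}

def expand_query(query):
    forms = {query}
    if query in REDIRECTS:
        forms.add(REDIRECTS[query])
    return forms

def generate_candidates(forms):
    return {node for name, node in KB.items()
            if any(f.lower() in name.lower() for f in forms)}

def rank_candidates(candidates, context_words):
    if not candidates:
        return "NIL"
    return max(sorted(candidates),
               key=lambda c: len(KB_CONTEXT.get(c, set()) & context_words))

def link(query, context_words):
    return rank_candidates(generate_candidates(expand_query(query)), context_words)

print(link("Big Blue", {"computer", "stocks"}))  # resolves to E_IBM
print(link("Foobar", set()))                     # no candidates -> NIL
```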

27 In the following subsections we will highlight the new and effective techniques used in entity linking. [sent-71, score-0.228]

28 Such information provides additional sources for entity linking: (1). [sent-75, score-0.228]

29 , 2010) used Wikipedia link structure (source, anchors, redirects and disambiguation) to extend the KB and compute entity cooccurrence estimates. [sent-77, score-0.259]

30 Many other teams including CUNY and Siel used redirect pages and disambiguation pages for query expansion. [sent-78, score-0.14]

31 The Siel team also exploited bold texts from first paragraphs because they often contain nicknames, alias names and full names. [sent-79, score-0.096]

32 In fact, when the mined attributes become rich enough, they can be used as an expanded query and sent into an information retrieval engine in order to obtain the relevant source documents. [sent-85, score-0.229]

33 3 Ranking Approach Comparison. The ranking approaches exploited in the KBP2010 entity linking systems can be generally categorized into four types: (1). [sent-89, score-0.544]

34 Supervised learning, in which a pair of entity and KB node is modeled as an instance for classification. [sent-93, score-0.228]

35 Graph-based ranking, in which context entities are taken into account in order to reach a global optimized solution together with the query entity. [sent-96, score-0.265]

36 IR (Information Retrieval) approach, in which the entire background source document is considered as a single query to retrieve the most relevant Wikipedia article. [sent-98, score-0.216]

37 Among the 16 entity linking systems which participated in the regular evaluation, LCC (Lehmann et al. [sent-100, score-0.461]

38 However, a highperforming entity linking system can also be implemented in an unsupervised fashion by exploiting effective characteristics and algorithms, as we will discuss in the next sections. [sent-106, score-0.498]

39 4 Semantic Relation Features Almost all entity linking systems have used semantic relations as features (e. [sent-108, score-0.461]

40 The semantic features used in the BuptPris system include name tagging, infoboxes, synonyms, variants and abbreviations. [sent-113, score-0.095]

41 In the CUNY system, the semantic features are automatically extracted from their slot filling system. [sent-114, score-0.625]

42 As we can see, except for person entities in the BuptPris system, all types of entities have obtained significant improvement by using semantic features in entity linking. [sent-116, score-0.511]

43 5 Context Inference In the current setting of KBP, a set of target entities is provided to each system in order to simplify the task and its evaluation, because it’s not feasible to require a system to generate answers for all possible entities in the entire source collection. [sent-123, score-0.38]

44 However, ideally a fully-automatic KBP system should be able to automatically discover novel entities (“queries”) which have no KB entry or few slot fills in the KB, extract their attributes, and conduct global reasoning over these attributes in order to generate the final output. [sent-124, score-0.696]

45 At the very least, due to the semantic coherence principle (McNamara, 2001), the information of an entity depends on the information of other entities. [sent-125, score-0.228]

46 For example, the WebTLab team and the CMCRC team extracted all entities in the context of a given query, and disambiguated all entities at the same time using a PageRank-like algorithm (Page et al. [sent-126, score-0.342]
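A toy version of this PageRank-style collective step might look as follows; the candidate entities and cooccurrence weights are invented, and a real system would derive them from Wikipedia link statistics:

```python
# Candidates for all mentions in a context form one graph whose edges
# carry entity cooccurrence weights; running PageRank on that graph and
# taking the highest-scoring candidate per mention disambiguates all
# mentions jointly.
COOCCUR = {("Paris_France", "France"): 1.0,   # symmetric toy weights
           ("Paris_Texas", "Texas"): 1.0}

def edge(a, b):
    return COOCCUR.get((a, b)) or COOCCUR.get((b, a)) or 0.0

def pagerank(nodes, damping=0.85, iters=50):
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            incoming = sum(
                score[m] * edge(m, n) / max(sum(edge(m, k) for k in nodes), 1e-9)
                for m in nodes if m != n)
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new
    return score

def disambiguate(mention_candidates):
    """mention_candidates: dict mention -> list of candidate entities."""
    nodes = [c for cands in mention_candidates.values() for c in cands]
    score = pagerank(nodes)
    return {m: max(cands, key=score.get) for m, cands in mention_candidates.items()}

result = disambiguate({"Paris": ["Paris_France", "Paris_Texas"],
                       "France": ["France"]})
print(result["Paris"])  # the France-cooccurring candidate wins
```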

47 The SMU-SIS team (Gottipati and Jiang, 2010) re-formulated queries using contexts. [sent-128, score-0.153]

48 The research on cross-document entity coreference resolution can be traced back to the Web People Search task (Artiles et al. [sent-135, score-0.345]

49 Compared to WePS and ACE, KBP requires linking an entity mention in a source document to a knowledge base with or without Wikipedia articles. [sent-139, score-0.644]

50 Therefore sometimes the linking decisions heavily rely on entity profile comparison with Wikipedia infoboxes. [sent-140, score-0.461]

51 In source documents, especially in web data, usually few explicit attributes about GPE entities are provided, so an entity linking system also needs to conduct external knowledge discovery from background related documents or hyperlink mining. [sent-142, score-0.891]

52 2 Analysis of Difficult Queries. There are 2250 queries in the Entity Linking evaluation; for 58 of them, at most 5 (out of the 46) system runs produced correct answers. [sent-144, score-0.144]

53 For 19 queries all 46 systems produced different results from the answer key. [sent-146, score-0.193]

54 Interestingly, the systems which perform well on the difficult queries are not necessarily those which achieved the top overall performance – they were ranked 13th, 6th, 5th, 12th, 10th, and 16th respectively on the overall query set. [sent-147, score-0.107]

55 11 queries are highly ambiguous city names which can exist in many states or countries (e. [sent-148, score-0.107]

56 From these most difficult queries we observed the following challenges and possible solutions. [sent-151, score-0.154]

57 For 6 queries, a system would need to interpret or extract attributes for their context entities. [sent-153, score-0.126]

58 In the following example, a system is required to capture the knowledge that “Chinese Christian man” normally appears in “China” or there is a “Mission School” in “Canton, China” in order to link the query “Canton” to the correct KB entry. [sent-155, score-0.245]

59 This is a very difficult query also because the more common way of spelling “Canton” in China is “Guangdong”. [sent-156, score-0.14]

60 For example, in the source document “… Filed under: Falcons ”, a system will need to analyze the document which this hyperlink refers to. [sent-158, score-0.152]

61 Such cases might require new query reformulation and cross-document aggregation techniques, which are both beyond traditional entity disambiguation paradigms. [sent-159, score-0.41]

62 • Require Entity Salience Ranking: some of these queries represent salient entities, and so using web popularity rank (e. [sent-160, score-0.274]

63 In fact we found that a naïve candidate ranking approach based on web popularity alone can achieve 71% micro-averaged accuracy, which is better than 24 system runs in KBP2010. [sent-165, score-0.112]
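That naive baseline amounts to ranking candidates by a popularity prior alone, ignoring context entirely; a sketch with invented counts (standing in for, e.g., Wikipedia anchor or web hit statistics):

```python
# Popularity-only candidate ranking: always return the candidate with
# the highest prior count, or NIL when there are no candidates.
POPULARITY = {"Georgia_(U.S._state)": 9000,   # invented counts
              "Georgia_(country)": 5000}

def popularity_baseline(candidates):
    if not candidates:
        return "NIL"
    return max(candidates, key=lambda c: POPULARITY.get(c, 0))

print(popularity_baseline(["Georgia_(country)", "Georgia_(U.S._state)"]))
```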

64 Since the web information is used as a black box (including query expansion and query log analysis) which changes over time, it’s more difficult to duplicate research results. [sent-166, score-0.322]

65 However, gazetteers that rank entities by salience or mark major entities are worth encoding as additional features. [sent-167, score-0.25]

66 In an example such as “… Louis after its flight crew detected mechanical problems …”, although there is little background information to decide where the query “St Louis” is located, a system can rely on such a major city list to generate the correct linking. [sent-172, score-0.176]

67 Similarly, if a system knows that “Georgia Institute of Technology” has higher salience than “Georgian Technical University”, it can correctly link a query “Georgia Tech” in most cases. [sent-173, score-0.208]

68 1 Slot Filling: What Works – A General Architecture. The slot-filling task is a hybrid of traditional IE (a fixed set of relations) and QA (responding to a query, generating a unified response from a large collection). [sent-175, score-0.08]

69 The basic system structure (Figure 2) involved three phases: document/passage retrieval (retrieving passages involving the queried entity), answer extraction (getting specific answers from the retrieved passages), and answer combination (merging and selecting among the answers extracted). [sent-178, score-0.388]
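The three phases can be sketched as a toy pipeline; the corpus, the lexical pattern, and the entity name below are all invented, and real systems used far richer retrieval and extraction machinery:

```python
# Toy slot filling pipeline: passage retrieval, pattern-based answer
# extraction (keeping the supporting document ID), and answer merging
# with a NIL fallback when the corpus has no information.
def retrieve_passages(corpus, entity):
    """Phase 1: keep documents mentioning the query entity."""
    return [(doc_id, text) for doc_id, text in sorted(corpus.items())
            if entity in text]

def extract_answers(passages, entity, pattern):
    """Phase 2: naive lexical extraction of the slot fill."""
    answers = []
    for doc_id, text in passages:
        marker = f"{entity} {pattern} "
        if marker in text:
            fill = text.split(marker, 1)[1].split(".")[0].strip()
            answers.append((fill, doc_id))
    return answers

def combine_answers(answers):
    """Phase 3: merge duplicate fills, keeping one supporting doc ID."""
    merged = {}
    for fill, doc_id in answers:
        merged.setdefault(fill, doc_id)
    return sorted(merged.items()) or [("NIL", None)]

corpus = {"d1": "Jane Doe was born in Springfield.",
          "d2": "Jane Doe was born in Springfield."}
fills = combine_answers(extract_answers(
    retrieve_passages(corpus, "Jane Doe"), "Jane Doe", "was born in"))
print(fills)  # one merged fill with its supporting document ID
```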

70 The solutions adopted for answer extraction reflected the range of current IE methods as well as QA answer extraction techniques (see Table 3). [sent-179, score-0.242]

71 Table 3: Slot Filling Answer Extraction Method Comparison. On the other hand, there were a lot of 'facts' available – pairs of entities bearing a relationship corresponding closely to the KBP relations – in the form of filled Wikipedia infoboxes. [sent-187, score-0.159]

72 However, such instances are noisy: if a pair of entities participates in more than one relation, the found instance may not be an example of the intended relation, and so some filtering of the instances or resulting patterns may be needed. [sent-189, score-0.125]

73 Several sites used such distant supervision to acquire patterns or train classifiers, in some cases combined with direct supervision using the training data (Chrupala et al. [sent-190, score-0.098]
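The filtering step this implies can be sketched as follows; the seed facts and the sentence are invented, standing in for infobox tuples and corpus text:

```python
# Distant supervision sketch: seed (entity, relation, value) tuples
# label corpus sentences as training instances, and entity pairs seen
# under more than one relation are dropped as ambiguous.
SEEDS = [("Jane Doe", "per:spouse", "John Doe"),
         ("Jane Doe", "per:children", "John Doe"),   # ambiguous pair
         ("Acme Corp", "org:founded_by", "Jane Doe")]

def unambiguous(facts):
    """Drop entity pairs that occur with more than one relation."""
    relations = {}
    for e, rel, v in facts:
        relations.setdefault((e, v), set()).add(rel)
    return [f for f in facts if len(relations[(f[0], f[2])]) == 1]

def label_sentences(sentences, facts):
    """A sentence containing both arguments becomes a training instance."""
    return [(s, rel) for s in sentences
            for e, rel, v in facts if e in s and v in s]

train = label_sentences(["Acme Corp was founded by Jane Doe in 1990."],
                        unambiguous(SEEDS))
print(train)  # only the unambiguous fact labels the sentence
```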

74 Mapping the ACE relations and events by themselves provided limited coverage (34% of slot fills in the training data), but was helpful when combined with other sources (e. [sent-193, score-0.399]

75 , 2010) extended their mention detection component to cover 36 entity types which include many non-ACE types; and added new relation types between entities and event anchors. [sent-200, score-0.383]

76 For example, IBM, NYU (Grishman and Min, 2010) and CUNY exploited entity coreference in pattern learning and reasoning. [sent-203, score-0.354]

77 It is also notable that traditional extraction components trained from newswire data suffer from noise in web data. [sent-204, score-0.084]

78 The main motivation of the KBP program is to automatically distill information from unstructured news and web data instead of from manually constructed knowledge bases, but these existing knowledge bases can provide a large number of seed tuples to bootstrap slot filling or guide distant learning. [sent-213, score-0.781]

79 For example, CUNY exploited Freebase and LCC exploited DBpedia for fact validation in slot filling. [sent-215, score-0.499]

80 For example, while Freebase contains 116 million instances of 7,300 relations for 9 million entities, it only covers 48% of the slot types and 5% of the slot answers in KBP2010 evaluation data. [sent-217, score-0.854]

81 Therefore, both CUNY and LCC observed limited gains from the answer validation approach from Freebase. [sent-218, score-0.086]

82 3 Cross-Slot and Cross-Query Reasoning. Slot Filling can also benefit from extracting revertible queries from the context of any target query, and conducting global ranking or reasoning to refine the results. [sent-221, score-0.251]

83 CUNY and IBM developed recursive reasoning components to refine extraction results. [sent-222, score-0.081]

84 For a given query, if there are no other related answer candidates available, they built “revertible” queries in the contexts, similar to (Prager et al. [sent-223, score-0.193]

85 For example, if a is extracted as the answer for org:subsidiaries of the query q, we can consider a as a new revertible query and verify that an org:parents answer of a is q. [sent-225, score-0.517]
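This consistency check can be sketched as follows; `toy_slot_filler` is an invented stand-in for a full slot filling system, and only the org:subsidiaries/org:parents inverse pair is modeled:

```python
# Revertible-query verification: an org:subsidiaries answer a for
# query q is kept only if asking org:parents of a recovers q.
INVERSE = {"org:subsidiaries": "org:parents",
           "org:parents": "org:subsidiaries"}

def verify(slot_filler, q, slot, a):
    inverse_slot = INVERSE.get(slot)
    if inverse_slot is None:
        return True                 # no inverse defined: nothing to check
    return q in slot_filler(a, inverse_slot)

def toy_slot_filler(entity, slot):
    kb = {("SubCo", "org:parents"): ["BigCo"]}
    return kb.get((entity, slot), [])

print(verify(toy_slot_filler, "BigCo", "org:subsidiaries", "SubCo"))    # consistent
print(verify(toy_slot_filler, "OtherCo", "org:subsidiaries", "SubCo"))  # inconsistent
```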

86 Slot Filling: Remaining Challenges. Slot filling remains a very challenging task; only one system exceeded 30% F-measure on the 2010 evaluation. [sent-230, score-0.292]

87 During the 2010 evaluation data annotation/adjudication process, an initial answer key annotation was created by a manual search of the corpus (resulting in 797 instances), and then an independent adjudication pass was applied to assess these annotations together with pooled system responses. [sent-231, score-0.123]

88 Most of the shortfall in system performance reflects inadequacies in the answer extraction stage, which in turn reflect limitations in the current state of the art in information extraction. [sent-234, score-0.158]

89 An analysis of the 2010 training data shows that cross-sentence coreference and some types of inference are critical to slot filling. [sent-235, score-0.475]

90 4% of the cases do the entity name and slot fill appear together in the same sentence, so a system which processes sentences in isolation is severely limited in its performance. [sent-237, score-0.722]

91 In the KBP slot filling task, slots are often dependent on each other, so we can improve the results by improving the “coherence” of the story (i. [sent-244, score-0.704]
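One concrete instance of such a dependency is that a per:age fill should agree with a per:date_of_birth fill; a minimal check, with an invented reference year and example fills:

```python
# Cross-slot coherence check: age and birth year should agree to
# within a year relative to a (here invented) reference year.
def coherent_age(slots, reference_year=2010):
    if "per:age" not in slots or "per:date_of_birth" not in slots:
        return True                          # nothing to cross-check
    birth_year = int(slots["per:date_of_birth"][:4])
    return abs((reference_year - birth_year) - int(slots["per:age"])) <= 1

print(coherent_age({"per:age": "40", "per:date_of_birth": "1970-05-01"}))  # consistent
print(coherent_age({"per:age": "25", "per:date_of_birth": "1970-05-01"}))  # contradiction
```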

92 Also, some of the constraints of ACE relation mention extraction – notably, that both arguments are present in the same sentence – are not present, making the role of coreference and cross-sentence inference more critical. [sent-253, score-0.141]

93 The role of coreference and inference as limiting factors, while generally recognized, is emphasized by examining the 163 slot values that the human annotators filled but that none of the systems were able to get correct. [sent-254, score-0.509]

94 While the types of inferences which may be required are open-ended, certain types come up repeatedly, reflecting the types of slots to be filled: systems would benefit from specialists which are able to reason about times, locations, family relationships, and employment relationships. [sent-256, score-0.079]

95 7 Toward System Combination. The increasing number of diverse approaches based on different resources provides new opportunities for both the entity linking and slot filling tasks to benefit from system combination. [sent-257, score-1.123]

96 The NUSchime entity linking system trained an SVM-based re-scoring model to combine two individual pipelines. [sent-258, score-0.498]

97 We also applied a voting approach on the top 9 entity linking systems and found that all combination orders achieved significant gains, with the highest absolute improvement of 4.7% in micro-averaged accuracy over the top entity linking system. [sent-264, score-0.461]
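The voting scheme itself is simple; in this sketch each system run emits one KB node ID (or NIL) per query and the majority answer wins (the runs and IDs are invented):

```python
# Majority voting over the outputs of several entity linking systems.
from collections import Counter

def vote(system_outputs, query):
    answers = [out[query] for out in system_outputs if query in out]
    if not answers:
        return "NIL"
    return Counter(answers).most_common(1)[0][0]

runs = [{"q1": "E7"}, {"q1": "E7"}, {"q1": "NIL"}]
print(vote(runs, "q1"))  # majority answer E7
```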

99 The CUNY slot filling system trained a maximum-entropy-based re-ranking model to combine three individual pipelines, based on various global features including voting and dependency relations. [sent-266, score-0.662]

100 When we applied the same re-ranking approach to the slot filling systems which were ranked from the 2nd to 14th, we achieved 4. [sent-272, score-0.625]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('slot', 0.399), ('kbp', 0.369), ('tac', 0.298), ('linking', 0.233), ('entity', 0.228), ('filling', 0.226), ('cuny', 0.186), ('kb', 0.184), ('query', 0.14), ('entities', 0.125), ('qa', 0.118), ('lcc', 0.114), ('moder', 0.114), ('queries', 0.107), ('wikipedia', 0.101), ('roberts', 0.097), ('attributes', 0.089), ('answer', 0.086), ('buptpris', 0.081), ('slots', 0.079), ('coreference', 0.076), ('base', 0.076), ('mcnamee', 0.074), ('ie', 0.067), ('canton', 0.065), ('danny', 0.065), ('lahoud', 0.065), ('revertible', 0.065), ('webtlab', 0.065), ('population', 0.058), ('name', 0.058), ('answers', 0.056), ('henry', 0.054), ('ace', 0.053), ('exploited', 0.05), ('gpe', 0.049), ('emile', 0.049), ('brentwood', 0.049), ('jake', 0.049), ('challenges', 0.047), ('team', 0.046), ('reasoning', 0.046), ('hyperlinks', 0.045), ('freebase', 0.045), ('nil', 0.043), ('dbpedia', 0.043), ('lehmann', 0.043), ('artiles', 0.043), ('traditional', 0.042), ('web', 0.042), ('resolution', 0.041), ('document', 0.04), ('distant', 0.04), ('infoboxes', 0.04), ('response', 0.038), ('knowledge', 0.037), ('hltcoe', 0.037), ('louis', 0.037), ('system', 0.037), ('background', 0.036), ('hyperlink', 0.035), ('extraction', 0.035), ('filled', 0.034), ('ibm', 0.033), ('person', 0.033), ('ranking', 0.033), ('angel', 0.032), ('birthplace', 0.032), ('budapestacad', 0.032), ('bysani', 0.032), ('castelli', 0.032), ('childof', 0.032), ('chrupala', 0.032), ('cmcrc', 0.032), ('gottipati', 0.032), ('ihj', 0.032), ('nemeskey', 0.032), ('nuschime', 0.032), ('pizzato', 0.032), ('siel', 0.032), ('weps', 0.032), ('passages', 0.032), ('responses', 0.032), ('id', 0.032), ('florian', 0.031), ('link', 0.031), ('organization', 0.031), ('passage', 0.03), ('mention', 0.03), ('documents', 0.029), ('supervision', 0.029), ('architecture', 0.029), ('children', 0.029), ('auer', 0.029), ('georgia', 0.029), ('bollacker', 0.029), ('exceeded', 0.029), ('fernandez', 0.029), ('mission', 0.029), ('monday', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

Author: Heng Ji ; Ralph Grishman

Abstract: In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking – linking names in context to entities in the KB – and Slot Filling – adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes (“slots”) derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges.

2 0.31664035 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base

Author: Xianpei Han ; Le Sun

Abstract: Linking entities with a knowledge base (entity linking) is a key issue in bridging textual data with a structural knowledge base. Due to the name variation problem and the name ambiguity problem, entity linking decisions critically depend on the heterogeneous knowledge of entities. In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in document P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e). Experimental results show that our method can significantly outperform the traditional methods.

3 0.29705706 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

Author: Danuta Ploch

Abstract: Named entity disambiguation is the task of linking an entity mention in a text to the correct real-world referent predefined in a knowledge base, and is a crucial subtask in many areas like information retrieval or topic detection and tracking. Named entity disambiguation is challenging because entity mentions can be ambiguous and an entity can be referenced by different surface forms. We present an approach that exploits Wikipedia relations between entities co-occurring with the ambiguous form to derive a range of novel features for classifying candidate referents. We find that our features improve disambiguation results significantly over a strong popularity baseline, and are especially suitable for recognizing entities not contained in the knowledge base. Our system achieves state-of-the-art results on the TAC-KBP 2009 dataset.

4 0.15346737 293 acl-2011-Template-Based Information Extraction without the Templates

Author: Nathanael Chambers ; Dan Jurafsky

Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.

5 0.14387567 129 acl-2011-Extending the Entity Grid with Entity-Specific Features

Author: Micha Elsner ; Eugene Charniak

Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.

6 0.14356142 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

7 0.14052349 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

8 0.13992317 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories

9 0.13739912 182 acl-2011-Joint Annotation of Search Queries

10 0.12715833 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

11 0.12441836 256 acl-2011-Query Weighting for Ranking Model Adaptation

12 0.12333924 258 acl-2011-Ranking Class Labels Using Query Sessions

13 0.11428124 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

14 0.11342707 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

15 0.10193389 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models

16 0.10127076 117 acl-2011-Entity Set Expansion using Topic information

17 0.099141665 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

18 0.099138252 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters

19 0.098550647 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia

20 0.097902991 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.193), (1, 0.112), (2, -0.218), (3, 0.113), (4, 0.062), (5, -0.089), (6, -0.049), (7, -0.206), (8, -0.134), (9, -0.006), (10, 0.109), (11, 0.022), (12, -0.128), (13, -0.125), (14, 0.069), (15, 0.054), (16, 0.148), (17, 0.004), (18, 0.013), (19, 0.0), (20, 0.026), (21, -0.023), (22, 0.022), (23, -0.011), (24, 0.043), (25, 0.01), (26, -0.014), (27, -0.01), (28, 0.007), (29, 0.092), (30, -0.05), (31, -0.026), (32, 0.017), (33, -0.011), (34, 0.081), (35, 0.095), (36, -0.001), (37, -0.037), (38, 0.073), (39, 0.031), (40, -0.048), (41, 0.018), (42, -0.04), (43, 0.056), (44, -0.061), (45, -0.004), (46, 0.007), (47, -0.018), (48, -0.011), (49, 0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97053874 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

Author: Heng Ji ; Ralph Grishman

Abstract: In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking – linking names in context to entities in the KB – and Slot Filling – adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes (“slots”) derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges.

2 0.8640697 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base

Author: Xianpei Han ; Le Sun

Abstract: Linking entities with a knowledge base (entity linking) is a key issue in bridging textual data with a structural knowledge base. Due to the name variation problem and the name ambiguity problem, entity linking decisions critically depend on the heterogeneous knowledge of entities. In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in document P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e). Experimental results show that our method can significantly outperform the traditional methods.

3 0.85684794 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

Author: Danuta Ploch

Abstract: Named entity disambiguation is the task of linking an entity mention in a text to the correct real-world referent predefined in a knowledge base, and is a crucial subtask in many areas like information retrieval or topic detection and tracking. Named entity disambiguation is challenging because entity mentions can be ambiguous and an entity can be referenced by different surface forms. We present an approach that exploits Wikipedia relations between entities co-occurring with the ambiguous form to derive a range of novel features for classifying candidate referents. We find that our features improve disambiguation results significantly over a strong popularity baseline, and are especially suitable for recognizing entities not contained in the knowledge base. Our system achieves state-of-the-art results on the TAC-KBP 2009 dataset.

4 0.69327307 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia

Author: Lev Ratinov ; Dan Roth ; Doug Downey ; Mike Anderson

Abstract: Disambiguating concepts and entities in a context-sensitive way is a fundamental problem in natural language processing. The comprehensiveness of Wikipedia has made the online encyclopedia an increasingly popular target for disambiguation. Disambiguation to Wikipedia is similar to a traditional Word Sense Disambiguation task, but distinct in that the Wikipedia link structure provides additional information about which disambiguations are compatible. In this work we analyze approaches that utilize this information to arrive at coherent sets of disambiguations for a given document (which we call “global” approaches), and compare them to more traditional (local) approaches. We show that previous approaches for global disambiguation can be improved, but even then the local disambiguation provides a baseline which is very hard to beat.
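The local-versus-global contrast in this abstract can be illustrated with a brute-force joint scorer: a local approach ranks each mention's candidates independently, while a global objective adds pairwise coherence between the chosen Wikipedia titles. All mentions, titles, and scores below are made up for illustration:

```python
from itertools import product

# Local compatibility score of each (mention, candidate title) pair
local = {
    ("Jordan", "Michael_Jordan"): 0.6, ("Jordan", "Jordan_(country)"): 0.5,
    ("Bulls", "Chicago_Bulls"): 0.7, ("Bulls", "Bull"): 0.4,
}
# Pairwise relatedness between candidate titles (e.g. from link structure)
coherence = {
    frozenset(["Michael_Jordan", "Chicago_Bulls"]): 0.9,
}

def global_disambiguate(mentions, candidates):
    """Pick the joint assignment maximizing local scores plus pairwise coherence."""
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        score = sum(local[(m, t)] for m, t in zip(mentions, assignment))
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                score += coherence.get(frozenset([assignment[i], assignment[j]]), 0.0)
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best))

cands = {"Jordan": ["Michael_Jordan", "Jordan_(country)"],
         "Bulls": ["Chicago_Bulls", "Bull"]}
print(global_disambiguate(["Jordan", "Bulls"], cands))
```

Exhaustive search over joint assignments is only feasible for tiny candidate sets; the "global" systems compared in the paper use approximate optimization instead, but the objective has this additive local-plus-coherence shape.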

5 0.667997 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum

Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.

6 0.5965842 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

7 0.56657481 129 acl-2011-Extending the Entity Grid with Entity-Specific Features

8 0.54687744 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories

9 0.53370136 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

10 0.52438664 337 acl-2011-Wikipedia Revision Toolkit: Efficiently Accessing Wikipedias Edit History

11 0.50933689 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

12 0.49990663 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text

13 0.48487198 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

14 0.4737314 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

15 0.46586561 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents

16 0.4612574 285 acl-2011-Simple supervised document geolocation with geodesic grids

17 0.44876856 258 acl-2011-Ranking Class Labels Using Query Sessions

18 0.44577813 182 acl-2011-Joint Annotation of Search Queries

19 0.44309771 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models

20 0.43541855 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.027), (9, 0.011), (17, 0.054), (26, 0.047), (30, 0.011), (37, 0.068), (39, 0.04), (41, 0.068), (44, 0.022), (55, 0.024), (59, 0.06), (72, 0.026), (91, 0.037), (93, 0.248), (96, 0.136), (97, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83711267 294 acl-2011-Temporal Evaluation

Author: Naushad UzZaman ; James Allen

Abstract: In this paper we propose a new method for evaluating systems that extract temporal information from text. It uses temporal closure to reward relations that are equivalent but distinct. Our metric measures the overall performance of systems with a single score, making comparison between different systems straightforward. Our approach is easy to implement, intuitive, accurate, scalable and computationally inexpensive.
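The idea of rewarding equivalent-but-distinct relations via temporal closure can be sketched as follows: close both the system and gold relation sets under transitivity before computing precision and recall, so a system gets credit for a relation that is logically implied by the gold annotation even when the explicit pairs differ. This simplified sketch handles only BEFORE relations and is not the paper's exact metric:

```python
def closure(before_pairs):
    """Transitive closure of a set of (a, b) pairs meaning 'a BEFORE b'."""
    closed = set(before_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def precision_recall(system, gold):
    """Score the closed system set against the closed gold set."""
    sys_c, gold_c = closure(system), closure(gold)
    tp = len(sys_c & gold_c)
    precision = tp / len(sys_c) if sys_c else 0.0
    recall = tp / len(gold_c) if gold_c else 0.0
    return precision, recall

gold = {("e1", "e2"), ("e2", "e3")}    # closure also implies e1 BEFORE e3
system = {("e1", "e2"), ("e1", "e3")}  # e1 BEFORE e3 is rewarded via closure
print(precision_recall(system, gold))  # precision 1.0, recall 2/3
```

Without closure, the system's (e1, e3) pair would count as a false positive even though it follows logically from the gold relations; scoring over closed sets removes that penalty.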

same-paper 2 0.80116105 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

Author: Heng Ji ; Ralph Grishman

Abstract: In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks: Entity Linking (linking names in context to entities in the KB) and Slot Filling (adding information about an entity to the KB). A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes (“slots”) derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges.

3 0.78957248 8 acl-2011-A Corpus of Scope-disambiguated English Text

Author: Mehdi Manshadi ; James Allen ; Mary Swift

Abstract: Previous work on quantifier scope annotation focuses on scoping sentences with only two quantified noun phrases (NPs), where the quantifiers are restricted to a predefined list. It also ignores negation, modal/logical operators, and other sentential adverbials. We present a comprehensive scope annotation scheme. We annotate the scope interaction between all scopal terms in the sentence, from quantifiers to scopal adverbials, without putting any restriction on the number of scopal terms in a sentence. In addition, all NPs, explicitly quantified or not, with no restriction on the type of quantification, are investigated for possible scope interactions.

4 0.61144817 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

Author: Joel Lang ; Mirella Lapata

Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows linguistic knowledge to be integrated transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
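The iterative split-merge loop described in this abstract can be sketched with stand-in scoring functions. The cohesion and split criteria below are toy stand-ins, not the paper's actual role-induction objectives:

```python
def cohesion(cluster):
    """Toy cohesion: fraction of the cluster held by its most frequent label."""
    counts = {}
    for x in cluster:
        counts[x] = counts.get(x, 0) + 1
    return max(counts.values()) / len(cluster)

def split(cluster):
    """Toy split: separate the majority label from everything else."""
    counts = {}
    for x in cluster:
        counts[x] = counts.get(x, 0) + 1
    major = max(counts, key=counts.get)
    return ([x for x in cluster if x == major],
            [x for x in cluster if x != major])

def split_merge(clusters, min_cohesion=0.9, rounds=5):
    """Alternate split and merge phases for a fixed number of rounds."""
    for _ in range(rounds):
        # Split phase: break up low-cohesion clusters.
        new = []
        for c in clusters:
            if len(c) > 1 and cohesion(c) < min_cohesion:
                a, b = split(c)
                new.extend([a, b])
            else:
                new.append(c)
        # Merge phase: join pairs that would remain cohesive together.
        merged = []
        while new:
            c = new.pop()
            for i, d in enumerate(new):
                if cohesion(c + d) >= min_cohesion:
                    c = c + new.pop(i)
                    break
            merged.append(c)
        clusters = merged
    return clusters

data = [["agent"] * 8 + ["patient"] * 2]  # one mixed initial cluster
print(split_merge(data))                  # two pure clusters
```

Starting from a single mixed cluster, the split phase separates the two role labels and the merge phase declines to rejoin them, so the loop converges to two pure clusters; the real algorithm plays the same game with feature-based scoring over argument instances rather than gold labels.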

5 0.60922587 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

6 0.60620981 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

7 0.60489821 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

8 0.60467398 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

9 0.60423791 135 acl-2011-Faster and Smaller N-Gram Language Models

10 0.60388494 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

11 0.60303795 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

12 0.6022296 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

13 0.60151035 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

14 0.60026747 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

15 0.60020852 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

16 0.59939587 178 acl-2011-Interactive Topic Modeling

17 0.59911358 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

18 0.59852314 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

19 0.59810245 311 acl-2011-Translationese and Its Dialects

20 0.59777659 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing