emnlp emnlp2013 emnlp2013-160 knowledge-graph by maker-knowledge-mining

160 emnlp-2013-Relational Inference for Wikification


Source: pdf

Author: Xiao Cheng ; Dan Roth

Abstract: Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. Previous approaches to D2W focused on the use of local and global statistics over the given text, Wikipedia articles and its link structures, to evaluate context compatibility among a list of probable candidates. However, these methods fail (often, embarrassingly), when some level of text understanding is needed to support Wikification. In this paper we introduce a novel approach to Wikification by incorporating, along with statistical methods, richer relational analysis of the text. We provide an extensible, efficient and modular Integer Linear Programming (ILP) formulation of Wikification that incorporates the entity-relation inference problem, and show that the ability to identify relations in text helps both candi- date generation and ranking Wikipedia titles considerably. Our results show significant improvements in both Wikification and the TAC Entity Linking task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. [sent-2, score-0.166]

2 In this paper we introduce a novel approach to Wikification by incorporating, along with statistical methods, richer relational analysis of the text. [sent-5, score-0.21]

3 , 2011) and has already found broad applications in NLP, Information Extraction, and Knowledge Acquisition from text, from coreference resolution (Ratinov and Roth, 2012) to entity linking and knowledge population (Ellis et al. [sent-10, score-0.357]

4 We also allow a special NIL title that captures all mentions that are outside Wikipedia. [sent-14, score-0.342]

5 It was shown that by disambiguating to the most likely title for every surface, independently maximizing the conditional probability Pr(title|surface), we already achieve a very competitive baseline on several Wikification datasets (Ratinov et al. [sent-16, score-0.226]
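The Pr(title|surface) baseline above can be sketched in a few lines; the anchor-count table below is entirely hypothetical, a stand-in for link statistics harvested from a Wikipedia dump:

```python
from collections import Counter

# Hypothetical anchor-text statistics: how often each surface links to
# each title in a Wikipedia dump. Real tables hold millions of entries.
ANCHOR_COUNTS = {
    "president": Counter({"President_of_the_United_States": 700,
                          "President_(corporate_title)": 300}),
    "robinson college": Counter({"Robinson_College,_Cambridge": 50}),
}

def most_likely_title(surface):
    """Disambiguate a surface independently: argmax_t Pr(t | surface).
    Returns (title, probability), or None for an unseen surface (NIL)."""
    counts = ANCHOR_COUNTS.get(surface.lower())
    if not counts:
        return None
    title, n = counts.most_common(1)[0]
    return title, n / sum(counts.values())
```

The sketch also shows the baseline's weakness discussed next: it ignores the context entirely, so every occurrence of "President" receives the same title.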

6 A certain level of text understanding is required even to be able to generate a good list of title candidates. [sent-38, score-0.301]

7 Mack Robinson College of Business which is located at Georgia State University instead of Robinson College, Cambridge, which is the only probable title linked by the surface Robinson College in the version of the Wikipedia dump we used. [sent-54, score-0.31]

8 We show that by leveraging a better understanding of the textual relations, we can substantially improve the Wikification performance. [sent-60, score-0.115]

9 2 The Wikification Approach A general Wikification decision consists of three computational components: (1) generating a ranked list of title candidates for each mention, (2) ranking candidates globally, and (3) dealing with NIL mentions. [sent-62, score-0.459]

10 For (1), the “standard” way of using Pr(title|surface) is often not sufficient; consider the case where the mention is the single word “President”; disambiguating such mentions depends heavily on the context, i.e. [sent-63, score-0.212]

11 For (2), even though the anchor texts cover many possible ways of paraphrasing the Wikipedia article titles and thus using the top Pr(title|surface) is proven to be a fairly strong baseline, it is never comprehensive. [sent-67, score-0.171]

12 There is a need to disambiguate titles that were never linked by any anchor text, and to disambiguate mentions that have never been observed as the linked text. [sent-68, score-0.493]

13 For (3) the Wikifier needs to determine when a mention corresponds to no title, and map it to a NIL entity. [sent-69, score-0.139]

14 1) of our approach to improve Wikification by leveraging textual relations in these three stages. [sent-72, score-0.184]

15 Require: Document D, Knowledge Base K consisting of relation triples σ = (ta, p, tb), where p is the relation predicate. [sent-74, score-0.258]

16 2: Generate initial candidates Ti = {tik} for mention mi from D and initialize candidate priors Pr(tik|mi) with an existing Wikification system, for all mi ∈ M. [sent-76, score-0.198]

17 3: Instantiate non-coreference relational constraints and add relational candidates. [sent-77, score-0.254]

18 4: Instantiate coreference relational constraints and add relational candidates. [sent-78, score-0.588]

19 Most of our discussion addresses the relational analysis and its impact on stage (2) and (3) above. [sent-81, score-0.242]

20 We use two types of boolean variables: eik is used to denote whether we disambiguate mi to tik (Γ(mi) = tik) or not. [sent-85, score-0.416]

21 r(ikj,l) is used to denote whether titles tik and tjl are chosen simultaneously, that is, r(ikj,l) = eik ∧ ejl. [sent-86, score-0.53]

22 Our model determines two types of scores for the boolean variables above: sik = Pr(eik) = Pr(Γ(mi) = tik) represents the initial score for the kth candidate title being chosen for mention mi. [sent-87, score-0.453]

23 For a pair of titles (tik, tjl), we denote the confidence of finding a relation between them by w(ikj,l). [sent-88, score-0.171]

24 Its value depends on the textual relation type and on how coherent it is with our existing knowledge. [sent-89, score-0.282]

25 Our goal is to find the best assignment to the variables eik, such that it satisfies some legitimacy (hard) constraints and the soft relational constraints dictated by the scores w(ikj,l). [sent-90, score-0.12]

26 To accomplish that, we define our objective function as a Constrained Conditional Model (CCM) (Roth and Yih, 2004; Chang et al. [sent-91, score-0.288]

27 , 2012) that is used to reward or penalize a pair of candidates tik, tjl by w(ikj,l) when they are chosen in the same document. [sent-92, score-0.182]
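The objective just described can be sketched in miniature. The real system solves an ILP; the toy below simply enumerates assignments by brute force, with made-up scores s_ik and a single hypothetical relational reward w:

```python
from itertools import product

# Two mentions, each with two candidate titles; all scores are made up.
# s[i][k] is the initial score Pr(Gamma(m_i) = t_ik) from the base Wikifier.
s = [[0.6, 0.4],    # mention m_0
     [0.55, 0.45]]  # mention m_1
# w[(i, k, j, l)] rewards choosing t_ik and t_jl together,
# i.e. the soft relational constraint on r = e_ik AND e_jl.
w = {(0, 1, 1, 1): 0.5}  # a hypothetical KB relation links t_01 and t_11

def best_assignment(s, w):
    """Maximize sum_ik s_ik*e_ik + sum w*(e_ik AND e_jl) by brute force,
    under the hard constraint of exactly one title per mention."""
    best, best_score = None, float("-inf")
    for assign in product(*(range(len(row)) for row in s)):
        score = sum(s[i][k] for i, k in enumerate(assign))
        score += sum(v for (i, k, j, l), v in w.items()
                     if assign[i] == k and assign[j] == l)
        if score > best_score:
            best, best_score = assign, score
    return best, best_score
```

Without the relational reward the locally best titles win; the reward flips the decision to the coherent pair, which is exactly the effect the w(ikj,l) scores are meant to have.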

28 The key challenge in incorporating relational analysis into the Wikification decision is to systematically construct the relational constraints (the solid edges between candidates in Figure 1) and incorporate them into our inference framework. [sent-97, score-0.646]

29 Two main components are needed: first, we need to extract high precision textual relations from the text. [sent-98, score-0.184]

30 Figure 1: Textual relation inference framework: the goal is to maximize the objective function assigning mentions to titles while enforcing coherency with relations extracted from both text and an external knowledge base. [sent-112, score-0.801]

31 Here, searching the external KB reveals that Slobodan Milošević is the founder of the Socialist Party of Serbia, which can be referred to by the surface Socialist Party; we therefore reward the output containing this pair of candidates. [sent-113, score-0.137]

32 The same idea applies to the relation “Slobodan Milošević holds office as President of the Federal Republic of Yugoslavia” as well as to the coreference relation between two mentions of Slobodan Milošević. [sent-114, score-0.514]

33 We determine the weights by combining type and confidence of the relation extracted from text with the confidence in relations retrieved from an external Knowledge Base (KB) by using the mention pairs as a query. [sent-116, score-0.457]

34 1 we describe how we extract relations from text; our goal is to reliably identify arguments that we hypothesize to be in a relation; we show that this is essential to our candidate generation, our ranking, and the mapping to NIL. [sent-120, score-0.248]

35 3 shows how we generate scores for the mentions and relations, as coefficients in the objective function of Sec. [sent-126, score-0.198]

36 Unlike the general ACE RDC task, we can restrict relation arguments to be named entities and thus leverage the large number of known relations in existing databases (e. [sent-132, score-0.388]

37 We also consider coreference relations that potentially aid mapping different mentions to the same title. [sent-135, score-0.292]

38 A purely statistical approach would very likely map the entity [Ministry of Defense]2 to Ministry of Defense (Israel) instead of Ministry of Defense and Armed Forces Logistics (Iran) because the context is more coherent with concepts related to Israel rather than to Iran. [sent-143, score-0.121]

39 Nevertheless, the pre-modifier relation between [Iranian]1 and [Ministry of Defense]2 demands the answer to be tightly related to Iran. [sent-144, score-0.113]

40 Even though human readers may not know the correct title needed here, understanding the pre-modifier relation allows them to easily filter through a list of candidates and enforce constraints that are derived jointly from the relation expressed in the text and their background knowledge. [sent-145, score-0.696]

41 In our attempt to mimic this general approach, we employ several high precision classifiers to resolve 1790 a range of local relations that are used to retrieve relevant background knowledge, and consequently integrated into our inference framework. [sent-146, score-0.185]

42 In the above example, Iranian Ministry of Defense would be decomposed into Iranian and Ministry of Defense and our relation extraction process hypothesizes a relation between these arguments. [sent-149, score-0.226]

43 The following example illustrates the importance of understanding co-reference relations in Wikification: Ex. [sent-154, score-0.128]

44 Clearly [Goldman]2 refers to the same person and should be mapped to the same entity (or to NIL) rather than popular entities frequently referred to as Goldman, coherent with context or not, such as Goldman Sachs. [sent-164, score-0.177]

45 To accomplish that, we cluster named entities that share tokens or are acronyms of each other when there is no ambiguity (e. [sent-165, score-0.128]

46 no other longer named entity mentions containing Goldman in the document) and use a voting algorithm (Algorithm 2) to generate candidates locally from within the clusters. [sent-167, score-0.413]
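A rough sketch of the token-share/acronym clustering described above; the matching heuristics here are simplified stand-ins for the paper's unambiguity checks, and all mention strings are illustrative:

```python
def is_acronym(short, long_form):
    """True if `short` matches the capitalized-initial acronym of
    `long_form` (lowercase function words such as 'of' are skipped)."""
    words = long_form.split()
    initials = "".join(w[0] for w in words if w[0].isupper())
    return len(words) > 1 and short.upper() == initials.upper()

def shares_tokens(short, long_form):
    """True if every token of `short` also occurs in `long_form`,
    e.g. 'Goldman' vs. 'Mark Goldman'."""
    return set(short.split()) <= set(long_form.split())

def cluster_mentions(mentions):
    """Greedy stand-in for the paper's unambiguous clustering: attach
    each mention to the first longer mention it matches."""
    clusters = []
    for m in sorted(mentions, key=len, reverse=True):
        for cluster in clusters:
            head = cluster[0]
            if is_acronym(m, head) or shares_tokens(m, head):
                cluster.append(m)
                break
        else:
            clusters.append([m])
    return clusters
```

Processing mentions longest-first ensures the most specific surface becomes each cluster's head, which the candidate-voting step relies on.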

47 3 Coreferent Nominal Mentions Document level coreference also provides important relations between named entities and nominal mentions. [sent-171, score-0.384]

48 Extracting these relations proved to be very useful for classifying NIL entities, as unfamiliar concepts tend to be introduced with these succinct appositional nominal mentions. [sent-172, score-0.205]

49 That is, it allows us to determine whether the target mention corresponds to a candidate title. [sent-174, score-0.2]

50 Identifying the apposition relation allows us to determine that this Dorothy Byrne is not the baseline Wikipedia title. [sent-178, score-0.147]

51 , 2011) of the candidate page, head word attributes and entity relation (i. [sent-180, score-0.262]

52 between Dorothy Byrne and Florida Green Party) to determine whether any candidates of Dorothy Byrne can entail the nominal mention. [sent-182, score-0.203]

53 Instead, we make use of relational queries to generate a more likely set of candidates. [sent-186, score-0.21]

54 Once mention pairs are generated from text using the syntactico-semantic structures and coreference, we use these to query our KB of relational triples. [sent-187, score-0.387]

55 We first indexed all Wikipedia links and DBpedia relations as unordered triples σ = (ti, p, tj), where the arguments ti, tj are tokenized, stemmed and lowercased for best recall. [sent-188, score-0.259]

56 Since our baseline system has approximately 80% accuracy at this stage, it is reasonable to assume that at least one of the argument mentions is correctly disambiguated. [sent-190, score-0.164]

57 Therefore we prune the search space by making only two queries for each mention pair (mi, mj) : q0 = (ti∗ , mj) and q1 = (mi, tj∗) where ti∗ , tj∗ are the strings representing the top titles chosen by the current model for mentions mi, mj respectively. [sent-191, score-0.478]
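The pruned query scheme can be sketched as follows; the KB contents, the normalization (the real index also tokenizes and stems), and all names are illustrative:

```python
def normalize(s):
    """Lowercase and collapse whitespace; the real index additionally
    tokenizes and stems the arguments for better recall."""
    return " ".join(s.lower().split())

# Hypothetical KB of unordered triples sigma = (t_a, p, t_b), keyed on
# the normalized argument pair.
KB = {frozenset({normalize("Slobodan Milosevic"),
                 normalize("Socialist Party of Serbia")}): "founder"}

def query_kb(arg_a, arg_b):
    """Unordered lookup: the predicate connecting the arguments, if any."""
    return KB.get(frozenset({normalize(arg_a), normalize(arg_b)}))

def relational_queries(mi, mj, top_title):
    """The two pruned queries for a mention pair: q0 = (t_i*, m_j) and
    q1 = (m_i, t_j*), with t_i* the top title currently chosen for m_i."""
    return (top_title[mi], mj), (mi, top_title[mj])
```

Fixing one argument to the current top title is what keeps the search space to two queries per mention pair instead of one per candidate pair.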

58 4, only keeping the arguments that are known to be possible or very likely candidates of the mention, based on the ambiguity that exists in the query result. [sent-194, score-0.223]

59 We consider adding new title candidates from two sources, through the coreference module and through the combined DBpedia and Wikipedia inter-page link structures. [sent-198, score-0.478]

60 1 Scoring Knowledge Base Relations Our model uses both explicit relations (p ≠ LINK) from DBpedia and Wikipedia hyperlinks (p = LINK, an implicit relation). [sent-202, score-0.128]

61 We want to favor relations with an explicit predicate, each weighted φ times as much as an implicit relation (we use φ = 5 in our experiments, noting the results are insensitive to slight changes of this parameter). [sent-203, score-0.241]
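A sketch of this weighting, with the explicit/implicit distinction keyed on the predicate string; the "LINK" sentinel for implicit hyperlink relations is our own naming, not the paper's:

```python
PHI = 5.0  # explicit-to-implicit weight ratio; the paper reports phi = 5

def kb_relation_weight(predicate):
    """Weight for a retrieved KB triple: an explicit DBpedia predicate
    counts phi times as much as a bare Wikipedia hyperlink (here the
    hypothetical sentinel 'LINK')."""
    if predicate is None:
        return 0.0  # no triple found for this argument pair
    return 1.0 if predicate == "LINK" else PHI
```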

62 Note that we do not check the type of the relation against the textual relation. [sent-207, score-0.169]

63 The key reason is that explicit relations are not as robust, especially considering that we restrict one of the arguments in the relation and constrain the other argument's lexical form. [sent-208, score-0.3]

64 Moreover, we back off to restricting the relations to be between known candidates when multiple lexically matched arguments are retrieved with high ambiguity. [sent-209, score-0.312]

65 Additionally, most of our relations 1http://lucene. [sent-210, score-0.128]

66 2 Scoring Coreference Relations For coreference relations, we simply use hard constraints by assigning candidates in the same coreference cluster a high relational weight, which is a cheap approximation to penalizing the output where the coreferent mentions disambiguate to different titles. [sent-220, score-0.923]
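The cheap hard-constraint approximation can be sketched as giving within-cluster candidate pairs a relational weight large enough to dominate the objective; the magnitude below is hypothetical:

```python
COREF_WEIGHT = 100.0  # hypothetical: large enough to dominate other scores

def coref_pair_weight(mention_i, mention_j, clusters):
    """Approximate the must-link hard constraint: coreferent mention
    pairs get a relational weight that dominates the objective, pushing
    both mentions toward the same title."""
    for cluster in clusters:
        if mention_i in cluster and mention_j in cluster:
            return COREF_WEIGHT
    return 0.0
```

Using a large soft weight rather than a true hard constraint keeps the inference formulation uniform: coreference edges are scored just like KB edges, only much higher.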

67 Another important issue here is that the correct coreferent candidate might not exist in the candidate list of the shorter mentions in the cluster. [sent-222, score-0.369]

68 For example, if a mention has the surface Richard, the number of potential candidates is so large that any top K list of titles will not be informative. [sent-223, score-0.525]

69 We therefore ignore candidates generated from short surface strings and give them the same candidate list as the head mentions in their cluster. [sent-224, score-0.474]

70 Figure 2 shows the voting algorithm we use to elect the potential candidates for the cluster. [sent-225, score-0.125]

71 The reason for separating the votes of longer and shorter mentions is that shorter mentions are inherently more ambiguous. [sent-226, score-0.328]

72 Once a coreferent relation is determined, longer mentions in the cluster should dictate what this cluster should collectively refer to. [sent-227, score-0.409]
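A simplified version of this vote separation (Algorithm 2 itself is not reproduced here): only the longest mentions vote, and every cluster member inherits the elected candidate list. All mention strings and priors are illustrative:

```python
def elect_cluster_candidates(cluster, candidates, top_k=3):
    """Let only the longest mentions in a coreference cluster vote
    (short surfaces such as 'Richard' are too ambiguous); every member
    then inherits the elected candidate list."""
    longest = max(len(m.split()) for m in cluster)
    votes = {}
    for m in cluster:
        if len(m.split()) < longest:
            continue  # shorter mentions do not vote
        for title, prior in candidates.get(m, {}).items():
            votes[title] = votes.get(title, 0.0) + prior
    elected = sorted(votes, key=votes.get, reverse=True)[:top_k]
    return {m: elected for m in cluster}
```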

73 4 Candidate Generation Beyond the algorithmic improvements, the mention and candidate generation stage is aided by a few systematic preprocessing improvements, briefly described below. [sent-229, score-0.198]

74 1 Mention Segmentation Since named entities may sometimes overlap with each other, we use regular expressions to match longer surface forms that are often incorrectly segmented or ignored by NER due to different annotation standards. [sent-232, score-0.181]

75 The regular expression pattern we used for Step 1 in Algorithm 1 simply adds mentions formed by any two consecutive capitalized word chunks connected by up to 2 punctuation marks, prepositions, and the tokens “the”, “’s” & “and”. [sent-234, score-0.164]

76 These segments are also used as arguments for relation extraction. [sent-235, score-0.172]
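A rough approximation of the segmentation pattern; the connector list is abbreviated and the chunk definition simplified relative to the paper's actual regular expression:

```python
import re

# Connector alternatives (abbreviated): a few prepositions, "the",
# "'s", "and", and single punctuation marks.
CONNECTOR = r"(?:of|in|at|for|the|and|'s|[-,.])"
# A capitalized chunk: one or more consecutive capitalized tokens.
CHUNK = r"[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*"
# Chunks joined by runs of 1-2 connectors, repeated greedily.
PATTERN = re.compile(rf"{CHUNK}(?:(?:\s+{CONNECTOR}){{1,2}}\s+{CHUNK})+")

def extra_mentions(text):
    """Recover long surface forms that NER tends to split, such as
    'Ministry of Defense and Armed Forces Logistics'."""
    return [m.group(0) for m in PATTERN.finditer(text)]
```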

77 2 Lexical Search We link certain mentions directly to their exact matching titles in Step 3 when there is very low ambiguity. [sent-238, score-0.386]

78 Specifically, when no title is known for a mention that is relatively long and fuzzily matches the lexically retrieved title, we perform this aggressive linking. [sent-239, score-0.283]

79 We only accept the link if there exists exactly one title in the lexical search results after pruning. [sent-242, score-0.229]
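The exactly-one-title acceptance rule can be sketched as below; the index contents and all names are hypothetical:

```python
def aggressive_link(mention, lexical_index, known_titles):
    """Accept a direct lexical link only when exactly one title survives
    pruning against the set of plausible titles; otherwise abstain."""
    matches = [t for t in lexical_index.get(mention.lower(), [])
               if t in known_titles]
    return matches[0] if len(matches) == 1 else None
```

Abstaining on any ambiguity is what makes this linking "aggressive yet safe": it fires only on long, near-unique surfaces.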

80 , 2011) to initialize the candidates and corresponding priors sik in our objective function. [sent-252, score-0.234]

81 The AQUAINT dataset, originally introduced in (Milne and Witten, 2008), resembles the Wikipedia annotation structure in that only the first mention of a title is linked, and is thus less sensitive to coreference capabilities. [sent-257, score-0.407]

82 The MSNBC dataset is from (Cucerzan, 2007) and includes many mentions that do not easily map to Wikipedia titles due to rare surface or other idiosyncratic lexicalization (Cucerzan, 2007; Ratinov et al. [sent-258, score-0.428]

83 For each document, the gold bag of titles is evaluated against our bag of system output titles requiring exact segmentation match. [sent-266, score-0.342]

84 The Coreference performance includes all the inference performed without the KB triples, while the Relational Inference (RI) line represents all aspects of the proposed relational inference. [sent-295, score-0.267]

85 Given the TAC Knowledge Base (TKB), which is a subset of the 2009 Wikipedia Dump, the TAC Entity Linking objective is to answer a named entity query string with either a TKB entry ID or a NIL entity ID, where the NIL entity IDs should be clustered across documents. [sent-304, score-0.373]

86 Due to the clustering requirement, we also trivially cluster NIL entities that either are mapped to the same out-of-KB Wikipedia URL or have the same surface form. [sent-308, score-0.222]
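A sketch of the trivial NIL clustering rule; the input layout (surface, url-or-None pairs) and the example values are our own:

```python
def cluster_nils(nil_mentions):
    """Trivial cross-document NIL clustering: group by out-of-KB
    Wikipedia URL when one exists, otherwise by surface form."""
    clusters = {}
    for surface, url in nil_mentions:
        key = ("url", url) if url else ("surface", surface.lower())
        clusters.setdefault(key, []).append(surface)
    return list(clusters.values())
```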

87 RI is the complete relational inference system described in this paper; as described in the text, RI was not trained on the TAC data, unlike the other top systems. [sent-313, score-0.267]

88 We performed two runs on the TAC2011 data to study the effects of relational inference. [sent-314, score-0.21]

89 We can regard this performance as the new baseline that benefited from the fuzzy lexical matching capabilities that we have added, as well as the broader set of surface forms and redirects from the current Wikipedia dump. [sent-320, score-0.142]

90 In the second run, RI, the complete relational inference described in this paper, scored 4. [sent-321, score-0.267]

91 This shows the robustness of our methods as well as the general importance of understanding textual relations in the task of Entity Linking and Wikification. [sent-326, score-0.243]

92 Later, various global statistical approaches were proposed to emphasize different coherence measures between the titles of the disambiguated mentions in the same document (Cucerzan, 2007; Milne and Witten, 2008; Ratinov et al. [sent-328, score-0.335]

93 We have demonstrated that, by incorporating textual relations and semantic knowledge as linguistic constraints in an inference framework, it is possible to significantly improve Wikification performance. [sent-331, score-0.285]

94 Our system features high modularity since the relations are considered only at inference time; consequently, we can use any underlying Wikification system as long as it outputs a distribution of title candidates for each mention. [sent-334, score-0.488]

95 One possibility for future work is to supply this framework with a richer set of relations from the text, such as verbal relations. [sent-335, score-0.128]

96 It will also be interesting to incorporate high-level typed relations and relax the relation arguments to be general concepts rather than only named entities. [sent-336, score-0.369]

97 TAC entity linking by performing full-document entity extraction and disambiguation. [sent-370, score-0.283]

98 Tagme: on-the-fly annotation of short text fragments (by wikipedia entities). [sent-386, score-0.215]

99 Overview of the tac 2010 knowledge base population track. [sent-395, score-0.263]

100 Overview of the tac 2011 knowledge base population track. [sent-399, score-0.263]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('wikification', 0.426), ('relational', 0.21), ('tac', 0.192), ('wikipedia', 0.182), ('title', 0.178), ('ratinov', 0.176), ('titles', 0.171), ('tik', 0.17), ('mentions', 0.164), ('nil', 0.155), ('eik', 0.132), ('socialist', 0.132), ('milo', 0.131), ('relations', 0.128), ('candidates', 0.125), ('coreference', 0.124), ('relation', 0.113), ('linking', 0.107), ('mention', 0.105), ('goldman', 0.098), ('dorothy', 0.094), ('slobodan', 0.094), ('surface', 0.093), ('entity', 0.088), ('party', 0.086), ('cucerzan', 0.084), ('evi', 0.082), ('robinson', 0.082), ('glow', 0.082), ('cogcomp', 0.082), ('roth', 0.081), ('iranian', 0.075), ('mubarak', 0.075), ('sik', 0.075), ('vtik', 0.075), ('mi', 0.074), ('milne', 0.072), ('kb', 0.07), ('ministry', 0.07), ('defense', 0.07), ('byrne', 0.07), ('atmosphere', 0.066), ('dbpedia', 0.066), ('candidate', 0.061), ('arguments', 0.059), ('understanding', 0.059), ('inference', 0.057), ('coherency', 0.057), ('serbia', 0.057), ('sevi', 0.057), ('tjl', 0.057), ('textual', 0.056), ('witten', 0.053), ('entities', 0.052), ('coreferent', 0.052), ('link', 0.051), ('ti', 0.05), ('college', 0.05), ('ri', 0.05), ('monahan', 0.049), ('redirects', 0.049), ('disambiguating', 0.048), ('pr', 0.046), ('ellis', 0.045), ('external', 0.044), ('nominal', 0.044), ('constraints', 0.044), ('ilp', 0.043), ('earth', 0.042), ('tj', 0.04), ('disambiguate', 0.04), ('cluster', 0.04), ('president', 0.039), ('chan', 0.039), ('gurobi', 0.039), ('query', 0.039), ('linked', 0.039), ('population', 0.038), ('allsingle', 0.038), ('ferragina', 0.038), ('hosni', 0.038), ('syntacticosemantic', 0.038), ('tkb', 0.038), ('yugoslav', 0.038), ('mj', 0.038), ('ji', 0.037), ('mapped', 0.037), ('named', 0.036), ('determine', 0.034), ('objective', 0.034), ('concepts', 0.033), ('text', 0.033), ('msnbc', 0.033), ('bot', 0.033), ('rdc', 0.033), ('base', 0.033), ('stage', 0.032), ('triples', 0.032), ('assignment', 0.032), ('list', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 160 emnlp-2013-Relational Inference for Wikification

Author: Xiao Cheng ; Dan Roth


2 0.2676549 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

Author: Zhengyan He ; Shujie Liu ; Yang Song ; Mu Li ; Ming Zhou ; Houfeng Wang

Abstract: Entity disambiguation works by linking ambiguous mentions in text to their corresponding real-world entities in knowledge base. Recent collective disambiguation methods enforce coherence among contextual decisions at the cost of non-trivial inference processes. We propose a fast collective disambiguation approach based on stacking. First, we train a local predictor g0 with learning to rank as base learner, to generate initial ranking list of candidates. Second, top k candidates of related instances are searched for constructing expressive global coherence features. A global predictor g1 is trained in the augmented feature space and stacking is employed to tackle the train/test mismatch problem. The proposed method is fast and easy to implement. Experiments show its effectiveness over various algorithms on several public datasets. By learning a rich semantic relatedness measure be- . tween entity categories and context document, performance is further improved.

3 0.20514333 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution

Author: Greg Durrett ; Dan Klein

Abstract: Classical coreference systems encode various syntactic, discourse, and semantic phenomena explicitly, using heterogenous features computed from hand-crafted heuristics. In contrast, we present a state-of-the-art coreference system that captures such phenomena implicitly, with a small number of homogeneous feature templates examining shallow properties of mentions. Surprisingly, our features are actually more effective than the corresponding hand-engineered ones at modeling these key linguistic phenomena, allowing us to win “easy victories” without crafted heuristics. These features are successful on syntax and discourse; however, they do not model semantic compatibility well, nor do we see gains from experiments with shallow semantic features from the literature, suggesting that this approach to semantics is an “uphill battle.” Nonetheless, our final system1 outperforms the Stanford system (Lee et al. (201 1), the winner of the CoNLL 2011 shared task) by 3.5% absolute on the CoNLL metric and outperforms the IMS system (Bj o¨rkelund and Farkas (2012), the best publicly available English coreference system) by 1.9% absolute.

4 0.19401069 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves

Author: Hannaneh Hajishirzi ; Leila Zilles ; Daniel S. Weld ; Luke Zettlemoyer

Abstract: Many errors in coreference resolution come from semantic mismatches due to inadequate world knowledge. Errors in named-entity linking (NEL), on the other hand, are often caused by superficial modeling of entity context. This paper demonstrates that these two tasks are complementary. We introduce NECO, a new model for named entity linking and coreference resolution, which solves both problems jointly, reducing the errors made on each. NECO extends the Stanford deterministic coreference system by automatically linking mentions to Wikipedia and introducing new NEL-informed mention-merging sieves. Linking improves mention-detection and enables new semantic attributes to be incorporated from Freebase, while coreference provides better context modeling by propagating named-entity links within mention clusters. Experiments show consistent improve- ments across a number of datasets and experimental conditions, including over 11% reduction in MUC coreference error and nearly 21% reduction in F1 NEL error on ACE 2004 newswire data.

5 0.18865259 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.

6 0.18263045 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution

7 0.17349488 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution

8 0.1400952 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

9 0.13500325 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

10 0.12768695 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

11 0.1231474 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

12 0.10604194 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

13 0.094204247 152 emnlp-2013-Predicting the Presence of Discourse Connectives

14 0.092548527 24 emnlp-2013-Application of Localized Similarity for Web Documents

15 0.091304064 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction

16 0.087475367 41 emnlp-2013-Building Event Threads out of Multiple News Articles

17 0.087166719 118 emnlp-2013-Learning Biological Processes with Global Constraints

18 0.083098248 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

19 0.082291208 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

20 0.081694581 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.257), (1, 0.237), (2, 0.23), (3, 0.013), (4, 0.024), (5, 0.042), (6, 0.039), (7, 0.099), (8, 0.134), (9, -0.077), (10, 0.094), (11, -0.005), (12, -0.103), (13, 0.058), (14, -0.198), (15, -0.082), (16, -0.008), (17, 0.014), (18, 0.103), (19, 0.128), (20, -0.141), (21, 0.062), (22, -0.032), (23, -0.009), (24, 0.049), (25, -0.01), (26, 0.03), (27, -0.073), (28, 0.024), (29, 0.107), (30, -0.018), (31, -0.041), (32, 0.052), (33, -0.006), (34, 0.052), (35, 0.102), (36, -0.016), (37, -0.023), (38, 0.044), (39, 0.014), (40, 0.045), (41, 0.104), (42, -0.015), (43, 0.077), (44, -0.025), (45, 0.005), (46, 0.011), (47, 0.005), (48, -0.031), (49, -0.079)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96838731 160 emnlp-2013-Relational Inference for Wikification

Author: Xiao Cheng ; Dan Roth


2 0.77313471 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

Author: Zhengyan He ; Shujie Liu ; Yang Song ; Mu Li ; Ming Zhou ; Houfeng Wang

Abstract: Entity disambiguation works by linking ambiguous mentions in text to their corresponding real-world entities in knowledge base. Recent collective disambiguation methods enforce coherence among contextual decisions at the cost of non-trivial inference processes. We propose a fast collective disambiguation approach based on stacking. First, we train a local predictor g0 with learning to rank as base learner, to generate initial ranking list of candidates. Second, top k candidates of related instances are searched for constructing expressive global coherence features. A global predictor g1 is trained in the augmented feature space and stacking is employed to tackle the train/test mismatch problem. The proposed method is fast and easy to implement. Experiments show its effectiveness over various algorithms on several public datasets. By learning a rich semantic relatedness measure be- . tween entity categories and context document, performance is further improved.

3 0.73480093 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.
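The winning combination scheme in the abstract is a plain linear interpolation of the two model scores. A minimal sketch, with an arbitrary interpolation weight and made-up facts and scores:

```python
def interpolate_rank(facts, disc_score, gen_score, lam=0.7):
    """Rank extracted facts by a linear interpolation of a
    discriminative score and a generative (topic-model) score.
    lam is a tuning parameter; 0.7 here is arbitrary."""
    key = lambda f: lam * disc_score[f] + (1 - lam) * gen_score[f]
    return sorted(facts, key=key, reverse=True)

# Toy example (scores invented for illustration).
facts = ["born_in(Mozart, Salzburg)",
         "born_in(Mozart, Vienna)",
         "born_in(Mozart, Austria)"]
disc_score = {facts[0]: 0.9, facts[1]: 0.2, facts[2]: 0.5}
gen_score = {facts[0]: 0.1, facts[1]: 0.9, facts[2]: 0.6}
ranking = interpolate_rank(facts, disc_score, gen_score)
```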

4 0.68117058 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves

Author: Hannaneh Hajishirzi ; Leila Zilles ; Daniel S. Weld ; Luke Zettlemoyer

Abstract: Many errors in coreference resolution come from semantic mismatches due to inadequate world knowledge. Errors in named-entity linking (NEL), on the other hand, are often caused by superficial modeling of entity context. This paper demonstrates that these two tasks are complementary. We introduce NECO, a new model for named entity linking and coreference resolution, which solves both problems jointly, reducing the errors made on each. NECO extends the Stanford deterministic coreference system by automatically linking mentions to Wikipedia and introducing new NEL-informed mention-merging sieves. Linking improves mention-detection and enables new semantic attributes to be incorporated from Freebase, while coreference provides better context modeling by propagating named-entity links within mention clusters. Experiments show consistent improvements across a number of datasets and experimental conditions, including over 11% reduction in MUC coreference error and nearly 21% reduction in F1 NEL error on ACE 2004 newswire data.
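One direction of the interaction described above, coreference helping NEL by propagating links within a mention cluster, can be sketched in a few lines. This is not NECO's sieve architecture, just a hypothetical illustration with invented mentions:

```python
def propagate_links(clusters, links):
    """Within each coreference cluster, give every unlinked mention
    the entity link of a linked cluster-mate (first one found)."""
    out = dict(links)
    for cluster in clusters:
        linked = [links[m] for m in cluster if m in links]
        if linked:
            for m in cluster:
                out.setdefault(m, linked[0])
    return out

# Toy example: only the proper name was linked; the nominal and
# pronominal mentions inherit its link through the cluster.
clusters = [["Barack Obama", "the president", "he"], ["Chicago"]]
links = {"Barack Obama": "Barack_Obama"}
propagated = propagate_links(clusters, links)
```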

5 0.66514105 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

Author: Hrushikesh Mohapatra ; Siddhanth Jain ; Soumen Chakrabarti

Abstract: Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth the number of entities as Freebase, and these are a highly biased sample of well-connected, frequently mentioned “head” entities. To bring hope to “tail” entities, we broaden our goal to a second task: assigning types to entities in Freebase but not Wikipedia. The two tasks are synergistic: knowing the types of unfamiliar entities helps disambiguate mentions, and words in mention contexts help assign types to entities. We present TMI, a bipartite graphical model for joint type-mention inference. TMI attempts no schema integration or entity resolution, but exploits the above-mentioned synergy. In experiments involving 780,000 people in Wikipedia, 2.3 million people in Freebase, 700 million Web pages, and over 20 professional editors, TMI shows considerable annotation accuracy improvement (e.g., 70%) compared to baselines (e.g., 46%), especially for “tail” and emerging entities. We also compare with Google’s recent annotations of the same corpus with Freebase entities, and report considerable improvements within the people domain.

6 0.62461048 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

7 0.57976907 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

8 0.5685882 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution

9 0.55907035 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

10 0.53239357 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

11 0.51895231 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution

12 0.50340188 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution

13 0.48511267 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

14 0.48434278 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

15 0.4734599 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

16 0.47161371 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

17 0.45200416 152 emnlp-2013-Predicting the Presence of Discourse Connectives

18 0.40218273 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

19 0.40123084 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

20 0.40050483 24 emnlp-2013-Application of Localized Similarity for Web Documents


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.034), (9, 0.014), (18, 0.035), (22, 0.051), (30, 0.065), (36, 0.285), (50, 0.023), (51, 0.177), (66, 0.031), (71, 0.025), (75, 0.058), (77, 0.026), (90, 0.017), (95, 0.022), (96, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78780985 160 emnlp-2013-Relational Inference for Wikification

Author: Xiao Cheng ; Dan Roth

Abstract: Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. Previous approaches to D2W focused on the use of local and global statistics over the given text, Wikipedia articles and its link structures, to evaluate context compatibility among a list of probable candidates. However, these methods fail (often, embarrassingly), when some level of text understanding is needed to support Wikification. In this paper we introduce a novel approach to Wikification by incorporating, along with statistical methods, richer relational analysis of the text. We provide an extensible, efficient and modular Integer Linear Programming (ILP) formulation of Wikification that incorporates the entity-relation inference problem, and show that the ability to identify relations in text helps both candidate generation and ranking Wikipedia titles considerably. Our results show significant improvements in both Wikification and the TAC Entity Linking task.

2 0.7800023 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts

Author: Moshe Koppel ; Shachar Seidman

Abstract: The identification of pseudepigraphic texts – texts not written by the authors to which they are attributed – has important historical, forensic and commercial applications. We introduce an unsupervised technique for identifying pseudepigrapha. The idea is to identify textual outliers in a corpus based on the pairwise similarities of all documents in the corpus. The crucial point is that document similarity not be measured in any of the standard ways but rather be based on the output of a recently introduced algorithm for authorship verification. The proposed method strongly outperforms existing techniques in systematic experiments on a blog corpus.
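The outlier-detection core of the approach is simple to sketch: flag the document with the lowest mean pairwise similarity to the rest of the corpus. Note the hedge in the abstract itself: the similarity must come from an authorship-verification algorithm; the Jaccard measure below is only a stand-in so the sketch is runnable, and the toy documents are invented.

```python
def find_outlier(docs, similarity):
    """Return the document with the lowest mean pairwise similarity
    to the rest of the corpus (the suspected pseudepigraphic text)."""
    def avg_sim(i):
        return sum(similarity(docs[i], docs[j])
                   for j in range(len(docs)) if j != i) / (len(docs) - 1)
    return docs[min(range(len(docs)), key=avg_sim)]

def jaccard(a, b):
    # Stand-in similarity over word sets; the paper instead uses the
    # output of an authorship-verification algorithm.
    return len(a & b) / len(a | b)

# Toy corpus: two lexically similar documents and one outlier.
docs = [{"the", "law", "of", "moses"},
        {"moses", "law", "commandments", "the"},
        {"stock", "market", "prices", "rise"}]
suspect = find_outlier(docs, jaccard)
```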

3 0.72826183 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

Author: Zhengyan He ; Shujie Liu ; Yang Song ; Mu Li ; Ming Zhou ; Houfeng Wang

Abstract: Entity disambiguation works by linking ambiguous mentions in text to their corresponding real-world entities in knowledge base. Recent collective disambiguation methods enforce coherence among contextual decisions at the cost of non-trivial inference processes. We propose a fast collective disambiguation approach based on stacking. First, we train a local predictor g0 with learning to rank as base learner, to generate initial ranking list of candidates. Second, top k candidates of related instances are searched for constructing expressive global coherence features. A global predictor g1 is trained in the augmented feature space and stacking is employed to tackle the train/test mismatch problem. The proposed method is fast and easy to implement. Experiments show its effectiveness over various algorithms on several public datasets. By learning a rich semantic relatedness measure between entity categories and context document, performance is further improved.

4 0.60433567 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.

5 0.59947002 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

Author: Hrushikesh Mohapatra ; Siddhanth Jain ; Soumen Chakrabarti

Abstract: Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth the number of entities as Freebase, and these are a highly biased sample of well-connected, frequently mentioned “head” entities. To bring hope to “tail” entities, we broaden our goal to a second task: assigning types to entities in Freebase but not Wikipedia. The two tasks are synergistic: knowing the types of unfamiliar entities helps disambiguate mentions, and words in mention contexts help assign types to entities. We present TMI, a bipartite graphical model for joint type-mention inference. TMI attempts no schema integration or entity resolution, but exploits the above-mentioned synergy. In experiments involving 780,000 people in Wikipedia, 2.3 million people in Freebase, 700 million Web pages, and over 20 professional editors, TMI shows considerable annotation accuracy improvement (e.g., 70%) compared to baselines (e.g., 46%), especially for “tail” and emerging entities. We also compare with Google’s recent annotations of the same corpus with Freebase entities, and report considerable improvements within the people domain.

6 0.59667963 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

7 0.59401459 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

8 0.5922929 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

9 0.58939832 65 emnlp-2013-Document Summarization via Guided Sentence Compression

10 0.58870697 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

11 0.58853596 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

12 0.5878517 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

13 0.58638394 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

14 0.58534032 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

15 0.5848195 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

16 0.58401972 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

17 0.58380818 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

18 0.58375216 152 emnlp-2013-Predicting the Presence of Discourse Connectives

19 0.58282286 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

20 0.58163273 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution