acl acl2011 acl2011-12 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xianpei Han ; Le Sun
Abstract: Linking entities with a knowledge base (entity linking) is a key issue in bridging textual data with the structural knowledge base. Due to the name variation problem and the name ambiguity problem, entity linking decisions depend critically on heterogeneous knowledge of entities. In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in documents P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e). Experimental results show that our method can significantly outperform the traditional methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Due to the name variation problem and the name ambiguity problem, entity linking decisions depend critically on heterogeneous knowledge of entities. [sent-6, score-1.98]
2 In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. [sent-7, score-2.368]
3 To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e). [sent-9, score-1.294]
4 Bridging these knowledge bases with the textual data can facilitate many different tasks such as entity search, information extraction and text classification. [sent-19, score-0.675]
5 A key issue in bridging the knowledge base with the textual data is linking the entities in a document with their referents in a knowledge base, which is usually referred to as the Entity Linking task. [sent-21, score-0.67]
6 Given a set of name mentions M = {m1, m2, …, mk} contained in documents and a knowledge base KB containing a set of entities E = {e1, e2, …, en}, an entity linking system is a function f : M → E which links these name mentions to their referent entities in KB. [sent-22, score-2.63]
7 For example, in Figure 1 an entity linking system should link the name mention Jordan to the entity Michael Jeffrey Jordan and the name mention Bulls to the entity Chicago Bulls. [sent-23, score-3.342]
8 The entity linking task, however, is not trivial due to the name variation problem and the name ambiguity problem. [sent-24, score-1.799]
9 The name ambiguity problem is related to the fact that a name may refer to different entities in different contexts. [sent-27, score-1.128]
10 For example, the name Bulls can refer to more than 20 entities in Wikipedia, such as the NBA team Chicago Bulls, the football team Belfast Bulls and the cricket team Queensland Bulls. [sent-28, score-0.69]
11 Complicated by the name variation problem and the name ambiguity problem, entity linking decisions depend critically on the knowledge of entities (Li et al. [sent-29, score-2.057]
12 Based on the previous work, we found that the following three types of entity knowledge can provide critical evidence for the entity linking decisions: Popularity Knowledge. [sent-32, score-1.446]
13 The popularity knowledge of entities tells us the likelihood of an entity appearing in a document. [sent-33, score-0.98]
14 In entity linking, the entity popularity knowledge can provide a priori information to the possible referent entities of a name mention. [sent-34, score-2.184]
15 For example, without any other information, the popularity knowledge can tell us that in a Web page the name "Michael Jordan" will more likely refer to the famous basketball player Michael Jeffrey Jordan, rather than the less popular Berkeley professor Michael I. [sent-35, score-0.937]
16 The name knowledge tells us the possible names of an entity and the likelihood of a name referring to a specific entity. [sent-38, score-1.686]
17 For example, we would expect the name knowledge to tell us that both "MJ" and "Michael Jordan" are possible names of the basketball player Michael Jeffrey Jordan, but that "Michael Jordan" has a larger likelihood. [sent-39, score-0.78]
18 The name knowledge plays the central role in resolving the name variation problem, and is also helpful in resolving the name ambiguity problem. [sent-40, score-1.649]
19 The context knowledge tells us the likelihood of an entity appearing in a specific context. [sent-42, score-0.751]
20 For example, given the context “__wins NBA MVP”, the name “Michael Jordan” should more likely refer to the basketball player Michael Jeffrey Jordan than the Berkeley professor Michael I. [sent-43, score-0.701]
21 Unfortunately, in an entity linking system, the modeling and exploitation of these types of entity knowledge is not straightforward. [sent-46, score-1.446]
22 Furthermore, in most cases the knowledge of entities is not explicitly given, making it challenging to extract the entity knowledge from data. [sent-48, score-0.899]
23 To resolve the above problems, this paper proposes a generative probabilistic model, called entity-mention model, which can leverage the heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. [sent-49, score-2.287]
24 P(e), P(s|e) and P(c|e) are called the entity popularity model, the entity name model and the entity context model, respectively. [sent-51, score-2.354]
25 To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e). [sent-52, score-1.294]
26 Experimental results show that our method can significantly improve the entity linking accuracy. [sent-54, score-0.808]
27 We first describe the generative story of our model, then formulate the model and show how to apply it to the entity linking task. [sent-64, score-0.882]
28 1 The Generative Story In the entity-mention model, each name mention is modeled as a generated sample. [sent-66, score-1.488]
29 For demonstration, Figure 2 shows two examples of name mention generation. [sent-67, score-0.709]
30 As shown in Figure 2, the generative story of a name mention is composed of three steps, which are detailed as follows: (i) Firstly, the model chooses the referent entity e of the name mention from the given knowledge base, according to the distribution of entities in document P(e). [sent-68, score-2.571]
31 In Figure 2, the model chooses the entity “Michael Jeffrey Jordan” for the first name mention, and the entity “Michael I. [sent-69, score-1.618]
32 Jordan” for the second name mention; (ii) Secondly, the model outputs the name s of the name mention according to the distribution of possible names of the referent entity P(s|e). [sent-70, score-2.532]
33 In Figure 2, the model outputs “Jordan” as the name of the entity “Michael Jeffrey Jordan”, and the “Michael Jordan” as the name of the entity “Michael I. [sent-71, score-2.102]
34 Jordan”; (iii) Finally, the model outputs the context c of the name mention according to the distribution of possible contexts of the referent entity P(c|e). [sent-72, score-1.596]
35 In Figure 2, the model outputs the context “joins Bulls in 1984” for the first name mention, and the context “is a professor in UC Berkeley” for the second name mention. [sent-73, score-1.124]
36 Given a name mention m, to perform entity linking, we need to find the entity e which maximizes the probability P(e|m). [sent-76, score-1.817]
37 Then we can resolve the entity linking task as follows: ê = argmax_e P(m, e)/P(m) = argmax_e P(e) P(s|e) P(c|e). Therefore, the main problem of entity linking is to estimate the three distributions P(e), P(s|e) and P(c|e), i. [sent-77, score-1.633]
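The decision rule above can be sketched in a few lines of Python. The probability tables below are toy numbers, not the paper's learned distributions, and unseen names/terms are floored at a small constant as a simple hedge against zero probabilities:

```python
import math

def link(name, context_terms, p_e, p_s_given_e, p_c_given_e):
    """Return the candidate entity e maximizing P(e) * P(s|e) * P(c|e), in log space."""
    best, best_score = None, float("-inf")
    for e in p_e:
        score = (math.log(p_e[e])
                 + math.log(p_s_given_e[e].get(name, 1e-12))
                 + sum(math.log(p_c_given_e[e].get(t, 1e-12)) for t in context_terms))
        if score > best_score:
            best, best_score = e, score
    return best

# Toy distributions for two "Jordan" candidates (illustrative numbers only).
p_e = {"Michael Jeffrey Jordan": 0.8, "Michael I. Jordan": 0.2}
p_s = {"Michael Jeffrey Jordan": {"Jordan": 0.3, "MJ": 0.1},
       "Michael I. Jordan": {"Jordan": 0.2}}
p_c = {"Michael Jeffrey Jordan": {"nba": 0.05, "bulls": 0.04},
       "Michael I. Jordan": {"professor": 0.05, "berkeley": 0.04}}

print(link("Jordan", ["nba", "bulls"], p_e, p_s, p_c))
```

With a basketball context the popular player wins; with an academic context the prior is overruled by the context evidence.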
38 Because a knowledge base usually contains millions of entities, it is time-consuming to compute P(m, e) scores between a name mention and all the entities contained in the knowledge base. [sent-82, score-1.122]
39 To reduce the time required, the entity linking system employs a candidate selection process to filter out the impossible referent candidates of a name mention. [sent-83, score-1.505]
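A minimal candidate selection step can be implemented as a dictionary from surface names to the entities they have been observed with; the helper names and example pairs here are illustrative, not from the paper:

```python
from collections import defaultdict

def build_candidate_index(name_entity_pairs):
    """Map each surface name to the set of entities it has been observed to refer to."""
    index = defaultdict(set)
    for name, entity in name_entity_pairs:
        index[name.lower()].add(entity)
    return index

def candidates(index, mention):
    """Candidate selection: only entities ever seen with this name are scored."""
    return index.get(mention.lower(), set())

pairs = [("Jordan", "Michael Jeffrey Jordan"),
         ("MJ", "Michael Jeffrey Jordan"),
         ("Jordan", "Michael I. Jordan"),
         ("Bulls", "Chicago Bulls")]
idx = build_candidate_index(pairs)
print(sorted(candidates(idx, "Jordan")))  # → ['Michael I. Jordan', 'Michael Jeffrey Jordan']
```

Only the candidates returned here need P(m, e) scores, rather than every entity in the knowledge base.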
40 3 Model Estimation Section 2 shows that the entity mention model can decompose the entity linking task into the estimation of three distributions P(e), P(s|e) and P(c|e). [sent-85, score-1.648]
1 Training Data In this paper, the training data of our model is a set of annotated name mentions M = {m1, m2, …, mn}. [sent-89, score-0.633]
42 Each annotated name mention is a triple m={s, e, c}, where s is the name, e is the referent entity and c is the context. [sent-90, score-1.493]
43 In Wikipedia, a hyperlink between two articles is an annotated name mention (Milne & Witten, 2008): its anchor text is the name and its target article is the referent entity. [sent-95, score-1.45]
44 For example, in following hyperlink (in Wiki syntax), the NBA is the name and the National Basketball Association is the referent entity. [sent-96, score-0.725]
45 “He won his first [[National Basketball Association | NBA]] championship with the Bulls ” Therefore, we can get the training data by collecting all annotated name mentions from the hyperlink data of Wikipedia. [sent-97, score-0.634]
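Harvesting (name, referent entity) training pairs from wiki syntax can be sketched with a regular expression; this handles only the basic `[[Target|anchor]]` and `[[Target]]` forms, not templates or nested markup:

```python
import re

# [[Target article]] or [[Target article|anchor text]]
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def extract_mentions(wikitext):
    """Yield (name, referent entity) pairs from Wikipedia hyperlinks."""
    for m in WIKILINK.finditer(wikitext):
        target = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()
        yield anchor, target

text = ("He won his first [[National Basketball Association | NBA]] "
        "championship with the [[Chicago Bulls]].")
print(list(extract_mentions(text)))
```

Running this over a Wikipedia dump yields the annotated name mentions the models are estimated from.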
46 In this section, we estimate the distribution P(e) using a model called entity popularity model. [sent-105, score-0.766]
47 To get a more precise estimation, we observed that a more popular entity usually appears more times than a less popular entity in a large text corpus, i. [sent-107, score-1.169]
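One plausible reading of the popularity model is a smoothed relative frequency over entity mention counts. The add-one smoothing here is our assumption, since the excerpt only states that corpus counts reflect popularity:

```python
def popularity_model(entity_counts):
    """Estimate P(e) as the add-one-smoothed relative frequency of each
    entity's mentions in a large corpus (smoothing choice is illustrative)."""
    total = sum(entity_counts.values())
    n = len(entity_counts)
    return {e: (c + 1) / (total + n) for e, c in entity_counts.items()}

# Toy mention counts from a hypothetical corpus.
counts = {"Michael Jeffrey Jordan": 900, "Michael I. Jordan": 98}
p_e = popularity_model(counts)
print(p_e)  # the more frequent entity gets the larger prior
```

The resulting distribution sums to one and gives more frequent entities a larger prior probability.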
48 3 Entity Name Model The distribution P(s|e) encodes the name knowledge of entities, i. [sent-119, score-0.635]
49 , for a specific entity e, its more frequently used names should be assigned a higher P(s|e) value than its less frequently used ones, and a zero P(s|e) value should be assigned to names that are never used. [sent-121, score-1.038]
50 For example, because the name "MJ" never refers to Michael Jeffrey Jordan in Wikipedia, the name model will not be able to identify "MJ" as a name of him, even though "MJ" is a popular name of Michael Jeffrey Jordan on the Web. [sent-126, score-1.983]
51 To better estimate the distribution P(s|e), this paper proposes a much more generic model, called the entity name model, which can capture the variations (including full names, aliases, acronyms and misspellings) of an entity's name using a statistical translation model. [sent-127, score-1.625]
52 Given an entity’s name s, our model assumes that it is a translation of this entity’s full name f using the IBM model 1 (Brown, et al. [sent-128, score-1.057]
53 In this way, all name variations of an entity are captured as the possible translations of its full name. [sent-131, score-1.058]
54 To illustrate, Figure 3 shows how the full name "Michael Jeffrey Jordan" can be translated into its misspelled form "Micheal Jordan". [sent-132, score-1.013]
55 In this paper, we first collect the (name, entity full name) pairs from all annotated name mentions, then obtain the lexical translation probabilities by feeding this data set into an IBM Model 1 training system (we use the GIZA++ toolkit). [sent-136, score-1.117]
56 We can see that the entity name model can capture the different name variations, such as the acronym (Michael → M), the misspelling (Michael → Micheal) and the omission (St. [sent-140, score-1.595]
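The name model's use of IBM Model 1 can be sketched as follows; the lexical table `t` would normally be learned with GIZA++ from (name, full name) pairs, so the entries below are toy values:

```python
def ibm1_name_prob(name, full_name, t, eps=1.0):
    """IBM Model 1 score for P(name | full name): every token of the observed
    name is generated by some token of the full name (or by NULL).
    t[(s, f)] is the lexical translation probability table."""
    src = ["NULL"] + full_name.split()
    tgt = name.split()
    prob = eps / (len(src) ** len(tgt))
    for s in tgt:
        prob *= sum(t.get((s, f), 0.0) for f in src)
    return prob

# Hypothetical lexical probabilities: identity translations plus a misspelling.
t = {("Michael", "Michael"): 0.9, ("Jordan", "Jordan"): 0.9,
     ("Micheal", "Michael"): 0.02}
full = "Michael Jeffrey Jordan"
correct = ibm1_name_prob("Michael Jordan", full, t)
misspelled = ibm1_name_prob("Micheal Jordan", full, t)
print(correct, misspelled)
```

Both variants get nonzero probability, with the correctly spelled name scoring higher, which is how the model covers aliases and misspellings that a hard name dictionary would miss.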
57 That is, it will assign a high P(c|e) value if the entity e frequently appears in the context c, and a low P(c|e) value if the entity e rarely appears in the context c. [sent-149, score-1.236]
58 To estimate the distribution P(c|e), we propose a method based on language modeling, called entity context model. [sent-153, score-0.668]
59 In our model, the context of each name mention m is the word window surrounding m, and the window size is set to 50 according to the experiments in (Pedersen et al. [sent-154, score-0.754]
60 Specifically, the context knowledge of an entity e is encoded in a unigram language model Me = {Pe(t)}, where Pe(t) is the probability of the term t appearing in the context of e. [sent-156, score-0.795]
61 Now, given a context c containing n terms t1 t2 … tn, the entity context model estimates the probability P(c|e) as: P(c|e) = P(t1 t2 … tn | e) = Pe(t1) Pe(t2) … Pe(tn). [sent-159, score-0.67]
62 So the main problem is to estimate Pe(t), the probability of a term t appearing in the context of the entity e. [sent-166, score-0.637]
63 Using the annotated name mention data set M, we can get the maximum likelihood estimation of Pe(t) as follows: Pe_ML(t) = Counte(t) / Σt' Counte(t'), where Counte(t) is the frequency of occurrences of a term t in the contexts of the name mentions whose referent entity is e. [sent-167, score-2.135]
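Combining this ML estimate with the Jelinek-Mercer smoothing mentioned later (interpolation with a general background language model) gives a small sketch of the entity context model; the interpolation weight and the toy counts are illustrative:

```python
from collections import Counter

def context_model(entity_contexts, general_counts, lam=0.8):
    """Entity context model: Jelinek-Mercer interpolation of the entity's
    ML unigram estimate with a general (background) language model.
    lam would be tuned on held-out data."""
    g_total = sum(general_counts.values())
    p_g = {t: c / g_total for t, c in general_counts.items()}

    counts = Counter()
    for ctx in entity_contexts:
        counts.update(ctx)
    e_total = sum(counts.values())

    def p_e(term):
        ml = counts[term] / e_total if e_total else 0.0
        return lam * ml + (1 - lam) * p_g.get(term, 0.0)
    return p_e

# Toy contexts of the entity Michael Jeffrey Jordan, plus background counts.
contexts = [["wins", "nba", "mvp"], ["joins", "bulls", "in", "1984"]]
general = Counter({"nba": 2, "professor": 5, "the": 50, "wins": 3})
pe = context_model(contexts, general)
print(pe("nba"), pe("professor"))
```

Terms seen in the entity's contexts get high probability, while terms only seen in the background corpus get a small nonzero probability from the smoothing term.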
64 5 The NIL Entity Problem By estimating P(e), P(s|e) and P(c|e), our method can effectively link a name mention to its referent entity contained in a knowledge base. [sent-171, score-1.648]
65 , the referent entity may not be contained in the given knowledge base. [sent-174, score-0.891]
66 In this situation, the name mention should be linked to the NIL entity. [sent-175, score-0.709]
67 2010): a classifier is trained to identify whether a name mention should be linked to the NIL entity. [sent-177, score-0.709]
68 Rather than employing an additional step, our entity mention model seamlessly takes into account the NIL entity problem. [sent-178, score-1.359]
69 Our solution is that "if a name mention refers to a specific entity, then the probability that this name mention is generated by the specific entity's model should be significantly higher than the probability that it is generated by a general language model". [sent-182, score-1.444]
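That rule can be sketched as a log-probability comparison; the `margin` threshold is hypothetical, since the excerpt does not specify how "significantly higher" is quantified:

```python
import math

def is_nil(context_terms, entity_probs, general_probs, margin=0.0, floor=1e-12):
    """NIL decision sketch (our reading of the paper's rule): the mention is
    NIL unless the entity's context model explains the context better than a
    general language model by at least `margin` in log space."""
    log_e = sum(math.log(entity_probs.get(t, floor)) for t in context_terms)
    log_g = sum(math.log(general_probs.get(t, floor)) for t in context_terms)
    return log_e <= log_g + margin

# Toy unigram probabilities for one entity and for the general language model.
entity_probs = {"nba": 0.05, "bulls": 0.04}
general_probs = {"nba": 0.001, "bulls": 0.001, "stock": 0.01}

print(is_nil(["nba", "bulls"], entity_probs, general_probs))     # entity fits well
print(is_nil(["stock", "market"], entity_probs, general_probs))  # link to NIL
```

A context the entity model explains well is linked normally; a context it cannot explain better than the background model is sent to NIL without a separate classifier.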
70 2 Data Sets To evaluate the entity linking performance, we adopted two data sets: the first is WikiAmbi, which is used to evaluate the performance on Wikipedia articles; the second is TAC_KBP, which is used to evaluate the performance on general newswire documents. [sent-194, score-0.788]
71 In WikiAmbi, there were 207 distinct names and each name contains at least two possible referent entities (on average 6. [sent-198, score-0.866]
72 For each name mention, its referent entity in Wikipedia is manually annotated. [sent-203, score-1.252]
73 Overall, the referent entities of 57% (2229 of 3904) of the name mentions are missing from Wikipedia, so TAC_KBP is also suitable for evaluating NIL entity detection performance. [sent-204, score-1.389]
74 The above two data sets can provide a standard testbed for the entity linking task. [sent-205, score-0.788]
75 However, there are still some limitations of these data sets: First, these data sets only annotate the salient name mentions in a document, while many NLP applications need all name mentions to be linked. [sent-206, score-1.182]
76 Second, these data sets only contain well-formed documents, but in many real-world applications entity linking often must be applied to noisy documents such as product reviews and microblog messages. [sent-207, score-0.788]
77 These metrics are: Micro-Averaged Accuracy (MicroAccuracy): measures entity linking accuracy averaged over all the name mentions; Macro-Averaged Accuracy (MacroAccuracy): measures entity linking accuracy averaged over all the target entities. [sent-211, score-2.06]
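The two metrics differ in how they weight entities; a small sketch (with toy gold and predicted labels) makes the distinction concrete:

```python
from collections import defaultdict

def micro_macro_accuracy(gold, predicted):
    """Micro accuracy averages over mentions; macro averages per-entity
    accuracies, so rare entities weigh as much as frequent ones."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    micro = correct / len(gold)

    per_entity = defaultdict(lambda: [0, 0])  # entity -> [correct, total]
    for g, p in zip(gold, predicted):
        per_entity[g][1] += 1
        per_entity[g][0] += g == p
    macro = sum(c / n for c, n in per_entity.values()) / len(per_entity)
    return micro, macro

gold = ["A", "A", "A", "B"]
pred = ["A", "A", "A", "C"]
print(micro_macro_accuracy(gold, pred))  # → (0.75, 0.5)
```

Here entity A is always correct and entity B never is, so micro accuracy (3 of 4 mentions) is higher than macro accuracy (one of two entities).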
78 , 2008), where a name mention’s referent entity is the entity which has the largest average semantic relatedness with the name mention’s unambiguous context entities – we denoted it as TopicIndex. [sent-215, score-2.501]
79 , the P(m,e)=P(e)P(c|e)); and the full entity mention model (Full Model). [sent-227, score-0.825]
80 2) By incorporating more entity knowledge, our method can significantly improve the entity linking performance: When only using the popularity knowledge, our method can only achieve 49. [sent-231, score-1.519]
81 3) All three types of entity knowledge contribute to the final performance improvement, and the context knowledge contributes the most: By respectively ablating the popularity knowledge, the name knowledge and the context knowledge, the performance of our model correspondingly drops by 7. [sent-238, score-1.622]
2 Optimizing Parameters Our model needs to tune one parameter: the Jelinek-Mercer smoothing parameter λ used in the entity context model. [sent-250, score-0.625]
83 The first advantage of our method is that the entity-mention model can incorporate heterogeneous entity knowledge. [sent-261, score-1.379]
84 Tables 3 and 4 have shown that, by incorporating heterogeneous entity knowledge (including the name knowledge, the popularity knowledge and the context knowledge), the entity linking performance obtains a significant improvement. [sent-262, score-2.276]
85 For instance, we can train a better entity context model P(c|e) using more name mentions. [sent-267, score-1.109]
86 To find out whether better entity knowledge extraction results in better performance, Figure 6 plots the micro-accuracy against the size of the training data of name mentions for P(c|e) of each entity e. [sent-268, score-1.825]
4 Comparison with the State of the Art We also compared our method with the state-of-the-art entity linking systems in the TAC 2009 KBP track (McNamee and Dang, 2009). [sent-272, score-0.808]
88 To date, most entity linking systems have employed context similarity based methods. [sent-277, score-0.833]
89 The essential idea was to extract the discriminative features of an entity from its description, then link a name mention to the entity which has the largest context similarity with it. [sent-278, score-1.89]
90 Cucerzan (2007) proposed a Bag of Words based method, which represents each target entity as a vector of terms, then the similarity between a name mention and an entity was computed using the cosine similarity measure. [sent-279, score-1.817]
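The BoW baseline reduces to cosine similarity between term-count vectors; the contexts below are toy strings, not the paper's data:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Mention context vs. two candidate entity descriptions (toy data).
mention_ctx = Counter("wins nba mvp with the bulls".split())
player = Counter("nba basketball player bulls mvp".split())
professor = Counter("professor machine learning berkeley".split())
print(cosine(mention_ctx, player), cosine(mention_ctx, professor))
```

The mention is linked to whichever entity description has the larger cosine score, which is the baseline our generative model is compared against.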
91 (2009) extended the BoW model by incorporating more entity knowledge such as popularity knowledge, entity category knowledge, etc. [sent-281, score-1.375]
92 Because the context similarity based methods can only represent entity knowledge as features, their main drawback was the difficulty of incorporating heterogeneous entity knowledge. [sent-287, score-1.317]
93 Recently there were also some entity linking methods based on inter-dependency. [sent-288, score-0.788]
94 These methods assumed that the entities in the same document are related to each other, thus the referent entity of a name mention is the entity which is most related to its contextual entities. [sent-289, score-2.188]
95 (2008) found the referent entity of a name mention by computing the weighted average of semantic relatedness between the candidate entity and its unambiguous contextual entities. [sent-291, score-2.079]
96 (2009) proposed a method which collectively resolves the entity linking tasks in a document as an optimization problem. [sent-295, score-0.828]
97 The drawback of the inter-dependency based methods is that they are usually specially designed to leverage semantic relations, and do not take other types of entity knowledge into consideration. [sent-296, score-0.675]
98 6 Conclusions and Future Work This paper proposes a generative probabilistic model, the entity-mention model, for the entity linking task. [sent-297, score-0.826]
99 The main advantage of our model is that it can incorporate multiple types of heterogeneous entity knowledge. [sent-298, score-0.64]
100 Furthermore, our model has a statistical foundation, making the entity knowledge extraction approach different from most previous ad hoc approaches. [sent-299, score-0.684]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
2 0.34833068 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
Author: Danuta Ploch
Abstract: Named entity disambiguation is the task of linking an entity mention in a text to the correct real-world referent predefined in a knowledge base, and is a crucial subtask in many areas like information retrieval or topic detection and tracking. Named entity disambiguation is challenging because entity mentions can be ambiguous and an entity can be referenced by different surface forms. We present an approach that exploits Wikipedia relations between entities co-occurring with the ambiguous form to derive a range of novel features for classifying candidate referents. We find that our features improve disambiguation results significantly over a strong popularity baseline, and are especially suitable for recognizing entities not contained in the knowledge base. Our system achieves state-of-the-art results on the TAC-KBP 2009 dataset.
3 0.31664035 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
Author: Heng Ji ; Ralph Grishman
Abstract: In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking linking names in context to entities in the KB and Slot Filling – adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes (“slots”) derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the – – remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges. 1
4 0.24726306 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
5 0.24334417 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
6 0.19045986 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
7 0.16397458 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction
8 0.16330005 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
9 0.14458652 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
10 0.14051737 117 acl-2011-Entity Set Expansion using Topic information
11 0.11300403 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
12 0.11055654 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia
13 0.10730682 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
14 0.10526874 101 acl-2011-Disentangling Chat with Local Coherence Models
15 0.10342228 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
16 0.094450526 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
17 0.094412565 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
18 0.090556614 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
19 0.084915504 261 acl-2011-Recognizing Named Entities in Tweets
20 0.069677547 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System
simIndex simValue paperId paperTitle
4 0.71535093 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia
Author: Lev Ratinov ; Dan Roth ; Doug Downey ; Mike Anderson
Abstract: Disambiguating concepts and entities in a context sensitive way is a fundamental problem in natural language processing. The comprehensiveness of Wikipedia has made the online encyclopedia an increasingly popular target for disambiguation. Disambiguation to Wikipedia is similar to a traditional Word Sense Disambiguation task, but distinct in that the Wikipedia link structure provides additional information about which disambiguations are compatible. In this work we analyze approaches that utilize this information to arrive at coherent sets of disambiguations for a given document (which we call “global” approaches), and compare them to more traditional (local) approaches. We show that previous approaches for global disambiguation can be improved, but even then the local disambiguation provides a baseline which is very hard to beat.
5 0.71396929 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
6 0.68742907 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
7 0.51983023 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
8 0.46252581 101 acl-2011-Disentangling Chat with Local Coherence Models
9 0.45436671 337 acl-2011-Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History
10 0.45259055 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
11 0.44958386 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
12 0.44582528 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
13 0.43276387 261 acl-2011-Recognizing Named Entities in Tweets
14 0.41466296 285 acl-2011-Simple supervised document geolocation with geodesic grids
15 0.4051488 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
16 0.37847751 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
17 0.36288783 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System
18 0.35664156 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
19 0.35631779 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
20 0.34522337 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
topicId topicWeight
[(5, 0.016), (17, 0.028), (26, 0.031), (37, 0.072), (39, 0.054), (41, 0.138), (44, 0.235), (55, 0.034), (59, 0.03), (72, 0.028), (91, 0.036), (93, 0.01), (96, 0.163), (98, 0.019)]
simIndex simValue paperId paperTitle
1 0.84121883 135 acl-2011-Faster and Smaller N-Gram Language Models
Author: Adam Pauls ; Dan Klein
Abstract: N-gram language models are a major resource bottleneck in machine translation. In this paper, we present several language model implementations that are both highly compact and fast to query. Our fastest implementation is as fast as the widely used SRILM while requiring only 25% of the storage. Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. We also discuss techniques for improving query speed during decoding, including a simple but novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%.
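As a rough, hypothetical illustration of the sorted-array idea behind compact n-gram storage (the paper's actual 23-bit encoding also compresses the keys and counts themselves; the class name and inputs here are invented for the sketch):

```python
import array
import bisect


class SortedNgramTable:
    """Toy compact n-gram table: (key, count) pairs in two parallel
    packed arrays, queried by binary search. Real implementations keep
    a separate table per n-gram order to avoid key collisions."""

    def __init__(self, ngram_counts, vocab_size):
        # Encode each n-gram (a tuple of word ids) as a single integer key.
        def key(ngram):
            k = 0
            for w in ngram:
                k = k * vocab_size + w
            return k

        pairs = sorted((key(ng), c) for ng, c in ngram_counts.items())
        self._keys = array.array('q', (k for k, _ in pairs))
        self._counts = array.array('q', (c for _, c in pairs))
        self._vocab_size = vocab_size

    def count(self, ngram):
        k = 0
        for w in ngram:
            k = k * self._vocab_size + w
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return self._counts[i]
        return 0
```

A sorted packed array trades O(log n) lookup for near-zero per-entry overhead, which is the basic space/speed trade-off the abstract is about.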
2 0.82942021 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
Author: Wen Wang ; Sibel Yaman ; Kristin Precoda ; Colleen Richey ; Geoffrey Raymond
Abstract: We present Conditional Random Fields based approaches for detecting agreement/disagreement between speakers in English broadcast conversation shows. We develop annotation approaches for a variety of linguistic phenomena. Various lexical, structural, durational, and prosodic features are explored. We compare the performance when using features extracted from automatically generated annotations against that when using human annotations. We investigate the efficacy of adding prosodic features on top of lexical, structural, and durational features. Since the training data is highly imbalanced, we explore two sampling approaches, random downsampling and ensemble downsampling. Overall, our approach achieves 79.2% (precision), 50.5% (recall), 61.7% (F1) for agreement detection and 69.2% (precision), 46.9% (recall), and 55.9% (F1) for disagreement detection, on the English broadcast conversation data. 1 ?yIntroduction In ?ythis work, we present models for detecting agre?yement/disagreement (denoted (dis)agreement) betwy?een speakers in English broadcast conversation show?ys. The Broadcast Conversation (BC) genre differs from the Broadcast News (BN) genre in that it is?y more interactive and spontaneous, referring to freey? speech in news-style TV and radio programs and consisting of talk shows, interviews, call-in prog?yrams, live reports, and round-tables. Previous y? y?This work was performed while the author was at ICSI. syaman@us . ibm .com, graymond@ s oc .uc sb . edu work on detecting (dis)agreements has been focused on meeting data. (Hillard et al., 2003), (Galley et al., 2004), (Hahn et al., 2006) used spurt-level agreement annotations from the ICSI meeting corpus (Janin et al., 2003). (Hillard et al., 2003) explored unsupervised machine learning approaches and on manual transcripts, they achieved an overall 3-way agreement/disagreement classification ac- curacy as 82% with keyword features. 
(Galley et al., 2004) explored Bayesian Networks for the detection of (dis)agreements. They used adjacency pair information to determine the structure of their conditional Markov model and outperformed the results of (Hillard et al., 2003), improving the 3-way classification accuracy to 86.9%. (Hahn et al., 2006) explored semi-supervised learning algorithms and reached a competitive performance of 86.7% 3-way classification accuracy on manual transcriptions with only lexical features. (Germesin and Wilson, 2009) investigated supervised machine learning techniques and yielded competitive results on the annotated data from the AMI meeting corpus (McCowan et al., 2005). Our work differs from these previous studies in two major categories. One is that a different definition of (dis)agreement was used. In the current work, a (dis)agreement occurs when a responding speaker agrees with, accepts, or disagrees with or rejects, a statement or proposition by a first speaker. Second, we explored (dis)agreement detection in broadcast conversation. Due to the difference in publicity and intimacy/collegiality between speakers in broadcast conversations vs. meetings, (dis)agreement may have different characteristics. Different from the unsupervised approaches in (Hillard et al., 2003) and semi-supervised approaches in (Hahn et al., 2006), we conducted supervised training. Also, different from (Hillard et al., 2003) and (Galley et al., 2004), our classification was carried out on the utterance level, instead of on the spurt level. Galley et al. 
extended Hillard et al.’s work by adding features from previous spurts and features from the general dialog context to infer the class of the current spurt, on top of features from the current spurt (local features) used by Hillard et al. Galley et al. used adjacency pairs to describe the interaction between speakers and the relations between consecutive spurts. In this preliminary study on broadcast conversation, we directly modeled (dis)agreement detection without using adjacency pairs. Still, within the conditional random fields (CRF) framework, we explored features from preceding and following utterances to consider context in the discourse structure. We explored a wide variety of features, including lexical, structural, durational, and prosodic features. To our knowledge, this is the first work to systematically investigate detection of agreement/disagreement for broadcast conversation data. The remainder of the paper is organized as follows. Section 2 presents our data and automatic annotation modules. Section 3 describes various features and the CRF model we explored. Experimental results and discussion appear in Section 4, as well as conclusions and future directions. 2 Data and Automatic Annotation In this work, we selected English broadcast conversation data from the DARPA GALE program collected data (GALE Phase 1 Release 4, LDC2006E91; GALE Phase 4 Release 2, LDC2009E15). Human transcriptions and manual speaker turn labels are used in this study. Also, since the (dis)agreement detection output will be used to analyze social roles and relations of an interacting group, we first manually marked soundbites and then excluded soundbites during annotation and modeling. We recruited annotators to provide manual annotations of speaker roles and (dis)agreement to use for the supervised training of models. We defined a set of speaker roles as follows. Host/chair is a person associated with running the discussions or calling the meeting. 
Reporting participant is a person reporting from the field, from a subcommittee, etc. Commentator participant/Topic participant is a person providing commentary on some subject, or person who is the subject of the conversation and plays a role, e.g., as a newsmaker. Audience participant is an ordinary person who may call in, ask questions at a microphone at e.g. a large presentation, or be interviewed because of their presence at a news event. Other is any speaker who does not fit in one of the above categories, such as a voice talent, an announcer doing show openings or commercial breaks, or a translator. Agreements and disagreements are composed of different combinations of initiating utterances and responses. We reformulated the (dis)agreement detection task as the sequence tagging of 11 (dis)agreement-related labels for identifying whether a given utterance is initiating a (dis)agreement opportunity, is a (dis)agreement response to such an opportunity, or is neither of these, in the show. For example, a Negative tag question followed by a negation response forms an agreement, that is, A: [Negative tag] This is not black and white, is it? B: [Agreeing Response] No, it isn’t. The data sparsity problem is serious. Among all 27,071 utterances, only 2,589 utterances are involved in (dis)agreement as initiating or response utterances, about 10% only among all data, while 24,482 utterances are not involved. These annotators also labeled shows with a variety of linguistic phenomena (denoted language use constituents, LUC), including discourse markers, disfluencies, person addresses and person mentions, prefaces, extreme case formulations, and dialog act tags (DAT). We categorized dialog acts into statement, question, backchannel, and incomplete. We classified disfluencies (DF) into filled pauses (e.g., uh, um), repetitions, corrections, and false starts. Person address (PA) terms are terms that a speaker uses to address another person. 
Person mentions (PM) are references to non-participants in the conversation. Discourse markers (DM) are words or phrases that are related to the structure of the discourse and express a relation between two utterances, for example, I mean, you know. Prefaces (PR) are sentence-initial lexical tokens serving functions close to discourse markers (e.g., Well, I think that...). Extreme case formulations (ECF) are lexical patterns emphasizing extremeness (e.g., This is the best book I have ever read). In the end, we manually annotated 49 English shows. We preprocessed English manual transcripts by removing transcriber annotation markers and noise, removing punctuation and case information, and conducting text normalization. We also built automatic rule-based and statistical annotation tools for these LUCs. 3 Features and Model We explored lexical, structural, durational, and prosodic features for (dis)agreement detection. We included a set of “lexical” features, including ngrams extracted from all of that speaker’s utterances, denoted ngram features. Other lexical features include the presence of negation and acquiescence, yes/no equivalents, positive and negative tag questions, and other features distinguishing different types of initiating utterances and responses. We also included various lexical features extracted from LUC annotations, denoted LUC features. These additional features include features related to the presence of prefaces, the counts of types and tokens of discourse markers, extreme case formulations, disfluencies, person addressing events, and person mentions, and the normalized values of these counts by sentence length. We also included a set of features related to the DAT of the current utterance and preceding and following utterances. We developed a set of “structural” and “durational” features, inspired by conversation analysis, to quantitatively represent the different participation and interaction patterns of speakers in a show. 
We extracted features related to pausing and overlaps between consecutive turns, the absolute and relative duration of consecutive turns, and so on. We used a set of prosodic features including pause, duration, and the speech rate of a speaker. We also used pitch and energy of the voice. Prosodic features were computed on words and phonetic alignment of manual transcripts. Features are computed for the beginning and ending words of an utterance. For the duration features, we used the average and maximum vowel duration from forced alignment, both unnormalized and normalized for vowel identity and phone context. For pitch and energy, we calculated the minimum, maximum, range, mean, standard deviation, skewness and kurtosis values. A decision tree model was used to compute posteriors from prosodic features and we used cumulative binning of posteriors as final features, similar to (Liu et al., 2006). As illustrated in Section 2, we reformulated the (dis)agreement detection task as a sequence tagging problem. We used the Mallet package (McCallum, 2002) to implement the linear chain CRF model for sequence tagging. A CRF is an undirected graphical model that defines a global log-linear distribution of the state (or label) sequence E conditioned on an observation sequence, in our case including the sequence of sentences S and the corresponding sequence of features for this sequence of sentences F. The model is optimized globally over the entire sequence. The CRF model is trained to maximize the conditional log-likelihood P(E|S, F) of a given training set. During testing, the most likely sequence E is found using the Viterbi algorithm. One of the motivations of choosing conditional random fields was to avoid the label-bias problem found in hidden Markov models. 
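The Viterbi decoding step mentioned above, which finds the most likely label sequence under a linear-chain model, can be sketched as follows (a minimal log-space illustration; the paper itself used the Mallet toolkit, and the scores here are hypothetical):

```python
import numpy as np


def viterbi(emission, transition):
    """Most likely label sequence under a linear-chain model.

    emission:   (T, K) per-position label scores, log-space
    transition: (K, K) label-to-label scores, log-space
    """
    T, K = emission.shape
    score = np.zeros((T, K))           # best score ending in each label
    back = np.zeros((T, K), dtype=int)  # back-pointers
    score[0] = emission[0]
    for t in range(1, T):
        # cand[i, j] = score of coming from label i into label j at step t
        cand = score[t - 1][:, None] + transition + emission[t][None, :]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # follow back-pointers from the best final label
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With a transition matrix that penalizes label switches, the decoder prefers runs of the same label even when per-position emissions disagree, which is exactly the global-versus-local trade-off the CRF exploits.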
Compared to Maximum Entropy modeling, the CRF model is optimized globally over the entire sequence, whereas the ME model makes a decision at each point individually without considering the context event information. 4 Experiments All (dis)agreement detection results are based on n-fold cross-validation. In this procedure, we held out one show as the test set, randomly held out another show as the dev set, trained models on the rest of the data, and tested the model on the held-out show. We iterated through all shows and computed the overall accuracy. Table 1 shows the results of (dis)agreement detection using all features except prosodic features. We compared two conditions: (1) features extracted completely from the automatic LUC annotations and automatically detected speaker roles, and (2) features from manual speaker role labels and manual LUC annotations when manual annotations are available. Table 1 showed that running a fully automatic system to generate automatic annotations and automatic speaker roles produced comparable performance to the system using features from manual annotations whenever available. Table 1: Precision (%), recall (%), and F1 (%) of (dis)agreement detection using features extracted from manual speaker role labels and manual LUC annotations when available, denoted Manual Annotation, and automatic LUC annotations and automatically detected speaker roles, denoted Automatic Annotation. We then focused on the condition of using features from manual annotations when available and added prosodic features as described in Section 3. The results are shown in Table 2. Adding prosodic features produced a 0.7% absolute gain on F1 on agreement detection, and 1.5% absolute gain on F1 on disagreement detection. 
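The leave-one-show-out protocol described above can be sketched as follows (illustrative; `train_fn` and `eval_fn` are hypothetical stand-ins for model training and scoring, and the dev show is taken deterministically here rather than at random):

```python
def leave_one_show_out(shows, train_fn, eval_fn):
    """Hold out each show in turn as the test set, take the first
    remaining show as the dev set (the paper picked it randomly),
    and train on the rest."""
    for i, test_show in enumerate(shows):
        rest = shows[:i] + shows[i + 1:]
        dev_show, train_shows = rest[0], rest[1:]
        model = train_fn(train_shows, dev_show)
        yield eval_fn(model, test_show)
```

Iterating the hold-out over every show, then pooling the predictions, gives the overall accuracy the paper reports without ever testing on training material.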
Table 2: Precision (%), recall (%), and F1 (%) of (dis)agreement detection using manual annotations without and with prosodic features. Note that only about 10% of utterances among all data are involved in (dis)agreement. This indicates a highly imbalanced data set, as one class is more heavily represented than the other/others. We suspected that this high imbalance has played a major role in the high precision and low recall results we obtained so far. Various approaches have been studied to handle imbalanced data for classification, trying to balance the class distribution in the training set by either oversampling the minority class or downsampling the majority class. In this preliminary study of sampling approaches for handling imbalanced data for CRF training, we investigated two approaches, random downsampling and ensemble downsampling. Random downsampling randomly downsamples the majority class to equate the number of minority and majority class samples. Ensemble downsampling is a refinement of random downsampling which doesn’t discard any majority class samples. Instead, we partitioned the majority class samples into N subspaces with each subspace containing the same number of samples as the minority class. Then we train N CRF models, each based on the minority class samples and one disjoint partition from the N subspaces. During testing, the posterior probability for one utterance is averaged over the N CRF models. The results from these two sampling approaches as well as the baseline are shown in Table 3. Both sampling approaches achieved significant improvement over the baseline, i.e., training on the original data set, and ensemble downsampling produced better performance than random downsampling. 
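The ensemble downsampling scheme described above, partitioning the majority class into N disjoint subsets and pairing each with the full minority class, can be sketched as follows (a minimal illustration; the function name and its index-based return format are assumptions):

```python
import numpy as np


def ensemble_downsample_indices(labels, minority_label, rng=None):
    """Build N balanced training folds: each fold is the full minority
    class plus one disjoint, minority-sized slice of the majority class.
    One model is trained per fold; test posteriors are averaged."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    minority = np.flatnonzero(labels == minority_label)
    majority = rng.permutation(np.flatnonzero(labels != minority_label))
    n_models = len(majority) // len(minority)
    folds = []
    for i in range(n_models):
        part = majority[i * len(minority):(i + 1) * len(minority)]
        folds.append(np.concatenate([minority, part]))
    return folds
```

Unlike plain random downsampling, every majority-class sample appears in exactly one fold, so no training data is discarded, only redistributed across the ensemble.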
We noticed that both sampling approaches degraded slightly in precision but improved significantly in recall, resulting in 4.5% absolute gain on F1 for agreement detection and 4.7% absolute gain on F1 for disagreement detection. Table 3: Precision (%), recall (%), and F1 (%) of (dis)agreement detection without sampling, with random downsampling and ensemble downsampling. Manual annotations and prosodic features are used. In conclusion, this paper presents our work on detection of agreements and disagreements in English broadcast conversation data. We explored a variety of features, including lexical, structural, durational, and prosodic features. We experimented with these features using a linear-chain conditional random fields model and conducted supervised training. We observed significant improvement from adding prosodic features and from employing two sampling approaches, random downsampling and ensemble downsampling. Overall, we achieved 79.2% (precision), 50.5% (recall), 61.7% (F1) for agreement detection and 69.2% (precision), 46.9% (recall), and 55.9% (F1) for disagreement detection, on English broadcast conversation data. In future work, we plan to continue adding and refining features, explore dependencies between features and contextual cues with respect to agreements and disagreements, and investigate the efficacy of other machine learning approaches such as Bayesian networks and Support Vector Machines. Acknowledgments The authors thank Gokhan Tur and Dilek Hakkani-Tür for valuable insights and suggestions. This work has been supported by the Intelligence Advanced Research Projects Activity (IARPA) via Army Research Laboratory (ARL) contract number W911NF-09-C-0089. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. 
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, ARL, or the U.S. Government. References M. Galley, K. McKeown, J. Hirschberg, and E. Shriberg. 2004. Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies. In Proceedings of ACL. S. Germesin and T. Wilson. 2009. Agreement detection in multiparty conversation. In Proceedings of International Conference on Multimodal Interfaces. S. Hahn, R. Ladner, and M. Ostendorf. 2006. Agreement/disagreement classification: Exploiting unlabeled data using constraint classifiers. In Proceedings of HLT/NAACL. D. Hillard, M. Ostendorf, and E. Shriberg. 2003. Detection of agreement vs. disagreement in meetings: Training with unlabeled data. In Proceedings of HLT/NAACL. A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, and C. Wooters. 2003. The ICSI Meeting Corpus. In Proc. ICASSP, Hong Kong, April. Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, and Mary Harper. 2006. Enriching speech recognition with automatic detection of sentence boundaries and disfluencies. IEEE Transactions on Audio, Speech, and Language Processing, 14(5): 1526–1540, September. Special Issue on Progress in Rich Transcription. Andrew McCallum. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu. I. McCowan, J. Carletta, W. Kraaij, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska, W. Post, D. Reidsma, and P. Wellner. 2005. The AMI meeting corpus. In Proceedings of Measuring Behavior 2005, the 5th International Conference on Methods and Techniques in Behavioral Research.
same-paper 3 0.81130731 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
Author: Xianpei Han ; Le Sun
Abstract: Linking entities with knowledge base (entity linking) is a key issue in bridging the textual data with the structural knowledge base. Due to the name variation problem and the name ambiguity problem, the entity linking decisions are critically depending on the heterogenous knowledge of entities. In this paper, we propose a generative probabilistic model, called entitymention model, which can leverage heterogenous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in document P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidences from all the three distributions P(e), P(s|e) and P(c|e). Experimental results show that our method can significantly outperform the traditional methods. 1
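The three-distribution combination the abstract describes can be sketched as a naive log-space scorer (a toy illustration: the context model here is a simple per-word product with an ad-hoc smoothing floor, which is an assumption, not the paper's exact model):

```python
import math


def link(mention_name, mention_context, candidates,
         p_e, p_s_given_e, p_c_given_e):
    """Pick the candidate entity e maximizing
    log P(e) + log P(s|e) + sum_w log P(w|e),
    mirroring the three-step generative story: choose an entity,
    a name for it, and a context around it."""
    floor = 1e-9  # crude smoothing for unseen names/words (assumption)
    best, best_score = None, -math.inf
    for e in candidates:
        score = math.log(p_e[e])
        score += math.log(p_s_given_e[e].get(mention_name, floor))
        for w in mention_context:
            score += math.log(p_c_given_e[e].get(w, floor))
        if score > best_score:
            best, best_score = e, score
    return best
```

Even with toy distributions, the scorer shows how the three evidence sources interact: a popular entity (high P(e)) can still lose to a rarer one whose context model explains the surrounding words far better.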
4 0.80439585 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing
Author: Nikhil Garg ; James Henderson
Abstract: We propose a generative model based on Temporal Restricted Boltzmann Machines for transition based dependency parsing. The parse tree is built incrementally using a shiftreduce parse and an RBM is used to model each decision step. The RBM at the current time step induces latent features with the help of temporal connections to the relevant previous steps which provide context information. Our parser achieves labeled and unlabeled attachment scores of 88.72% and 91.65% respectively, which compare well with similar previous models and the state-of-the-art.
5 0.80150902 150 acl-2011-Hierarchical Text Classification with Latent Concepts
Author: Xipeng Qiu ; Xuanjing Huang ; Zhao Liu ; Jinlong Zhou
Abstract: Recently, hierarchical text classification has become an active research topic. The essential idea is that the descendant classes can share the information of the ancestor classes in a predefined taxonomy. In this paper, we claim that each class has several latent concepts and its subclasses share information with these different concepts respectively. Then, we propose a variant Passive-Aggressive (PA) algorithm for hierarchical text classification with latent concepts. Experimental results show that the performance of our algorithm is competitive with the recently proposed hierarchical classification algorithms.
6 0.73739266 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
8 0.69151103 94 acl-2011-Deciphering Foreign Language
9 0.69127715 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
10 0.68458277 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
11 0.6817559 56 acl-2011-Bayesian Inference for Zodiac and Other Homophonic Ciphers
12 0.68044066 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
13 0.67749774 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations
14 0.67701966 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
15 0.67691642 232 acl-2011-Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars
16 0.67562973 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
17 0.67471647 11 acl-2011-A Fast and Accurate Method for Approximate String Search
18 0.67360795 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
19 0.67329252 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
20 0.67289513 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents