acl acl2012 acl2012-208 knowledge-graph by maker-knowledge-mining

208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation


Source: pdf

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. [sent-3, score-0.993]

2 In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. [sent-5, score-0.557]

3 In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. [sent-6, score-1.287]

4 We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. [sent-7, score-0.779]

5 We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. [sent-8, score-0.51]

6 Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial. [sent-9, score-0.735]

7 Many relation discovery methods rely exclusively on the notion of either shallow or syntactic patterns that appear between two named entities (Bollegala et al.). [sent-24, score-0.435]

8 Generally speaking, relation discovery attempts to cluster such patterns into sets of equivalent or similar meaning. [sent-27, score-0.439]

9 It can also indicate that an athlete A beat B in a sports match, as in the pair “(Dmitry Tursunov, Andy Roddick)” in “Dmitry Tursunov beat the best American player Andy Roddick.” [sent-30, score-0.432]

10 It is difficult to discover a high-quality set of fine-grained entity types due to unknown criteria for developing such a set. [sent-48, score-0.438]

11 In particular, the optimal granularity of entity types depends on the particular pattern we consider. [sent-49, score-0.453]

12 In addition, there are senses that just cannot be determined by entity types alone: Take the meaning of “A beat B” where A and B are both persons; this could mean A physically beats B, or it could mean that A defeated B in a competition. [sent-53, score-0.822]

13 Instead of mapping entities to fine-grained types, we directly induce pattern senses by clustering feature representations of pattern contexts, i.e., the entity pairs associated with each pattern. [sent-55, score-0.635]

14 This allows us to employ not only local features such as words, but also global features such as the document and sentence themes. [sent-58, score-0.401]

15 To cluster the entity pairs of a single relation pattern into senses, we develop a simple extension to Latent Dirichlet Allocation (Blei et al., 2003). [sent-59, score-0.794]

16 Once we have our pattern senses, we merge them into clusters of different patterns with a similar sense. [sent-61, score-0.396]

17 We employ hierarchical agglomerative clustering with a similarity metric that considers features such as the entity arguments, and the document and sentence themes. [sent-62, score-0.818]

18 For automatic evaluation, we use relation instances in Freebase as ground truth, and employ two clustering metrics, pairwise F-score and B3 (as used in coreference). [sent-67, score-0.489]
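A minimal sketch of these two metrics, assuming each relation instance carries one predicted cluster label and one gold (Freebase) label; the function names and toy labels below are illustrative, not from the paper.

from collections import defaultdict
from itertools import combinations

def pairwise_f1(pred, gold):
    # Pairwise F-score: a pair of instances counts as a true positive when
    # it is placed in the same cluster by both partitions.
    same_pred = {p for p in combinations(range(len(pred)), 2) if pred[p[0]] == pred[p[1]]}
    same_gold = {p for p in combinations(range(len(gold)), 2) if gold[p[0]] == gold[p[1]]}
    tp = len(same_pred & same_gold)
    prec = tp / len(same_pred) if same_pred else 0.0
    rec = tp / len(same_gold) if same_gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def b_cubed(pred, gold):
    # B3: per-instance precision/recall of its predicted cluster against its
    # gold cluster, averaged over all instances.
    pred_members, gold_members = defaultdict(set), defaultdict(set)
    for i, (pl, gl) in enumerate(zip(pred, gold)):
        pred_members[pl].add(i)
        gold_members[gl].add(i)
    prec = rec = 0.0
    for pl, gl in zip(pred, gold):
        overlap = len(pred_members[pl] & gold_members[gl])
        prec += overlap / len(pred_members[pl])
        rec += overlap / len(gold_members[gl])
    p, r = prec / len(pred), rec / len(pred)
    return 2 * p * r / (p + r) if p + r else 0.0

gold = [0, 0, 1, 1, 2]   # toy gold relations
pred = [0, 0, 1, 2, 2]   # one instance mis-clustered
print(pairwise_f1(pred, gold), b_cubed(pred, gold))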

19 Experimental results show that our approach improves over the baselines, and that using global features achieves better performance than using entity type based features. [sent-68, score-0.49]

20 The results also show that our approach discovers relation clusters that human evaluators find coherent. [sent-71, score-0.495]

21 2 Our Approach: We induce pattern senses by clustering the entity pairs associated with a pattern, and discover semantic relations by clustering these sense clusters. [sent-72, score-1.439]

22 We represent each pattern as a list of entity pairs and employ a topic model to partition them into different sense clusters using local and global features. [sent-73, score-1.345]

23 We take each sense cluster of a pattern as an atomic cluster, and use hierarchical agglomerative clustering to organize them into semantic relations. [sent-74, score-0.838]

24 Therefore, a semantic relation comprises a set of sense clusters of patterns. [sent-75, score-0.757]

25 For each pattern, we form a clustering task by collecting all entity pairs the pattern connects. [sent-79, score-0.606]

26 Our goal is to partition these entity pairs into sense clusters. [sent-80, score-0.695]

27 Words: The words between and around the two entity arguments can disambiguate the sense of a path. [sent-84, score-0.695]

28 For example, “A’s parent company B” is different from “A’s largest company B” although they share the same path “A’s company B”. [sent-85, score-0.373]

29 The two words to the left of the source argument and to the right of the destination argument also help sense discovery. [sent-87, score-0.429]

30 We call the entity name and word features local, and the two theme features global. [sent-100, score-0.633]
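A small sketch of this feature extraction, assuming pre-tokenized sentences with (start, end) token spans for the two arguments and the source argument preceding the destination; the feature-name prefixes are invented for illustration, and the theme ids are assumed to come from separately trained topic models.

def extract_features(tokens, src_span, dst_span, doc_theme, sent_theme):
    s0, s1 = src_span
    d0, d1 = dst_span
    feats = []
    feats += [f"between:{w}" for w in tokens[s1:d0]]            # words between the arguments
    feats += [f"left:{w}" for w in tokens[max(0, s0 - 2):s0]]   # two words left of the source
    feats += [f"right:{w}" for w in tokens[d1:d1 + 2]]          # two words right of the destination
    feats += [f"src:{w}" for w in tokens[s0:s1]]                # entity name features
    feats += [f"dst:{w}" for w in tokens[d0:d1]]
    feats += [f"doc_theme:{doc_theme}", f"sent_theme:{sent_theme}"]  # global theme features
    return feats

tokens = "Dmitry Tursunov beat the best American player Andy Roddick".split()
print(extract_features(tokens, (0, 2), (7, 9), doc_theme=30, sent_theme=12))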

31 We employ a topic model to discover senses for each path. [sent-101, score-0.484]

32 Each path pi forms a document, and it contains a list of entity pairs co-occurring with the path in the tuples. [sent-102, score-0.847]

33 Each entity pair is represented by a list of features fk as we described. [sent-103, score-0.445]

34 We denote each entity pair of a path as e(pi) = (f1, . . . , fn). [sent-107, score-0.564]

35 After inference, each entity pair of a path is assigned to one topic. [sent-124, score-0.564]

36 Entity pairs which share the same topic assignments form one sense cluster. [sent-126, score-0.422]
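A minimal collapsed-Gibbs sketch of this LDA extension: each path is a document, and each entity pair draws a single sense that generates all of its features together. The sampler is an illustration of the model as described here; the paper's actual inference procedure and hyperparameters may differ.

import random
from collections import defaultdict

def gibbs_senses(paths, K=50, alpha=0.1, beta=0.1, iters=200, seed=0):
    # paths: {path string: [feature list per entity pair]}
    rng = random.Random(seed)
    V = len({f for pairs in paths.values() for fs in pairs for f in fs})
    n_dk = defaultdict(int)  # (path, sense) -> number of entity pairs
    n_kf = defaultdict(int)  # (sense, feature) -> count
    n_k = defaultdict(int)   # sense -> total feature count
    z = {}                   # (path, pair index) -> current sense
    def update(d, fs, k, delta):
        n_dk[d, k] += delta
        for f in fs:
            n_kf[k, f] += delta
            n_k[k] += delta
    for d, pairs in paths.items():
        for i, fs in enumerate(pairs):
            z[d, i] = rng.randrange(K)
            update(d, fs, z[d, i], +1)
    for _ in range(iters):
        for d, pairs in paths.items():
            for i, fs in enumerate(pairs):
                update(d, fs, z[d, i], -1)
                weights = []
                for k in range(K):
                    w = n_dk[d, k] + alpha
                    for f in fs:  # one sense generates all features of the pair
                        w *= (n_kf[k, f] + beta) / (n_k[k] + V * beta)
                    weights.append(w)
                z[d, i] = rng.choices(range(K), weights)[0]
                update(d, fs, z[d, i], +1)
    return z, n_kf  # entity pairs of a path sharing a sense id form one sense cluster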

37 2 Hierarchical Agglomerative Clustering: After discovering sense clusters of paths, we employ hierarchical agglomerative clustering (HAC) to discover semantic relations from these sense clusters. [sent-128, score-1.377]

38 We represent each sense cluster as one vector by summing up features from each entity pair in the cluster. [sent-132, score-0.805]

39 The weight of a feature indicates how many entity pairs in the cluster have the feature. [sent-133, score-0.477]

40 For example, we use binary features for word “defeat” in sense clusters of pattern “A defeat B”. [sent-136, score-0.729]

41 The two theme features are extracted from generative models, and each is a topic number. [sent-137, score-0.443]
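A sketch of this representation and the merging step using SciPy's agglomerative clustering; average linkage, cosine distance, and the cut-off threshold are assumptions of the sketch, not choices confirmed by the text.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def merge_sense_clusters(sense_clusters, threshold=0.8):
    # Each sense cluster is a list of feature lists, one per entity pair; a
    # feature's weight is the number of entity pairs that carry it.
    vocab = sorted({f for c in sense_clusters for pair in c for f in pair})
    col = {f: j for j, f in enumerate(vocab)}
    X = np.zeros((len(sense_clusters), len(vocab)))
    for i, cluster in enumerate(sense_clusters):
        for pair in cluster:
            for f in set(pair):  # binary per entity pair
                X[i, col[f]] += 1
    Z = linkage(pdist(X, metric="cosine"), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

senses = [
    [["between:defeat", "doc_theme:30"], ["between:beat", "doc_theme:30"]],
    [["between:defeat", "doc_theme:30"]],
    [["between:album", "doc_theme:7"]],
]
print(merge_sense_clusters(senses))  # the first two merge into one relation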

42 Our approach produces sense clusters for each path and semantic relation clusters of the whole data. [sent-138, score-1.194]

43 We use their lemmas. (Table 1: Example sense clusters produced by sense disambiguation.) [sent-146, score-0.794]

44 For the two theme features, we replace the theme number with the top words. [sent-153, score-0.456]

45 For example, the document theme of the first sense is Topic30, and Topic30 has top words “sports”. [sent-154, score-0.592]
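Rendering a theme or sense id as its top words is a small lookup over (sense, feature) counts; the helper below assumes the n_kf table kept by the Gibbs sketch above and is hypothetical.

def top_words(n_kf, k, n=5):
    # Most frequent features of sense k, e.g. for showing Topic30 as "sports".
    feats = [(f, c) for (kk, f), c in n_kf.items() if kk == k]
    return [f for f, _ in sorted(feats, key=lambda x: -x[1])[:n]]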

46 For each cluster, we list the top paths in it, and each is followed by “:number”, indicating its sense obtained from sense disambiguation. [sent-156, score-0.748]

47 They are ranked by the number of entity pairs they take. [sent-157, score-0.361]

48 Each entity pair and the dependency path which connects them form a tuple. [sent-161, score-0.564]

49 We filter out paths which occur fewer than 200 times and use some heuristic rules to filter out paths which are unlikely to represent a relation, for example, paths in which both arguments take the syntactic role “dobj” (direct object) in the dependency path. [sent-162, score-0.624]
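A sketch of this filtering step; the 200-occurrence threshold is from the text, while the tuple format and the single heuristic shown (both arguments taking "dobj") are simplifications for illustration.

from collections import Counter

def filter_tuples(tuples, min_count=200):
    # tuples: (src entity, path, dst entity); argument slots in the path are
    # assumed to carry their syntactic roles, e.g. "A:nsubj beat B:dobj".
    counts = Counter(path for _, path, _ in tuples)
    def plausible(path):
        return path.count(":dobj") < 2  # drop paths where both args are "dobj"
    return [(s, p, d) for s, p, d in tuples
            if counts[p] >= min_count and plausible(p)]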

50 1 Feature Extraction: For the entity name features, we split each entity string of a tuple into tokens. [sent-168, score-0.649]

51 2 Sense clusters and relation clusters: For the sense disambiguation model, we set the number of topics (senses) to 50. [sent-189, score-1.017]

52 Note that a path has a multinomial distribution over 50 senses but only a few senses have non-zero probabilities. [sent-191, score-0.688]

53 Randomly sampling some entity pairs from each of them, we find that the two sense clusters are precise. [sent-195, score-0.874]

54 Only 1% of pairs from the sense cluster “entertainment” should be assigned to the “music” sense. [sent-196, score-0.451]

55 For the path “play A in B” we discover two senses which take most of the probability: “sports” and “art”. [sent-197, score-0.515]

56 However, the “sports” sense may still be split into more fine-grained sense clusters. [sent-199, score-0.562]

57 Both the first and the second relation contain the path “A play B” but with different senses. [sent-202, score-0.469]

58 This is because they share many entity pairs of the team-team type. [sent-205, score-0.361]

59 Each tuple is represented by features of the entity pair, as listed in 2. [sent-213, score-0.391]

60 One sense per path (HAC): This system uses only hierarchical clustering to discover relations, skipping sense disambiguation. [sent-221, score-1.053]

61 In DIRT, each path is represented by its entity arguments. [sent-223, score-0.512]

62 DIRT calculates distributional similarities between different paths to find paths which bear the same semantic relation. [sent-224, score-0.41]

63 Local: This system uses our approach (both sense clustering with topic models and hierarchical clustering), but without global features. [sent-226, score-0.64]

64 Local+Type: This system adds entity type features to the previous system. [sent-227, score-0.408]

65 This allows us to compare performance of using global features against entity type features. [sent-228, score-0.49]

66 To determine entity types, we link named entities to Wikipedia pages using the Wikifier (Ratinov et al., 2011). [sent-229, score-0.419]

67 In hierarchical clustering, for each sense cluster of a path, we pick the most frequent entity type as a feature. [sent-235, score-0.812]
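A sketch of that type feature, assuming every entity pair in the sense cluster has already been mapped to a Wikipedia-derived type string; the feature prefix is hypothetical.

from collections import Counter

def dominant_type_feature(pair_types):
    # Most frequent entity type among the cluster's pairs, e.g. "athlete".
    (t, _), = Counter(pair_types).most_common(1)
    return f"arg_type:{t}"

print(dominant_type_feature(["athlete", "athlete", "politician"]))  # arg_type:athlete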

68 Our Approach+Type: This system adds Wikipedia entity type features to our approach. [sent-238, score-0.408]

69 1 Automatic Evaluation against Freebase: We evaluate relation clusters discovered by all approaches against Freebase. [sent-241, score-0.438]

70 Without using sense disambiguation, the performance of hierarchical clustering decreases significantly, losing 17% in precision in the pairwise measure, and 15% in terms of B3. [sent-259, score-0.533]

71 We also show the results of our approach without global document and sentence theme features (Local). [sent-262, score-0.442]

72 We compare global features (Our approach) against Wikipedia entity type features (Local+Type). [sent-264, score-0.539]

73 We see that using global features achieves better performance than using entity type based features. [sent-265, score-0.49]

74 When we add entity type features to our approach, the performance does not increase. [sent-266, score-0.408]

75 The entity type features do not help much because we cannot determine which particular type to choose for an entity pair. [sent-267, score-0.767]

76 2 Path Intrusion: We also evaluate coherence of relation clusters produced by different approaches by creating path intrusion tasks (Chang et al., 2009). [sent-276, score-0.771]

77 In each task, some paths from one cluster and an intruding path from another are shown, and the annotator’s job is to identify one single path which is out of place. [sent-278, score-0.776]

78 We show 5 paths and ask the annotator to identify one path which does not belong to the cluster. [sent-288, score-0.391]
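A sketch of assembling one such task; the four-plus-one split follows the five-path setup just described, while the sampling details are assumptions of the sketch.

import random

def make_intrusion_task(clusters, seed=0):
    # Four paths from one relation cluster plus one intruder from another,
    # shuffled; the annotator must spot the intruder.
    rng = random.Random(seed)
    cid, intruder_cid = rng.sample(range(len(clusters)), 2)
    shown = rng.sample(clusters[cid], 4)
    intruder = rng.choice(clusters[intruder_cid])
    task = shown + [intruder]
    rng.shuffle(task)
    return task, intruder

relations = [
    ["A defeat B", "A beat B", "A win against B", "A lose to B", "A play B"],
    ["A's parent company B", "A acquire B", "A merge with B", "A own B"],
]
print(make_intrusion_task(relations))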

79 We concentrate on some intrusion tasks and compare the clusters produced by different systems. [sent-297, score-0.36]

80 The clusters produced by HAC (without sense disambiguation) are coherent if all the paths in one relation take a particular sense. [sent-298, score-0.905]

81 It is easy to identify “A’s program B” as an intruder when the annotators realize that the other four paths state the relation that people work in an educational institution. [sent-300, score-0.392]

82 The generative model approach produces more coherent clusters when the number of relation topics increases. [sent-301, score-0.517]

83 The system which employs local and entity type features (Local+Type) produces clusters with low coherence because the system puts high weight on types. [sent-302, score-0.691]

84 For example, (United States, A talk with B, Syria) and (Canada, A defeat B, United States) are clustered into one relation since they share the argument types “country”-“country”. [sent-303, score-0.374]

85 Some errors are caused by selecting the incorrect sense for an entity pair of a path. [sent-313, score-0.64]

86 For instance, we put (Kenny Smith, who grew up in, Queens) and (Phil Jackson, return to, Los Angeles Lakers) into the “/people/person/place of birth” relation cluster since we do not detect the “sports” sense for the entity pair “(Phil Jackson, Los Angeles Lakers)”. [sent-314, score-0.962]

87 5 Related Work: There has been considerable interest in unsupervised relation discovery, including clustering approaches, generative models and many other approaches. [sent-315, score-0.453]

88 Pantel et al. (2007) address the issue of multiple senses per path by automatically learning admissible argument types where two paths are similar. [sent-320, score-0.717]

89 They cluster arguments into fine-grained entity types and rank the associations of a relation with these entity types to discover selectional preferences. [sent-321, score-1.249]

90 , 2010; Seaghdha, 2010) can help path sense disambiguation; however, we show that using global features performs better than entity type features. [sent-323, score-0.976]

91 In our case, each sense of a path can be seen as one view. [sent-327, score-0.486]

92 Hachey (2009) uses topic models to perform dimensionality reduction on features when clustering entity pairs into relations. [sent-331, score-0.631]

93 Bollegala et al. (2010) employ co-clustering to find clusters of entity pairs and patterns jointly. [sent-333, score-0.733]

94 They employ a self-learner to extract relation instances, but no attempt is made to cluster instances into relations. [sent-337, score-0.409]

95 Moreover, we explore path senses and global features for relation discovery. [sent-340, score-0.756]

96 Our approach employs generative models for path sense disambiguation, which achieves better performance than directly applying generative models to unsupervised relation discovery. [sent-345, score-0.884]

97 6 Conclusion: We explore senses of paths to discover semantic relations. [sent-346, score-0.534]

98 We employ a topic model to partition entity pairs of a path into different sense clusters and use hierarchical agglomerative clustering to merge senses into semantic relations. [sent-347, score-1.85]

99 Experimental results show our approach discovers precise relation clusters and outperforms a generative model approach and a clustering method which does not address sense disambiguation. [sent-348, score-0.989]

100 We also show that using global features improves the performance of unsupervised relation discovery over using entity type based features. [sent-349, score-0.794]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('entity', 0.307), ('sense', 0.281), ('clusters', 0.232), ('theme', 0.228), ('senses', 0.214), ('relation', 0.206), ('path', 0.205), ('paths', 0.186), ('freebase', 0.157), ('sports', 0.15), ('clustering', 0.134), ('intrusion', 0.128), ('cluster', 0.116), ('dirt', 0.111), ('pattern', 0.111), ('agglomerative', 0.102), ('discover', 0.096), ('pantel', 0.096), ('topic', 0.087), ('employ', 0.087), ('yao', 0.086), ('mcc', 0.085), ('beat', 0.083), ('document', 0.083), ('global', 0.082), ('selectional', 0.081), ('generative', 0.079), ('argument', 0.077), ('pi', 0.076), ('destination', 0.071), ('relations', 0.07), ('banko', 0.068), ('arguments', 0.066), ('disambiguation', 0.066), ('entities', 0.065), ('athlete', 0.064), ('defeated', 0.064), ('hac', 0.064), ('intruding', 0.064), ('pianist', 0.064), ('politician', 0.064), ('rink', 0.064), ('tasini', 0.064), ('discovery', 0.064), ('pairwise', 0.062), ('bollegala', 0.06), ('play', 0.058), ('discovers', 0.057), ('hierarchical', 0.056), ('company', 0.056), ('clinton', 0.056), ('hillary', 0.056), ('rodham', 0.056), ('defeat', 0.056), ('hasegawa', 0.056), ('prefix', 0.055), ('multinomial', 0.055), ('pairs', 0.054), ('partition', 0.053), ('patterns', 0.053), ('type', 0.052), ('pair', 0.052), ('dallas', 0.051), ('local', 0.051), ('lda', 0.051), ('wikipedia', 0.049), ('features', 0.049), ('named', 0.047), ('fn', 0.045), ('team', 0.045), ('draw', 0.043), ('entertainment', 0.043), ('baldi', 0.043), ('giants', 0.043), ('hawks', 0.043), ('mozart', 0.043), ('physically', 0.043), ('sandhaus', 0.043), ('tursunov', 0.043), ('dirichlet', 0.042), ('tuples', 0.041), ('country', 0.041), ('disambiguate', 0.041), ('music', 0.039), ('game', 0.038), ('oren', 0.038), ('semantic', 0.038), ('mean', 0.038), ('lakers', 0.037), ('bizer', 0.037), ('dbpedia', 0.037), ('bollacker', 0.037), ('zj', 0.037), ('isp', 0.037), ('fk', 0.037), ('types', 0.035), ('tuple', 0.035), ('unsupervised', 0.034), ('descriptors', 0.034), ('birth', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

2 0.2901493 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

Author: Enrique Alfonseca ; Katja Filippova ; Jean-Yves Delort ; Guillermo Garrido

Abstract: We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations. We leverage distant supervision using relations from the knowledge base FreeBase, but do not require any manual heuristic nor manual seed list selections. Results show that the learned patterns can be used to extract new relations with good precision.

3 0.18611462 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.

4 0.17586917 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions

Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith

Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

5 0.16992338 79 acl-2012-Efficient Tree-Based Topic Modeling

Author: Yuening Hu ; Jordan Boyd-Graber

Abstract: Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, its expressive power comes at the cost of more complicated inference. We extend the SPARSELDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically improve the computation time.

6 0.16488792 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

7 0.14353473 191 acl-2012-Temporally Anchored Relation Extraction

8 0.14338467 153 acl-2012-Named Entity Disambiguation in Streaming Data

9 0.14087869 73 acl-2012-Discriminative Learning for Joint Template Filling

10 0.13962589 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

11 0.13495068 64 acl-2012-Crosslingual Induction of Semantic Roles

12 0.134929 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

13 0.1268522 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

14 0.12563653 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

15 0.11588311 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

16 0.11300029 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction

17 0.10740124 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

18 0.1059195 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

19 0.10522971 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

20 0.10357422 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.273), (1, 0.227), (2, 0.008), (3, 0.187), (4, -0.027), (5, 0.137), (6, -0.18), (7, 0.05), (8, 0.108), (9, -0.03), (10, 0.295), (11, -0.08), (12, 0.089), (13, -0.062), (14, -0.027), (15, 0.019), (16, -0.037), (17, -0.018), (18, -0.036), (19, 0.022), (20, 0.036), (21, -0.035), (22, -0.098), (23, 0.004), (24, 0.03), (25, -0.101), (26, -0.013), (27, -0.005), (28, 0.04), (29, -0.011), (30, 0.004), (31, 0.026), (32, -0.075), (33, -0.008), (34, -0.07), (35, -0.024), (36, 0.054), (37, 0.116), (38, -0.021), (39, 0.08), (40, -0.03), (41, -0.021), (42, 0.034), (43, 0.053), (44, -0.051), (45, 0.062), (46, -0.129), (47, 0.105), (48, 0.021), (49, 0.127)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9743222 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

2 0.71082515 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

Author: Enrique Alfonseca ; Katja Filippova ; Jean-Yves Delort ; Guillermo Garrido

Abstract: We describe the use of a hierarchical topic model for automatically identifying syntactic and lexical patterns that explicitly state ontological relations. We leverage distant supervision using relations from the knowledge base FreeBase, but do not require any manual heuristic nor manual seed list selections. Results show that the learned patterns can be used to extract new relations with good precision.

3 0.62342966 73 acl-2012-Discriminative Learning for Joint Template Filling

Author: Einat Minkov ; Luke Zettlemoyer

Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention de- tection and template filling tasks.

4 0.59235072 153 acl-2012-Named Entity Disambiguation in Streaming Data

Author: Alexandre Davis ; Adriano Veloso ; Altigran Soares ; Alberto Laender ; Wagner Meira Jr.

Abstract: The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and the unique realworld entity. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality senseannotated data, however, are hard to be obtained in streaming environments, since the training corpus would have to be constantly updated in order to accomodate the fresh data coming on the stream. On the other hand, few positive examples plus large amounts of unlabeled data may be easily acquired. Producing binary classifiers directly from this data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known ExpectationMaximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20%, when compared to the current state-of-the-art biased SVMs, being more than 120 times faster.

5 0.57569802 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions

Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith

Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

6 0.5512265 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

7 0.54449105 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

8 0.50078934 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

9 0.49561188 79 acl-2012-Efficient Tree-Based Topic Modeling

10 0.48099822 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances

11 0.4737395 186 acl-2012-Structuring E-Commerce Inventory

12 0.45948067 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time

13 0.45928743 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

14 0.44597495 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets

15 0.4407056 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction

16 0.43745643 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

17 0.42330959 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

18 0.42207941 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions

19 0.41510862 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

20 0.41471466 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.017), (26, 0.028), (28, 0.029), (30, 0.025), (37, 0.023), (39, 0.052), (64, 0.011), (74, 0.019), (82, 0.012), (84, 0.022), (85, 0.02), (90, 0.09), (92, 0.498), (94, 0.018), (99, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95315456 78 acl-2012-Efficient Search for Transformation-based Inference

Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner

Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task is attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a locallookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.

2 0.94029635 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks

Author: Tsung-Ting Kuo ; San-Chuan Hung ; Wei-Shih Lin ; Nanyun Peng ; Shou-De Lin ; Wei-Fen Lin

Abstract: This paper brings a marriage of two seemingly unrelated topics, natural language processing (NLP) and social network analysis (SNA). We propose a new task in SNA which is to predict the diffusion of a new topic, and design a learning-based framework to solve this problem. We exploit the latent semantic information among users, topics, and social connections as features for prediction. Our framework is evaluated on real data collected from public domain. The experiments show 16% AUC improvement over baseline methods. The source code and dataset are available at http://www.csie.ntu.edu.tw/~d97944007/diffusion/

3 0.92677754 154 acl-2012-Native Language Detection with Tree Substitution Grammars

Author: Benjamin Swanson ; Eugene Charniak

Abstract: We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state of the art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state of the art results at native language detection. Furthermore, we contrast these two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.

4 0.91350168 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

Author: Rui Yan ; Mirella Lapata ; Xiaoming Li

Abstract: Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources; however, the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.

same-paper 5 0.90262163 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.69514263 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

7 0.69464284 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

8 0.68437642 31 acl-2012-Authorship Attribution with Author-aware Topic Models

9 0.67284948 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

10 0.661062 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

11 0.62173784 98 acl-2012-Finding Bursty Topics from Microblogs

12 0.61171383 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

13 0.58753031 167 acl-2012-QuickView: NLP-based Tweet Search

14 0.58405638 79 acl-2012-Efficient Tree-Based Topic Modeling

15 0.5823369 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

16 0.57939816 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

17 0.57740825 171 acl-2012-SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations

18 0.57562917 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

19 0.55709726 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

20 0.55546075 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars