acl acl2013 acl2013-219 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang
Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 Learning Entity Representation for Entity Disambiguation Zhengyan He† Shujie Liu‡ Mu Li‡ Ming Zhou‡ Longkai Zhang† Houfeng Wang†∗ † Key Laboratory of Computational Linguistics (Peking University) Ministry of Education, China ‡ Microsoft Research Asia [sent-1, score-0.069]
2 Abstract We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). [sent-6, score-0.282]
3 Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. [sent-7, score-0.417]
4 Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. [sent-8, score-0.052]
5 A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. [sent-9, score-0.268]
6 Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches. [sent-10, score-0.247]
7 1 Introduction Entity linking or disambiguation has recently received much attention in the natural language processing community (Bunescu and Pasca, 2006; Han et al. [sent-11, score-0.161]
8 Given a sentence with four mentions, “The [[Python]] of [[Delphi]] was a creature with the body of a snake. [sent-15, score-0.078]
9 This creature dwelled on [[Mount Parnassus]], in central [[Greece]]. [sent-16, score-0.078]
10 A most straightforward method is to compare the context of the mention and the definition of candidate entities. [sent-18, score-0.086]
11 Various similarity measures between the context document d and entity e can then be used, such as dot product, cosine similarity, Kullback-Leibler divergence, Jaccard distance, or more complicated ones (Zheng et al. [sent-21, score-0.282]
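As a concrete illustration of such local baselines, the sketch below (a minimal Python example, not from the paper; the toy vocabulary and weights are invented for illustration) computes the dot product and cosine similarity between sparse bag-of-words vectors for a context document d and an entity description e.

```python
import math

def dot(d, e):
    # dot product over the shared vocabulary of two sparse bag-of-words vectors
    return sum(w * e.get(t, 0.0) for t, w in d.items())

def cosine(d, e):
    # cosine similarity: dot product normalized by the two vector lengths
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_e = math.sqrt(sum(w * w for w in e.values()))
    if norm_d == 0.0 or norm_e == 0.0:
        return 0.0
    return dot(d, e) / (norm_d * norm_e)

# toy tf-idf-style vectors for a mention context and an entity definition
context = {"python": 2.0, "delphi": 1.5, "snake": 1.0, "creature": 0.5}
entity = {"python": 1.8, "snake": 2.2, "mythology": 0.7}
print(dot(context, entity), cosine(context, entity))
```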
12 Another line of work focuses on collective disambiguation (Kulkarni et al. [sent-27, score-0.353]
13 Ambiguous mentions within the same context are resolved simultaneously based on the coherence among decisions. [sent-32, score-0.042]
14 (Ratinov et al., 2011) show that even though global approaches can be improved, local methods based only on the similarity sim(d, e) of context d and entity e are hard to beat. [sent-35, score-0.328]
15 This suggests the importance of modeling sim(d, e) well. [sent-36, score-0.099]
16 Rather than learning context-entity association at the word level, topic model based approaches (Kataria et al. [sent-37, score-0.218]
17 However, the one-topic-per-entity assumption makes it impossible to scale to a large knowledge base, as every entity has a separate word distribution P(w|e); besides, the training objective does not directly correspond with disambiguation performance. [sent-39, score-0.213]
18 To overcome the disadvantages of previous approaches, we propose a novel method to learn context-entity association enriched with a deep architecture. [sent-40, score-0.328]
19 , 2007) are built in a hierarchical manner, and allow us to compare context and entity at some higher level abstraction; while at lower levels, general concepts are shared across entities, resulting in compact models. [sent-43, score-0.301]
20 Moreover, to make our model highly correlated with disambiguation performance, our method directly optimizes document [sent-44, score-0.15]
21 and entity representations for a fixed similarity measure. [sent-46, score-0.227]
22 In fact, the underlying representations for computing similarity measure add internal structure to the given similarity measure. [sent-47, score-0.197]
23 Features are learned leveraging large scale annotation of Wikipedia, without any manual design efforts. [sent-48, score-0.116]
24 Furthermore, the learned model is compact compared with topic model based approaches, and can be trained discriminatively without relying on expensive sampling strategy. [sent-49, score-0.169]
25 Despite its simplicity, it beats all complex collective approaches in our experiments. [sent-50, score-0.295]
26 The learned similarity measure can be readily incorporated into any existing collective approaches, which further boosts performance. [sent-51, score-0.399]
27 2 Learning Representation for Contextual Document Given a mention string m with its context document d, a list of candidate entities C(m) is generated for m; for each candidate entity ei ∈ C(m), we compute a ranking score sim(dm, ei) indicating how likely m refers to ei. [sent-52, score-0.53]
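The overall ranking setup described here can be sketched as follows; the helper names are illustrative assumptions, with sim() standing in for the learned score developed below.

```python
def disambiguate(context_doc, candidates, sim):
    """Return the candidate entity ei in C(m) that maximizes sim(context_doc, ei)."""
    if not candidates:
        return None  # no candidate generated for this mention
    return max(candidates, key=lambda e: sim(context_doc, e))
```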
28 In the pre-training stage, Stacked Denoising Auto-encoders are built in an unsupervised layer-wise fashion to discover general concepts encoding d and e. [sent-55, score-0.096]
29 In the supervised fine-tuning stage, the entire network weights are fine-tuned to optimize the similarity score sim(d, e). [sent-56, score-0.192]
30 The auto-encoder (Bengio et al., 2007) is one of the building blocks of deep learning. [sent-59, score-0.11]
31 Assuming the input is a vector x, an auto-encoder consists of an encoding process h(x) and a decoding process g(h(x)). [sent-60, score-0.058]
32 The goal is to minimize the reconstruction error L(x, g(h(x))), thus retaining maximum information of x. [sent-61, score-0.233]
33 By repeatedly stacking a new auto-encoder on top of the previously learned h(x), stacked auto-encoders are obtained. [sent-62, score-0.237]
34 This way we learn multiple levels of representation of input x. [sent-63, score-0.052]
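A minimal NumPy sketch of a single auto-encoder layer (encode h(x), decode g(h(x)), squared reconstruction error) and of greedy stacking; the layer sizes, tied decoder weights, and the absence of a training loop are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AutoEncoderLayer:
    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(scale=0.01, size=(n_hidden, n_in))
        self.b = np.zeros(n_hidden)
        self.b_dec = np.zeros(n_in)

    def encode(self, x):                 # h(x)
        return sigmoid(self.W @ x + self.b)

    def decode(self, h):                 # g(h(x)), tied weights for simplicity
        return sigmoid(self.W.T @ h + self.b_dec)

    def reconstruction_error(self, x):   # L(x, g(h(x)))
        return float(np.sum((x - self.decode(self.encode(x))) ** 2))

# greedy stacking: each new layer is trained on the codes of the previous one
layers = [AutoEncoderLayer(100, 50), AutoEncoderLayer(50, 25)]
x = rng.random(100)
h1 = layers[0].encode(x)
h2 = layers[1].encode(h1)   # higher-level representation of x
```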
35 The Denoising Auto-encoder (DA) (Vincent et al., 2008) seeks to reconstruct x given a random corruption x̃ of x. [sent-66, score-0.06]
36 DA can capture global structure while ignoring noise, as the authors show on image data. [sent-67, score-0.061]
37 DA will capture general concepts and ignore noise like function words. [sent-70, score-0.099]
38 By applying masking noise (randomly masking 1s with 0s), the model also exhibits a fill-in-the-blank property (Vincent et al. [sent-71, score-0.156]
39 Take “greece” for example: the model must learn to predict it from “python” and “mount” through some hidden unit. [sent-73, score-0.05]
40 That hidden unit may then express the concept of Greek mythology. [sent-74, score-0.102]
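A small sketch of the masking-noise corruption described above, assuming a binary bag-of-words input; the masking probability shown is arbitrary.

```python
import numpy as np

def mask_corrupt(x, p, rng):
    """Denoising corruption: randomly set entries of x to 0 with probability p.

    The auto-encoder is trained to reconstruct the clean x from the corrupted
    x_tilde, which forces it to fill in masked words from the remaining context.
    """
    keep = rng.random(x.shape) >= p
    return x * keep

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])   # toy binary bag-of-words
x_tilde = mask_corrupt(x, p=0.7, rng=rng)
```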
41 This adds considerable computational overhead because the reconstruction process involves expensive dense matrix multiplication. [sent-77, score-0.284]
42 Reconstruction sampling keeps the sparse property of matrix multiplication by reconstructing a small subset of the original input, with no loss of quality in the learned representation (Dauphin et al. [sent-78, score-0.243]
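A rough sketch in the spirit of reconstruction sampling (Dauphin et al., 2011): reconstruct only the non-zero input dimensions plus a small random sample of the zero dimensions. The exact sampling scheme and rate here are assumptions, not the settings used in the paper.

```python
import numpy as np

def sample_reconstruction_indices(x, rate, rng):
    """Pick the subset of output units to reconstruct for a sparse input x.

    Always keep the non-zero dimensions; add a random fraction `rate`
    of the zero dimensions so the decoder still sees negative evidence.
    """
    nonzero = np.flatnonzero(x)
    zeros = np.flatnonzero(x == 0)
    n_sample = max(1, int(rate * zeros.size))
    sampled_zeros = rng.choice(zeros, size=n_sample, replace=False)
    return np.concatenate([nonzero, sampled_zeros])

rng = np.random.default_rng(0)
x = np.zeros(10000)
x[[3, 17, 999]] = 1.0                      # sparse bag-of-words input
idx = sample_reconstruction_indices(x, rate=0.01, rng=rng)
# the reconstruction error is then computed (and backpropagated) only on idx
```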
43 2 Supervised Fine-tuning In this stage we optimize the learned representation (“hidden layer n” in Fig. [sent-81, score-0.435]
44 2) towards the ranking score sim(d, e), with large scale Wikipedia annotation as supervision. [sent-82, score-0.089]
45 We collect hyperlinks in Wikipedia as our training set {(di, ei, mi)}, where mi is the mention string used for candidate generation. [sent-83, score-0.135]
46 The network weights below “hidden layer n” are initialized with the pre-training stage. [sent-84, score-0.242]
47 Next, we stack another layer on top of the learned representation. [sent-85, score-0.283]
48 The whole network is tuned by the final supervised objective. [sent-86, score-0.081]
49 The reason to stack another layer on top of the learned representation is to capture problem-specific structures. [sent-87, score-0.283]
50 Denote the encodings of d and e as dˆ and eˆ respectively; after stacking the problem-specific layer, the representation for d is given as f(d) = sigmoid(W · dˆ + b), where W and b are the weight matrix and bias term respectively. [sent-88, score-0.157]
51 f(e) follows the same encoding process. [sent-89, score-0.058]
52 The similarity score of (d, e) pair is defined as the dot product of f(d) and f(e) (Fig. [sent-90, score-0.179]
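A compact sketch of this scoring path: the pre-trained encodings dˆ and eˆ are passed through the problem-specific layer f(x) = sigmoid(W x + b) and scored by a dot product. The dimensions and the use of a single shared W for both sides are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(encoding, W, b):
    # problem-specific top layer: f(x) = sigmoid(W x + b)
    return sigmoid(W @ encoding + b)

def sim(d_hat, e_hat, W, b):
    # sim(d, e) = Dot(f(d), f(e))
    return float(f(d_hat, W, b) @ f(e_hat, W, b))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(200, 1000))   # 1,000-unit encodings -> 200-unit top layer
b = np.zeros(200)
d_hat, e_hat = rng.random(1000), rng.random(1000)
print(sim(d_hat, e_hat, W, b))
```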
53 Our goal is to rank the correct entity higher than the rest of the candidates, relative to the context of the mention. [sent-92, score-0.258]
54 For each training instance (d, e), we contrast it with one of its negative candidate pairs (d, e0). [sent-93, score-0.044]
55 This gives the pairwise ranking criterion: L(d, e) = max{0, 1 − sim(d, e) + sim(d, e0)} (2) [sent-94, score-0.133]
56 That is, we raise the similarity score of the true pair sim(d, e) and penalize all the rest sim(d, ei). [sent-95, score-0.073]
57 In our experiments, the softmax loss function consistently outperforms the pairwise ranking loss function, and is therefore taken as our default setting. [sent-98, score-0.305]
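Both training criteria can be sketched as follows; the softmax loss is written here in one standard contrastive form (negative log-probability of the true candidate), which is an assumption about the exact formulation used in the paper.

```python
import numpy as np

def pairwise_ranking_loss(sim_pos, sim_neg):
    # Eq. (2): max{0, 1 - sim(d, e) + sim(d, e0)} for one sampled negative e0
    return max(0.0, 1.0 - sim_pos + sim_neg)

def softmax_loss(sims, true_idx):
    """Contrast the true entity with all candidates: negative log-probability
    of the correct candidate under a softmax over sim(d, ei)."""
    sims = np.asarray(sims, dtype=float)
    log_z = np.log(np.sum(np.exp(sims - sims.max()))) + sims.max()  # stable log-sum-exp
    return float(log_z - sims[true_idx])

print(pairwise_ranking_loss(sim_pos=0.3, sim_neg=0.5))
print(softmax_loss([0.3, 0.5, -0.2], true_idx=0))
```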
58 However, the softmax training criterion adds additional computational overhead when performing mini-batch Stochastic Gradient Descent (SGD). [sent-99, score-0.177]
59 Compared with pure SGD (i.e., a mini-batch size of 1), mini-batch SGD is faster to converge and more stable. [sent-102, score-0.042]
60 Assume the mini-batch size is m and the number of candidates is n; a total of m × n forward-backward passes over the network are performed to compute a similarity matrix (Fig. [sent-103, score-0.158]
61 3), while the pairwise ranking criterion only needs 2 × m. [sent-104, score-0.133]
62 Only m + n forward-backward passes are needed for each mini-batch now. [sent-108, score-0.045]
63 BTS is a variant of the general backpropagation algorithm for structured neural networks. [sent-117, score-0.189]
64 In BTS, a parent node is computed from its child nodes in the forward pass; a child node receives its gradient as the sum of the derivatives from all its parents. [sent-118, score-0.256]
65 In our case (Fig. 2), the parent node is the score node sim(d, e) and the child nodes are f(d) and f(e). [sent-120, score-0.117]
66 In Figure 3, each row shares the forward path of f(d), while each column shares the forward path of f(e). [sent-121, score-0.17]
67 At the backpropagation stage, the gradient is summed over each row of score nodes for f(d) and over each column for f(e). [sent-122, score-0.169]
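A sketch of this arrangement: the m documents and n candidate entities are encoded once each (m + n forward passes), the m × n score matrix is a single matrix product, and at backpropagation each f(d) gradient sums over its row while each f(e) gradient sums over its column. The shapes and the placeholder upstream gradient are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 30, 200                 # documents per mini-batch, candidates, top-layer size

F_d = rng.random((m, k))             # f(d) for each document: m forward passes
F_e = rng.random((n, k))             # f(e) for each candidate entity: n forward passes

S = F_d @ F_e.T                      # m x n similarity matrix, S[i, j] = Dot(f(d_i), f(e_j))

# backward pass: given dL/dS, each encoding's gradient sums over its row / column
dS = rng.random((m, n))              # placeholder upstream gradient
dF_d = dS @ F_e                      # sum over each row of score nodes
dF_e = dS.T @ F_d                    # sum over each column of score nodes
```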
68 We can incorporate any handcrafted feature vector f(d, e) as: sim(d, e) = Dot(f(d), f(e)) + λ · f(d, e) (5) In fact, we find that with only Dot(f(d), f(e)) as the ranking score, the performance is sufficiently good. [sent-124, score-0.052]
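A one-line sketch of Eq. (5); the example handcrafted features and weights below are invented for illustration.

```python
import numpy as np

def combined_sim(f_d, f_e, handcrafted, lam):
    # Eq. (5): sim(d, e) = Dot(f(d), f(e)) + lambda . f(d, e)
    return float(np.dot(f_d, f_e) + np.dot(lam, handcrafted))

# e.g. handcrafted = [prominence P(e|m), name-match score], lam tuned on held-out data
print(combined_sim(np.full(200, 0.1), np.full(200, 0.2),
                   handcrafted=[0.8, 1.0], lam=[0.3, 0.5]))
```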
69 3 Experiments and Analysis Training settings: In the pre-training stage, the input layer has 100,000 units, and all hidden layers have 1,000 units with the rectifier function max(0, x). [sent-126, score-0.268]
70 Following (Glorot et al., 2011), for the first reconstruction layer we use a sigmoid activation function and a cross-entropy error function. [sent-128, score-0.369]
71 For higher reconstruction layers, we use softplus (log(1 + exp(x))) as the activation function and squared loss as the error function. [sent-129, score-0.368]
72 For the corruption process, we use a masking noise probability in {0. [sent-130, score-0.216]
73 For reconstruction sampling, we set the reconstruction rate to 0. [sent-138, score-0.466]
74 In the fine-tuning stage, the final layer has 200 units with a sigmoid activation function. [sent-140, score-0.297]
75 Thanks to reconstruction sampling and the refined mini-batch arrangement, it takes about 1 day to converge for pre-training and 3 days for fine-tuning, which is fast given our training set size. [sent-145, score-0.32]
76 Datasets: We use half of Wikipedia 1 plain text (˜1. [sent-146, score-0.04]
77 We collect a total of 40M hyperlinks grouped by name string m for the fine-tuning stage. [sent-148, score-0.095]
78 We hold out a subset of hyperlinks for model selection, and we find that a 3-layer network with a higher masking noise rate (0. [sent-149, score-0.389]
79 We select the TAC-KBP 2010 dataset (Ji and Grishman, 2011) for non-collective approaches, and the AIDA dataset for collective approaches. [sent-151, score-0.247]
80 For candidate generation, a mention-to-entity dictionary is built by mining Wikipedia structures, following (Cucerzan, 2007). [sent-154, score-0.044]
81 We keep the top 30 candidates by prominence P(e|m) for speed considerations. [sent-155, score-0.084]
82 (Han et al., 2011) is a collective approach, using Personalized PageRank to propagate evidence between different decisions. [sent-161, score-0.247]
83 To our surprise, our method with only local evidence even beats several complex collective methods with simple word similarity. [sent-168, score-0.332]
84 This reveals the importance of context modeling in semantic space. [sent-169, score-0.089]
85 Collective approaches can improve performance only when local evidence is not confident enough. [sent-170, score-0.037]
86 When embedding our similarity measure sim(d, e) into (Han et al. [sent-171, score-0.073]
87 A close error analysis shows some typical errors due to the lack of a prominence feature and a name-matching feature. [sent-173, score-0.044]
88 Some queries accidentally link to rare candidates and some link to entities with completely different names. [sent-174, score-0.1]
89 4 Conclusion We propose a deep learning approach that automatically learns a context-entity similarity measure for entity disambiguation. [sent-200, score-0.359]
90 The intermediate representations are learned leveraging large scale annotations of Wikipedia, without any manual effort of designing features. [sent-201, score-0.167]
91 The learned entity representation is compact and can scale to a very large knowledge base. [sent-202, score-0.389]
92 Furthermore, the experiments reveal the importance of context modeling in this field. [sent-203, score-0.089]
93 By incorporating our learned measure into a collective approach, performance is further improved. [sent-204, score-0.326]
94 Domain adaptation for large-scale sentiment classification: A deep learning approach. [sent-241, score-0.11]
95 Collective entity linking in web text: a graph-based method. [sent-254, score-0.231]
96 Thater, and Gerhard Weikum. Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 782–792. [sent-279, score-0.106]
97 LCC approaches to knowledge base population at TAC 2010. [sent-314, score-0.139]
98 Hara, et al. Entity disambiguation based on a probabilistic taxonomy. Technical Report 125, Microsoft Research. [sent-335, score-0.106]
99 Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. [sent-362, score-0.745]
100 Entity linking with effective acronym expansion, instance selection and topic modeling. [sent-372, score-0.055]
wordName wordTfidf (topN-words)
[('sim', 0.271), ('collective', 0.247), ('denoising', 0.233), ('reconstruction', 0.233), ('aida', 0.179), ('entity', 0.176), ('layer', 0.161), ('kulkarni', 0.127), ('backpropagation', 0.119), ('glorot', 0.117), ('kataria', 0.117), ('bengio', 0.114), ('ei', 0.112), ('stacked', 0.111), ('han', 0.11), ('deep', 0.11), ('dot', 0.106), ('disambiguation', 0.106), ('stage', 0.105), ('mount', 0.103), ('bts', 0.103), ('tac', 0.102), ('greece', 0.102), ('bunescu', 0.098), ('masking', 0.095), ('hyperlinks', 0.095), ('sgd', 0.095), ('wikipedia', 0.09), ('python', 0.089), ('sen', 0.085), ('softmax', 0.082), ('pasca', 0.082), ('network', 0.081), ('learned', 0.079), ('hoffart', 0.079), ('creature', 0.078), ('delphi', 0.078), ('larochelle', 0.078), ('noweb', 0.078), ('parnassus', 0.078), ('shirakawa', 0.078), ('wordsim', 0.078), ('similarity', 0.073), ('neural', 0.07), ('zhengyan', 0.069), ('dauphin', 0.069), ('activation', 0.068), ('sigmoid', 0.068), ('loss', 0.067), ('ratinov', 0.067), ('da', 0.065), ('goller', 0.063), ('lcc', 0.063), ('noise', 0.061), ('entities', 0.06), ('cucerzan', 0.06), ('corruption', 0.06), ('vincent', 0.059), ('encoding', 0.058), ('layers', 0.057), ('linking', 0.055), ('somehow', 0.052), ('representation', 0.052), ('ranking', 0.052), ('representations', 0.051), ('overhead', 0.051), ('hidden', 0.05), ('gradient', 0.05), ('ji', 0.049), ('beats', 0.048), ('forward', 0.048), ('reveals', 0.047), ('hinton', 0.047), ('stacking', 0.047), ('passes', 0.045), ('sampling', 0.045), ('compact', 0.045), ('optimizes', 0.044), ('prominence', 0.044), ('dm', 0.044), ('candidate', 0.044), ('criterion', 0.044), ('stack', 0.043), ('converge', 0.042), ('context', 0.042), ('child', 0.041), ('grishman', 0.041), ('plain', 0.04), ('mi', 0.04), ('candidates', 0.04), ('contrastive', 0.039), ('concepts', 0.038), ('node', 0.038), ('optimize', 0.038), ('shares', 0.037), ('zheng', 0.037), ('scale', 0.037), ('pairwise', 0.037), ('population', 0.037), ('local', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 219 acl-2013-Learning Entity Representation for Entity Disambiguation
Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang
Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.
2 0.20065783 139 acl-2013-Entity Linking for Tweets
Author: Xiaohua Liu ; Yitong Li ; Haocheng Wu ; Ming Zhou ; Furu Wei ; Yi Lu
Abstract: We study the task of entity linking for tweets, which tries to associate each mention in a tweet with a knowledge base entry. Two main challenges of this task are the dearth of information in a single tweet and the rich entity mention variations. To address these challenges, we propose a collective inference method that simultaneously resolves a set of mentions. Particularly, our model integrates three kinds of similarities, i.e., mention-entry similarity, entry-entry similarity, and mention-mention similarity, to enrich the context for entity linking, and to address irregular mentions that are not covered by the entity-variation dictionary. We evaluate our method on a publicly available data set and demonstrate the effectiveness of our method.
3 0.19534577 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
4 0.13758503 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
Author: Mohamed Amir Yosef ; Sandro Bauer ; Johannes Hoffart ; Marc Spaniol ; Gerhard Weikum
Abstract: Recent research has shown progress in achieving high-quality, very fine-grained type classification in hierarchical taxonomies. Within such a multi-level type hierarchy with several hundreds of types at different levels, many entities naturally belong to multiple types. In order to achieve high-precision in type classification, current approaches are either limited to certain domains or require time consuming multistage computations. As a consequence, existing systems are incapable of performing ad-hoc type classification on arbitrary input texts. In this demo, we present a novel Webbased tool that is able to perform domain independent entity type classification under real time conditions. Thanks to its efficient implementation and compacted feature representation, the system is able to process text inputs on-the-fly while still achieving equally high precision as leading state-ofthe-art implementations. Our system offers an online interface where natural-language text can be inserted, which returns semantic type labels for entity mentions. Further more, the user interface allows users to explore the assigned types by visualizing and navigating along the type-hierarchy.
5 0.12370483 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum
Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-ofKB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, show our method performing significantly better than prior work.
6 0.11882652 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
7 0.11867366 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
8 0.11406294 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
9 0.11084805 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
10 0.099175707 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
11 0.098525092 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
12 0.089779273 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
13 0.084405959 275 acl-2013-Parsing with Compositional Vector Grammars
14 0.07901299 294 acl-2013-Re-embedding words
15 0.07647343 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
16 0.076228693 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
17 0.074144669 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
18 0.073489964 301 acl-2013-Resolving Entity Morphs in Censored Data
19 0.072644241 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
20 0.071720943 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
topicId topicWeight
[(0, 0.217), (1, 0.048), (2, 0.017), (3, -0.081), (4, 0.055), (5, 0.036), (6, -0.001), (7, -0.001), (8, -0.02), (9, 0.025), (10, -0.021), (11, -0.074), (12, 0.029), (13, -0.136), (14, 0.024), (15, 0.102), (16, -0.079), (17, 0.087), (18, -0.088), (19, -0.111), (20, 0.003), (21, 0.008), (22, -0.115), (23, 0.116), (24, 0.013), (25, -0.023), (26, 0.14), (27, -0.084), (28, 0.009), (29, 0.002), (30, -0.054), (31, -0.145), (32, 0.013), (33, -0.032), (34, -0.055), (35, -0.058), (36, -0.023), (37, -0.078), (38, -0.033), (39, -0.051), (40, 0.109), (41, 0.023), (42, 0.051), (43, 0.02), (44, 0.008), (45, -0.073), (46, -0.038), (47, 0.019), (48, -0.055), (49, -0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.94190317 219 acl-2013-Learning Entity Representation for Entity Disambiguation
Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang
Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.
2 0.69102216 139 acl-2013-Entity Linking for Tweets
Author: Xiaohua Liu ; Yitong Li ; Haocheng Wu ; Ming Zhou ; Furu Wei ; Yi Lu
Abstract: We study the task of entity linking for tweets, which tries to associate each mention in a tweet with a knowledge base entry. Two main challenges of this task are the dearth of information in a single tweet and the rich entity mention variations. To address these challenges, we propose a collective inference method that simultaneously resolves a set of mentions. Particularly, our model integrates three kinds of similarities, i.e., mention-entry similarity, entry-entry similarity, and mention-mention similarity, to enrich the context for entity linking, and to address irregular mentions that are not covered by the entity-variation dictionary. We evaluate our method on a publicly available data set and demonstrate the effectiveness of our method.
3 0.63456047 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
Author: Mohamed Amir Yosef ; Sandro Bauer ; Johannes Hoffart ; Marc Spaniol ; Gerhard Weikum
Abstract: Recent research has shown progress in achieving high-quality, very fine-grained type classification in hierarchical taxonomies. Within such a multi-level type hierarchy with several hundreds of types at different levels, many entities naturally belong to multiple types. In order to achieve high-precision in type classification, current approaches are either limited to certain domains or require time consuming multistage computations. As a consequence, existing systems are incapable of performing ad-hoc type classification on arbitrary input texts. In this demo, we present a novel Webbased tool that is able to perform domain independent entity type classification under real time conditions. Thanks to its efficient implementation and compacted feature representation, the system is able to process text inputs on-the-fly while still achieving equally high precision as leading state-ofthe-art implementations. Our system offers an online interface where natural-language text can be inserted, which returns semantic type labels for entity mentions. Further more, the user interface allows users to explore the assigned types by visualizing and navigating along the type-hierarchy.
4 0.62933207 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum
Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-ofKB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, show our method performing significantly better than prior work.
5 0.61415452 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
Author: Tiberiu Boros ; Radu Ion ; Dan Tufis
Abstract: Radu Ion Research Institute for ?????????? ???????????? ?????? Dr?????????? Romanian Academy radu@ racai . ro Dan Tufi? Research Institute for ?????????? ???????????? ?????? Dr?????????? Romanian Academy tufi s @ racai . ro Networks (Marques and Lopes, 1996) and Conditional Random Fields (CRF) (Lafferty et Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories). For this reason, a number of alternative methods have been proposed over the years. One of the most successful methods used for this task, ?????? ?????? ??????? ??????, 1999), exploits a reduced set of tags derived by removing several recoverable features from the lexicon morpho-syntactic descriptions. A second phase is aimed at recovering the full set of morpho-syntactic features. In this paper we present an alternative method to Tiered Tagging, based on local optimizations with Neural Networks and we show how, by properly encoding the input sequence in a general Neural Network architecture, we achieve results similar to the Tiered Tagging methodology, significantly faster and without requiring extensive linguistic knowledge as implied by the previously mentioned method. 1
6 0.61002612 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
7 0.59528744 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
8 0.58559221 294 acl-2013-Re-embedding words
9 0.56923407 172 acl-2013-Graph-based Local Coherence Modeling
10 0.55417681 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
11 0.52843249 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
12 0.52640653 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
13 0.52459085 84 acl-2013-Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
14 0.52188861 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality
15 0.52030665 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision
16 0.5162828 275 acl-2013-Parsing with Compositional Vector Grammars
17 0.49438122 178 acl-2013-HEADY: News headline abstraction through event pattern clustering
18 0.48970813 301 acl-2013-Resolving Entity Morphs in Censored Data
19 0.48019281 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
20 0.47192785 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
topicId topicWeight
[(0, 0.064), (6, 0.04), (11, 0.045), (24, 0.045), (26, 0.053), (35, 0.117), (42, 0.049), (48, 0.028), (63, 0.238), (67, 0.033), (70, 0.042), (71, 0.019), (88, 0.036), (90, 0.052), (95, 0.057)]
simIndex simValue paperId paperTitle
1 0.87212592 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates
Author: Kazi Saidul Hasan ; Vincent Ng
Abstract: Determining the stance expressed by an author from a post written for a twosided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (201 1) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.
same-paper 2 0.81395793 219 acl-2013-Learning Entity Representation for Entity Disambiguation
Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang
Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.
3 0.74969918 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
Author: Chris Fournier
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.
4 0.71361887 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li
Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing researches exploit seed words, and lead to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words.
5 0.61522931 49 acl-2013-An annotated corpus of quoted opinions in news articles
Author: Tim O'Keefe ; James R. Curran ; Peter Ashwell ; Irena Koprinska
Abstract: Quotes are used in news articles as evidence of a person’s opinion, and thus are a useful target for opinion mining. However, labelling each quote with a polarity score directed at a textually-anchored target can ignore the broader issue that the speaker is commenting on. We address this by instead labelling quotes as supporting or opposing a clear expression of a point of view on a topic, called a position statement. Using this we construct a corpus covering 7 topics with 2,228 quotes.
6 0.60530591 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
7 0.60515279 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
8 0.60403407 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
9 0.60360414 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
10 0.60165995 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
11 0.59865314 172 acl-2013-Graph-based Local Coherence Modeling
12 0.59750676 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
13 0.59745109 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm
14 0.59145516 4 acl-2013-A Context Free TAG Variant
15 0.59105819 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
16 0.59070575 312 acl-2013-Semantic Parsing as Machine Translation
17 0.58983332 250 acl-2013-Models of Translation Competitions
18 0.58981049 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
19 0.58945191 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
20 0.58899331 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data