acl acl2013 acl2013-242 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ziqi Zhang ; Anna Lisa Gentile ; Isabelle Augenstein ; Eva Blomqvist ; Fabio Ciravegna
Abstract: Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Linking heterogeneous resources is a major research challenge in the Semantic Web. [sent-8, score-0.066]
2 This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. [sent-9, score-0.563]
3 We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. [sent-10, score-0.812]
4 Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. [sent-11, score-0.053]
5 It constitutes the conjunction between the Web and the Semantic Web, balancing the richness of semantics offered by the Semantic Web with the ease of data publishing. [sent-17, score-0.049]
6 For the last few years Linked Open Data has grown to a gigantic knowledge base, which, as of 2013, comprised 31 billion triples in 295 datasets. [sent-18, score-0.071]
7 A major research question concerning Linked Data is linking heterogeneous resources, since publishers may describe analogous information using different vocabularies or assign different identifiers to the same referents. [sent-19, score-0.14]
8 Among such work, many study mappings between ontology concepts and data instances (e. [sent-20, score-0.221]
9 An insufficiently addressed problem is linking heterogeneous relations, which is also widely found in data and can cause problems in information retrieval (Fu et al. [sent-26, score-0.211]
10 Existing work in linking relations typically employs string similarity metrics or semantic similarity measures. (Footnote 1: http://lod-cloud.) [sent-28, score-0.545]
11 This paper introduces a novel method to discover groups of equivalent relations for Linked Data concepts. [sent-36, score-0.534]
12 It consists of two components: 1) a measure of equivalency between pairs of relations of a concept and 2) a clustering process to group equivalent relations. [sent-37, score-1.101]
13 Two types of experiments have been carried out using two major Linked Data sets: 1) evaluating the precision of predicting equivalency of relation pairs and 2) evaluating the precision of clustering equivalent relations. [sent-39, score-0.94]
14 Preliminary results have shown encouraging results as the method achieves between 0.75~0.85 [sent-40, score-0.053]
15 precision in the first set of experiments, while 0.78~0.98 precision in the second. [sent-42, score-0.062]
16 2 Related Work Research on linking heterogeneous ontological resources mostly addresses mapping classes (or concepts) and instances (Isaac et al., 2007; Mi et al. [sent-45, score-0.216]
17 string edit distance), semantic similarity (Budanitsky and Hirst, 2006), and distributional similarity based on the overlap in data usage (Duan et al. [sent-52, score-0.335]
18 There have been insufficient studies on mapping relations (or properties) across ontologies. [sent-55, score-0.265]
19 Typical methods make use of a combination of string similarity and semantic similarity metrics (Zhong et al. [sent-56, score-0.206]
20 While string similarity fails to identify equivalent relations if their lexicalizations are distinct, semantic similarity often depends on taxonomic structures [sent-60, score-0.698]
21 in existing ontologies (Budanitsky and Hirst, 2006). [sent-62, score-0.084]
22 Unfortunately many Linked Data instances use relations that are invented arbitrarily or originate in rudimentary ontologies (Parundekar et al. [sent-63, score-0.382]
23 Distributional similarity has also been used to discover equivalent or similar relations. [sent-65, score-0.325]
24 Mauge et al. (2012) extract product properties from an e-commerce website and align equivalent properties using a supervised maximum entropy classification method. [sent-67, score-0.343]
25 We study linking relations on Linked Data and propose an unsupervised method. [sent-68, score-0.339]
26 (2012) identify similar relations using the overlap of the subjects of two relations and the overlap of their objects. [sent-70, score-0.918]
27 In contrast, we aim at identifying strictly equivalent relations rather than similarity in general. [sent-71, score-0.548]
28 3 Method Let t denote a 3-tuple (triple) consisting of a subject (ts), predicate (tp) and object (to). [sent-75, score-0.072]
29 We write type(ts) = c, meaning that ts is of class c. [sent-77, score-0.132]
30 p denotes a relation and rp is the set of triples whose tp=p, i.e., rp = {t | tp=p}. [sent-78, score-0.33]
31 Given a specific class c and its pairs of relations (p, p'), such that rp = {t | tp=p, type(ts)=c} and rp' = {t | tp=p', type(ts)=c}, we measure the equivalency of p and p' and then cluster equivalent relations. [sent-81, score-1.026]
32 The equivalency is calculated locally (within the same class c) rather than globally (across all classes) because two relations can have identical meaning in a specific class context but not necessarily in general. [sent-82, score-0.728]
33 For example, for the class Book the relations dbpp:title and foaf:name are used with the same meaning; however, for Actor, dbpp:title is used interchangeably with dbpp:awards (e. [sent-83, score-0.353]
34 In practice, given a class c, our method starts with retrieving all t from a Linked Data set where type(ts)=c, using the universal query language SPARQL with any SPARQL data endpoint. [sent-86, score-0.039]
35 This data is then used to measure equivalency for each pair of relations (Section 3.1). [sent-87, score-0.711]
36 The equivalence scores are then used to group relations into equivalent clusters (Section 3.2). [sent-89, score-0.684]
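The following is a minimal sketch (not the authors' implementation) of the retrieval step just described: it pulls all triples whose subject is of class c from a SPARQL endpoint and groups them by predicate, so that rp can be looked up per relation. The SPARQLWrapper library, the query shape, the result limit and the example class URI are assumptions made for illustration.

# Minimal sketch: retrieve all triples t with type(ts) = c from a SPARQL endpoint
# and group them by predicate, i.e. build rp = {t | tp = p, type(ts) = c} for every p.
from collections import defaultdict
from SPARQLWrapper import SPARQLWrapper, JSON

def fetch_triples_for_class(class_uri, endpoint="http://dbpedia.org/sparql", limit=10000):
    """Return {predicate: set of (subject, object)} for instances of class_uri."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"""
        SELECT ?s ?p ?o WHERE {{
            ?s a <{class_uri}> .
            ?s ?p ?o .
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    triples_by_relation = defaultdict(set)
    for row in results["results"]["bindings"]:
        s, p, o = row["s"]["value"], row["p"]["value"], row["o"]["value"]
        triples_by_relation[p].add((s, o))
    return triples_by_relation

# Example (hypothetical class URI):
# triples = fetch_triples_for_class("http://dbpedia.org/ontology/Book")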
37 3.1 Measure of equivalence The equivalence for each distinct pair of relations depends on three components. [sent-92, score-0.572]
38 Triple overlap evaluates the degree of overlap (footnote 2) in terms of the usage of the two relations in triples. [sent-93, score-0.427]
39 The MAX function allows addressing infrequently used, but still equivalent, relations (i.e. [sent-95, score-0.55]
40 where the overlap covers most triples of an infrequently used relation but only a very small proportion of a much more frequently used one). [sent-97, score-0.348]
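The paper's equations are not included in this extract, so the following is only a hedged reading of the triple overlap description above: the overlap of (subject, object) pairs is normalised by the size of each relation, and MAX keeps the larger of the two ratios so that an infrequently used relation is not penalised. The exact formula in the paper may differ.

# Hedged sketch of the "triple overlap" component (formula assumed, not quoted).
def triple_overlap(rp, rp_prime):
    """rp, rp_prime: sets of (subject, object) pairs; overlap is "exact" match (footnote 2)."""
    if not rp or not rp_prime:
        return 0.0
    shared = rp & rp_prime
    # MAX favours the case where the overlap covers most of the smaller relation.
    return max(len(shared) / len(rp), len(shared) / len(rp_prime))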
41 Subject agreement While triple overlap looks at the data in general, subject agreement looks at the overlap of subjects of two relations, and the degree to which these subjects have overlapping objects. [sent-98, score-0.797]
42 Let S(p) return the set of subjects of relation p, and O(p|s) return the set of objects of relation p whose subject is s, i.e., O(p|s) = {to | tp=p, ts=s}. [sent-99, score-0.474]
43 The higher the value of α, the more the two relations “agree” in terms of their shared subjects. [sent-102, score-0.265]
44 For each shared subject of p and p ’ we count 1 if they have at least 1 overlapping object and 0 otherwise. [sent-103, score-0.117]
45 This is because both p and p ’ can be 1:many relations and a low overlap value could mean that one is densely populated while the other is not, which does not necessarily mean they do not “agree”. [sent-104, score-0.394]
46 Equation [6] evaluates the degree to which two relations share the same set of subjects. [sent-105, score-0.298]
47 The agreement AG(p, p ’) balances the two factors by taking the product. [sent-106, score-0.035]
48 As a result (footnote 2: in this paper, overlap is based on "exact" match), [sent-107, score-0.129]
49 relations that have a high level of agreement will have more subjects in common, and a higher proportion of shared subjects with shared objects. [sent-108, score-0.56]
50 Cardinality ratio is the ratio between the cardinalities of the two relations. [sent-109, score-0.087]
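Again as a hedged sketch only (the paper's equations, including Equation [6], are not part of this extract), the remaining two components and one possible combination into an overall score E(p, p') could look as follows. The normalisations, the reading of "cardinality", and the way the three components are combined are assumptions; the code reuses the triple_overlap sketch above.

# Hedged sketch of subject agreement, cardinality ratio and a combined score.
from collections import defaultdict

def group_objects_by_subject(rp):
    """O(p|s): map each subject of relation p to its set of objects."""
    objs = defaultdict(set)
    for s, o in rp:
        objs[s].add(o)
    return objs

def subject_agreement(rp, rp_prime):
    op, op_prime = group_objects_by_subject(rp), group_objects_by_subject(rp_prime)
    sp, sp_prime = set(op), set(op_prime)          # S(p), S(p')
    shared = sp & sp_prime
    if not shared:
        return 0.0
    # alpha: fraction of shared subjects with at least one overlapping object
    alpha = sum(1 for s in shared if op[s] & op_prime[s]) / len(shared)
    # degree to which the two relations share the same set of subjects
    # (a Jaccard-style overlap is assumed here)
    subj_overlap = len(shared) / len(sp | sp_prime)
    return alpha * subj_overlap                    # AG(p, p') as a product

def cardinality_ratio(rp, rp_prime):
    # "Cardinality" is read here as the number of triples per relation
    # (smaller over larger); the paper may define it differently.
    return min(len(rp), len(rp_prime)) / max(len(rp), len(rp_prime))

def equivalence_score(rp, rp_prime):
    # How the three components are combined is not given in this extract;
    # a simple product is used purely as a placeholder.
    return (triple_overlap(rp, rp_prime)
            * subject_agreement(rp, rp_prime)
            * cardinality_ratio(rp, rp_prime))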
51 3.2 Clustering We apply the measure to every pair of relations of a concept, and keep those with a non-zero equivalence score. [sent-112, score-0.49]
52 The goal of clustering is to create groups of equivalent relations based on the pair-wise equivalence scores. [sent-113, score-0.717]
53 We use a simple rule-based agglomerative clustering algorithm for this purpose. [sent-114, score-0.138]
54 First, we rank all relation pairs by their equivalence score, then we keep a pair if (i) its score and (ii) the number of triples covered by each relation are above a certain threshold, TminEqvl and TminTP respectively. [sent-115, score-0.479]
55 To merge clusters, given an existing cluster c and a new pair (p, p') where either p ∈ c or p' ∈ c, the pair is added to c if E(p, p') is close (as a fractional number above the threshold TminEqvlRel) to the average score of all connected pairs in c. [sent-117, score-0.178]
56 Adjusting these thresholds allows balancing between precision and recall. [sent-120, score-0.143]
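One possible rendering of this rule-based agglomerative clustering is sketched below. The default threshold values, the exact "closeness" test against the cluster average, and what happens to a pair that fails that test are assumptions, since the paper's pseudo-code is not part of this extract.

# Hedged sketch of the rule-based clustering over pair-wise equivalence scores.
def cluster_relations(pair_scores, triples_by_relation,
                      t_min_eqvl=0.1, t_min_tp=10, t_min_eqvl_rel=0.8):
    """
    pair_scores: {(p, p_prime): non-zero equivalence score}
    triples_by_relation: {p: set of (subject, object)} used for the TminTP filter
    Returns a list of clusters (sets of relation identifiers).
    """
    # 1) Rank pairs by score; keep those above TminEqvl whose relations each
    #    cover at least TminTP triples.
    ranked = sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [(pair, score) for pair, score in ranked
            if score >= t_min_eqvl
            and all(len(triples_by_relation.get(r, ())) >= t_min_tp for r in pair)]

    # 2) Greedily grow clusters; a pair joins a cluster containing one of its
    #    relations only if its score is "close" to the cluster's average score.
    clusters = []  # each cluster: {"relations": set(...), "scores": [...]}
    for (p, p_prime), score in kept:
        placed = False
        for c in clusters:
            if p in c["relations"] or p_prime in c["relations"]:
                avg = sum(c["scores"]) / len(c["scores"])
                if avg > 0 and score / avg >= t_min_eqvl_rel:
                    c["relations"].update((p, p_prime))
                    c["scores"].append(score)
                # If the closeness test fails, the pair is dropped here; whether
                # it should instead seed a new cluster is not specified.
                placed = True
                break
        if not placed:
            clusters.append({"relations": {p, p_prime}, "scores": [score]})
    return [c["relations"] for c in clusters]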
57 4 Experiment Design To our knowledge, there is no publicly available gold standard for relation equivalency using Linked Data. [sent-121, score-0.443]
58 We randomly selected 21 concepts (Figure 1) from the DBpedia ontology (v3. [sent-122, score-0.188]
59 We apply our method to each concept to discover clusters of equivalent relations, using both DBpedia and Sindice as SPARQL endpoints, and report results separately. [sent-125, score-0.389]
60 For this preliminary evaluation, we have limited the amount of annotations to a maximum of 100 top-scoring pairs of relations per concept, resulting in 16~100 pairs per concept (avg. [sent-134, score-0.465]
61 40) for the DBpedia experiment and 29~100 pairs for Sindice (avg. [sent-135, score-0.053]
62 The annotators were asked to rate each edge in each cluster with -1 (wrong), 1 (correct) or 0 (cannot decide). [sent-137, score-0.039]
63 Also using this data, we derived a gold standard for clustering based on edge connectivity and we evaluate (i) the precision of top n% (p@n%) ranked equivalent relation pairs and (ii) the precision of clustering for each concept. [sent-141, score-0.723]
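As an illustration of the p@n% evaluation (not the authors' evaluation script), precision over the top n% of ranked pairs can be computed from the -1/1/0 annotations described above; how undecidable (0) edges are handled is an assumption.

# Hedged sketch: precision at top n% of ranked, human-annotated relation pairs.
def precision_at_percent(ranked_labels, n_percent):
    """ranked_labels: -1/0/1 labels, already sorted by equivalence score (descending)."""
    cutoff = max(1, int(len(ranked_labels) * n_percent / 100))
    decided = [label for label in ranked_labels[:cutoff] if label != 0]  # skip "cannot decide"
    return sum(1 for label in decided if label == 1) / len(decided) if decided else 0.0

# Example: precision_at_percent(labels, 20) gives p@20%.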
64 IAA on annotating pair equivalency So far the output of 13 concepts has been annotated. [sent-146, score-0.454]
65 This dataset contains ≈1800 relation pairs and is larger than the one by Fu et al. [sent-147, score-0.143]
66 The annotation process shows that over 75% of the relation pairs in the Sindice experiment contain non-English relations and are mostly cross-lingual. [sent-149, score-0.408]
67 We used this data to report performance, although the method has been applied to all 21 concepts, and the complete results can be visualized on our demo website. [sent-150, score-0.08]
68 (Figure caption: Examples of visualized clusters.) 5 Result and Discussion Figure 3 shows p@n% for pair equivalency; Figure 4 shows clustering precision. [sent-161, score-0.242]
69 The box plots show the ranges of precision at each n%; the lines show the average. [sent-164, score-0.062]
70 Clustering precision As shown in Figure 2, Linked Data relations are often heterogeneous. [sent-166, score-0.327]
71 Therefore, finding equivalent relations to improve coverage is important. [sent-167, score-0.492]
72 Results in Figure 3 show that in most cases the method identifies equivalent relations with high precision. [sent-168, score-0.492]
73 It is effective for both single- and cross-language relation pairs. [sent-169, score-0.09]
74 The worst-performing case for DBpedia is Aircraft (for all n%), mostly due to duplicate numeric-valued objects of different relations (e. [sent-170, score-0.358]
75 The decreasing precision with respect to n% suggests the measure effectively ranks correct pairs to the top. [sent-173, score-0.165]
76 Figure 4 shows that the method effectively clusters equivalent relations with very high precision (0.78~0.98). [sent-175, score-0.552]
77 (2012), for BasketballPlayer, our method creates separate clusters for relations meaning “draft team” and “former team” because although they are “similar” they are not “equivalent”. [sent-181, score-0.325]
78 We noticed that annotating equivalent relations is a non-trivial task. [sent-182, score-0.492]
79 Sometimes relations and their corresponding schemata (if any) are poorly documented and it is impossible to understand the meaning of relations (e. [sent-183, score-0.53]
80 Analyses of the evaluation output show that errors are typically found between highly similar relations, or between relations whose object values are of numeric types. [sent-186, score-0.099]
81 In both cases, there is a very high probability of having a high overlap of subject-object pairs between relations. [sent-187, score-0.182]
82 For example, for Aircraft, the relations dbpp:heightIn and dbpp:weight are predicted to be equivalent because many instances have the same numeric value for the properties. [sent-188, score-0.584]
83 Another example is given by the Airport properties dbpp:runwaySurface, dbpp:r1Surface, dbpp:r2Surface, etc. [sent-189, score-0.041]
84 The relations are semantically highly similar and the object values have a high overlap. [sent-193, score-0.305]
85 A potential solution to such issues is incorporating ontological knowledge if available. [sent-194, score-0.043]
86 For example, if an ontology defines the two distinct properties of Airport without explicitly defining an “equivalence” relation between them, they are unlikely to be equivalent even if the data suggests the opposite. [sent-195, score-0.488]
87 6 Conclusion This paper introduced a data-driven, unsupervised and domain and language independent method to learn equivalent relations for Linked Data concepts. [sent-196, score-0.492]
88 Preliminary experiments show encouraging results as it effectively discovers equivalent relations in both single- and multilingual settings. [sent-197, score-0.545]
89 In future, we will revise the equivalence measure and also experiment with clustering algorithms such as (Beeferman et al. [sent-198, score-0.275]
90 We will also study the contribution of individual components of the measure in such task. [sent-200, score-0.05]
91 recall) are planned and this work will be extended to address other tasks such as ontology mapping and ontology pattern mining (Nuzzolese et al. [sent-202, score-0.26]
92 Towards better understanding and utilizing relations in DBpedia. [sent-232, score-0.066]
93 Deriving similarity graphs from open linked data on semantic web. [sent-254, score-0.328]
94 Journal on Data Semantics, 1(4), pp. 219-236. Julius Volz, Christian Bizer, Martin Gaedke, Georgi Kobilarov. [sent-266, score-0.038]
wordName wordTfidf (topN-words)
[('equivalency', 0.353), ('dbpp', 0.282), ('relations', 0.265), ('equivalent', 0.227), ('linked', 0.225), ('rp', 0.169), ('ichise', 0.141), ('sindice', 0.141), ('equivalence', 0.132), ('subjects', 0.13), ('ontology', 0.13), ('overlap', 0.129), ('isaac', 0.115), ('dbpedia', 0.111), ('ryutaro', 0.106), ('schopman', 0.106), ('soint', 0.106), ('airport', 0.093), ('volz', 0.093), ('ts', 0.093), ('clustering', 0.093), ('relation', 0.09), ('tp', 0.089), ('budanitsky', 0.089), ('cardinality', 0.087), ('sparql', 0.086), ('ontologies', 0.084), ('fu', 0.084), ('duan', 0.077), ('linking', 0.074), ('zhong', 0.074), ('triples', 0.071), ('insufficiently', 0.071), ('lihua', 0.071), ('mauge', 0.071), ('nuzzolese', 0.071), ('parundekar', 0.071), ('shenghui', 0.071), ('soin', 0.071), ('tmineqvl', 0.071), ('tmineqvlrel', 0.071), ('tmintp', 0.071), ('triple', 0.068), ('heterogeneous', 0.066), ('zhao', 0.064), ('sint', 0.062), ('precision', 0.062), ('concept', 0.06), ('clusters', 0.06), ('ag', 0.06), ('numeric', 0.059), ('concepts', 0.058), ('antoine', 0.058), ('aircraft', 0.058), ('infrequently', 0.058), ('adar', 0.058), ('hirst', 0.057), ('similarity', 0.056), ('web', 0.054), ('beeferman', 0.054), ('bouma', 0.054), ('pairs', 0.053), ('encouraging', 0.053), ('measure', 0.05), ('iaa', 0.049), ('balancing', 0.049), ('awards', 0.049), ('semantic', 0.047), ('string', 0.047), ('visualized', 0.046), ('overlapping', 0.045), ('agglomerative', 0.045), ('ontological', 0.043), ('connectivity', 0.043), ('pair', 0.043), ('han', 0.043), ('discover', 0.042), ('yong', 0.041), ('properties', 0.041), ('actor', 0.04), ('le', 0.04), ('object', 0.04), ('cluster', 0.039), ('class', 0.039), ('pp', 0.038), ('team', 0.037), ('mi', 0.036), ('cd', 0.036), ('agreement', 0.035), ('objects', 0.034), ('website', 0.034), ('preliminary', 0.034), ('instances', 0.033), ('evaluates', 0.033), ('title', 0.032), ('asian', 0.032), ('thresholds', 0.032), ('subject', 0.032), ('calculated', 0.032), ('looks', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 242 acl-2013-Mining Equivalent Relations from Linked Data
Author: Ziqi Zhang ; Anna Lisa Gentile ; Isabelle Augenstein ; Eva Blomqvist ; Fabio Ciravegna
Abstract: Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. 1
2 0.11404039 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
Author: Brijesh Bhatt ; Lahari Poddar ; Pushpak Bhattacharyya
Abstract: We present IndoNet, a multilingual lexical knowledge base for Indian languages. It is a linked structure of wordnets of 18 different Indian languages, Universal Word dictionary and the Suggested Upper Merged Ontology (SUMO). We discuss various benefits of the network and challenges involved in the development. The system is encoded in Lexical Markup Framework (LMF) and we propose modifications in LMF to accommodate Universal Word Dictionary and SUMO. This standardized version of lexical knowledge base of Indian Languages can now easily be linked to similar global resources.
3 0.098868676 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
Author: Tiziano Flati ; Roberto Navigli
Abstract: We present SPred, a novel method for the creation of large repositories of semantic predicates. We start from existing collocations to form lexical predicates (e.g., break ∗) and learn the semantic classes that best fit the ∗ argument. To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e.g., break BODY PART, break AGREEMENT, etc.). Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner’s Dictionary with high precision and recall, and perform well against the most similar approach.
4 0.087549455 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
Author: Taesung Lee ; Seung-won Hwang
Abstract: This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”. Unlike the previous approaches relying on the “symmetry” found in parallel corpora, the proposed method is tolerant to asymmetry often found in comparable corpora, by distinguishing different semantics of relations of entity pairs to selectively propagate seed entity translations on weakly comparable corpora. Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case.
5 0.082741246 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
Author: Sean Szumlanski ; Fernando Gomez ; Valerie K. Sims
Abstract: We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
6 0.079795562 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
7 0.079649851 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
8 0.072935618 139 acl-2013-Entity Linking for Tweets
9 0.068373419 152 acl-2013-Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines
10 0.067695364 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
11 0.066521548 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
12 0.065608248 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
13 0.065597847 61 acl-2013-Automatic Interpretation of the English Possessive
14 0.065267757 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
15 0.064783715 62 acl-2013-Automatic Term Ambiguity Detection
16 0.064721592 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
17 0.064347021 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
18 0.06341435 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
19 0.060815644 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
20 0.059092183 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
topicId topicWeight
[(0, 0.169), (1, 0.068), (2, 0.005), (3, -0.101), (4, -0.008), (5, 0.023), (6, -0.085), (7, -0.002), (8, 0.014), (9, -0.028), (10, -0.006), (11, -0.014), (12, -0.028), (13, 0.036), (14, 0.01), (15, -0.009), (16, 0.04), (17, 0.002), (18, -0.049), (19, -0.023), (20, -0.034), (21, 0.039), (22, 0.014), (23, 0.03), (24, 0.013), (25, 0.036), (26, 0.007), (27, 0.022), (28, -0.06), (29, 0.049), (30, 0.038), (31, 0.057), (32, 0.02), (33, -0.084), (34, 0.06), (35, 0.012), (36, -0.046), (37, -0.011), (38, -0.042), (39, 0.039), (40, 0.015), (41, 0.04), (42, -0.048), (43, 0.045), (44, -0.042), (45, -0.061), (46, -0.008), (47, -0.057), (48, 0.008), (49, -0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.94983202 242 acl-2013-Mining Equivalent Relations from Linked Data
Author: Ziqi Zhang ; Anna Lisa Gentile ; Isabelle Augenstein ; Eva Blomqvist ; Fabio Ciravegna
Abstract: Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. 1
2 0.70529026 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
Author: Brijesh Bhatt ; Lahari Poddar ; Pushpak Bhattacharyya
Abstract: We present IndoNet, a multilingual lexical knowledge base for Indian languages. It is a linked structure of wordnets of 18 different Indian languages, Universal Word dictionary and the Suggested Upper Merged Ontology (SUMO). We discuss various benefits of the network and challenges involved in the development. The system is encoded in Lexical Markup Framework (LMF) and we propose modifications in LMF to accommodate Universal Word Dictionary and SUMO. This standardized version of lexical knowledge base of Indian Languages can now easily be linked to similar global resources.
3 0.68679231 61 acl-2013-Automatic Interpretation of the English Possessive
Author: Stephen Tratz ; Eduard Hovy
Abstract: The English ’s possessive construction occurs frequently in text and can encode several different semantic relations; however, it has received limited attention from the computational linguistics community. This paper describes the creation of a semantic relation inventory covering the use of ’s, an inter-annotator agreement study to calculate how well humans can agree on the relations, a large collection of possessives annotated according to the relations, and an accurate automatic annotation system for labeling new examples. Our 21,938 example dataset is by far the largest annotated possessives dataset we are aware of, and both our automatic classification system, which achieves 87.4% accuracy in our classification experiment, and our annotation data are publicly available.
4 0.65998012 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
Author: Eneko Agirre ; Nikolaos Aletras ; Paul Clough ; Samuel Fernando ; Paula Goodale ; Mark Hall ; Aitor Soroa ; Mark Stevenson
Abstract: This paper describes a system for navigating large collections of information about cultural heritage which is applied to Europeana, the European Library. Europeana contains over 20 million artefacts with meta-data in a wide range of European languages. The system currently provides access to Europeana content with meta-data in English and Spanish. The paper describes how Natural Language Processing is used to enrich and organise this meta-data to assist navigation through Europeana and shows how this information is used within the system.
5 0.63784492 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
Author: Qingqing Cai ; Alexander Yates
Abstract: Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
6 0.637456 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
7 0.60753435 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
8 0.59980386 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
9 0.58996809 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
10 0.57903707 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
11 0.57045394 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
12 0.56669623 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
13 0.56390446 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
14 0.5620811 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
16 0.55053729 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
17 0.55050772 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality
18 0.54986423 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
19 0.54847473 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
20 0.5480752 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
topicId topicWeight
[(0, 0.069), (6, 0.041), (11, 0.104), (24, 0.037), (26, 0.046), (35, 0.06), (42, 0.046), (48, 0.057), (64, 0.02), (70, 0.074), (88, 0.031), (90, 0.026), (95, 0.066), (97, 0.262)]
simIndex simValue paperId paperTitle
same-paper 1 0.77125418 242 acl-2013-Mining Equivalent Relations from Linked Data
Author: Ziqi Zhang ; Anna Lisa Gentile ; Isabelle Augenstein ; Eva Blomqvist ; Fabio Ciravegna
Abstract: Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. 1
2 0.71451962 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang
Abstract: In this paper, we address the problem for predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.
3 0.67867362 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation
Author: Felix Hieber ; Laura Jehl ; Stefan Riezler
Abstract: We present an approach to mine comparable data for parallel sentences using translation-based cross-lingual information retrieval (CLIR). By iteratively alternating between the tasks of retrieval and translation, an initial general-domain model is allowed to adapt to in-domain data. Adaptation is done by training the translation system on a few thousand sentences retrieved in the step before. Our setup is time- and memory-efficient and of similar quality as CLIR-based adaptation on millions of parallel sentences.
4 0.63434416 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn
Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale Englishto-German experiment with a string-totree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.
5 0.57822883 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
6 0.57460672 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
7 0.57383007 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
8 0.57292324 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
9 0.57226622 333 acl-2013-Summarization Through Submodularity and Dispersion
10 0.57180542 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
11 0.57117844 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
12 0.57113737 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
13 0.56966734 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
14 0.56955761 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
15 0.56911707 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
16 0.56888819 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
17 0.56881917 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
18 0.5687198 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
19 0.56612796 275 acl-2013-Parsing with Compositional Vector Grammars
20 0.5660795 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging