306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

Source: pdf

Author: Tiziano Flati ; Roberto Navigli

Abstract: We present SPred, a novel method for the creation of large repositories of semantic predicates. We start from existing collocations to form lexical predicates (e.g., break ∗) and learn the semantic classes that best fit the ∗ argument. To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e.g., break BODY PART, break AGREEMENT, etc.). Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner’s Dictionary with high precision and recall, and perform well against the most similar approach.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We start from existing collocations to form lexical predicates (e. [sent-4, score-0.32]

2 , break ∗) and learn the semantic classes that best fit the ∗ argument. [sent-6, score-0.489]

3 To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e. [sent-7, score-0.986]

4 Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner’s Dictionary with high precision and recall, and perform well against the most similar approach. [sent-11, score-0.385]

5 For instance, which semantic classes are expected as a direct object of the verb break? [sent-34, score-0.366]

6 These approaches leverage lexico-syntactic patterns and input seeds to recursively learn the semantic classes of relation arguments. [sent-38, score-0.399]

7 However, they require the manual selection of one or more seeds for each pattern of interest, and this selection influences the amount and kind of semantic classes to be learned. [sent-39, score-0.366]

8 The goal of our research is to create a large-scale repository of semantic predicates whose lexical arguments are replaced by their semantic classes. [sent-41, score-0.882]

9 © 2013 Association for Computational Linguistics, pages 1222–1232. ing semantic predicate break a BODY PART, where BODY PART is a class comprising several lexical realizations, such as leg, arm, foot, etc. [sent-44, score-0.841]

10 , n), c ∈ C is a semantic class selected from a fixed set C of classes, and i ∈ {0, . [sent-87, score-0.308]

11 ntic predicate cup of BEVERAGE,1 where BEVERAGE is a semantic class representing beverages. [sent-95, score-0.881]

12 This predicate matches phrases like cup of coffee, cup of tea, etc. [sent-96, score-0.822]

13 Semantic predicates mix the lexical information of a given lexical predicate with the explicit semantic modeling of its argument. [sent-99, score-0.881]

14 Importantly, the same lexical predicate can have different classes as its argument, like cup of FOOD vs. [sent-100, score-0.874]

15 Note, however, that different classes might convey different semantics for the same lexical [Footnote 1: In what follows we denote the SEMANTIC CLASS in small capitals and the lexical predicate in italics.] [sent-102, score-0.711]

16 predicate, such as cup of COUNTRY, referring to cup as a prize instead of cup as a container. [sent-103, score-0.747]

17 3 Large-Scale Harvesting of Semantic Predicates The goal of this paper is to provide a fully automatic approach for the creation of a large repository of semantic predicates in three phases. [sent-104, score-0.385]

18 We extract all its possible filling arguments from Wikipedia, e. [sent-108, score-0.406]

19 We disambiguate as many filling arguments as possible using Wikipedia, obtaining a set of corresponding Wikipedia pages, e. [sent-114, score-0.443]

20 We create the semantic predicates by generalizing the Wikipedia pages to their most suitable semantic classes, e. [sent-120, score-0.628]

21 We can then exploit the learned semantic predicates to assign the most suitable semantic class to new filling arguments for the given lexical predicate (Section 3. [sent-125, score-1.326]

22 1 Extraction of Filling Arguments Let π be an input lexical predicate (e. [sent-128, score-0.41]

23 We show … searching Wikipedia for the arguments of the lexical predicate a * of milk in Table 1. [sent-135, score-0.771]

24 The output of this first step is a set Lπ of triples (a, s, l) of filling arguments a matching the lexical predicate π in a sentence s of the Wikipedia corpus, with a potentially linked to a page l (e. [sent-137, score-1.061]
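
The extraction step above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `extract_triples`, the regular expression, and the toy sentences are hypothetical and not taken from the paper; the sketch handles a single-token argument slot and reads links written in the Wikipedia double-square-bracket convention.

```python
import re

# Hypothetical argument slot: either a wiki link ([[Page]] or [[Page|anchor]])
# or a plain single token. Multi-word arguments are out of scope for this sketch.
ARG = r"(?:\[\[(?P<page>[^\]|]+)(?:\|(?P<anchor>[^\]]+))?\]\]|(?P<plain>\w+))"


def extract_triples(predicate, sentences):
    """Return (argument, sentence, link) triples for a predicate such as 'a * of milk'.

    link is the linked page title, or None when the argument is not linked.
    """
    left, right = (part.strip() for part in predicate.split("*"))
    parts = []
    if left:
        parts.append(r"\b" + re.escape(left) + r"\s+")
    parts.append(ARG)
    if right:
        parts.append(r"\s+" + re.escape(right) + r"\b")
    pattern = re.compile("".join(parts), re.IGNORECASE)

    triples = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            if match.group("page"):
                # Use the visible anchor text if present, else the page title.
                argument = match.group("anchor") or match.group("page")
                triples.append((argument.lower(), sentence, match.group("page")))
            else:
                triples.append((match.group("plain").lower(), sentence, None))
    return triples


# Toy sentences (invented for illustration only).
toy = [
    "She drank a [[glass]] of milk.",
    "He bought a bottle of milk.",
    "Add a [[Cup (unit)|cup]] of milk.",
]
triples = extract_triples("a * of milk", toy)
```

Unlinked arguments come back with `None` as their link, which is what the later disambiguation step has to resolve.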

25 2 Disambiguation of Filling Arguments The objective of the second step is to disambiguate as many arguments in Lπ as possible for the lex [Footnote 2: We will also refer to l as the sense of a in sentence s.] [sent-143, score-0.367]

26 Table 1: An excerpt of the token sequences which match the lexical predicate a * of milk in Wikipedia (filling argument shown in the second column; following the Wikipedia convention we provide links in double square brackets). [sent-144, score-0.617]

27 } ⊆ L linked to the corresponding Wikipedia page (like the top three linked arguments in Table 1). [sent-148, score-0.457]

28 ) ∈ U heuristics: • One sense per page: if another occurrence of a in the same Wikipedia page (independent of the lexical predicate) is linked to a page l, then remove (a, s, ? [sent-160, score-0.471]

29 For instance, cup of coffee appears in the Wikipedia page Energy drink in the sentence “[. [sent-163, score-0.495]

30 ] energy drinks contain more caffeine than a strong cup of coffee”, but this occurrence of coffee is not linked. [sent-166, score-0.414]

31 ] combined with a cup of coffee and a half-boiled egg” is not linked, but we have collected many other occurrences, all linked to the Coffee page, so this link gets propagated to our ambiguous item as well. [sent-175, score-0.531]

32 Consider the instance “At that point, Smith threw down a cup of Gatorade” in page Jimmy Clausen; there is only one sense for Gatorade in Wikipedia, so we link the unannotated occurrence to it. [sent-180, score-0.478]

33 As a result, the initial set of disambiguated arguments in Dπ is augmented with all those triples for which any of the above three heuristics apply. [sent-181, score-0.346]
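
The three heuristics illustrated by the coffee and Gatorade examples can be sketched as a small fallback chain. This is a simplified reading, not the authors' implementation: the function name, the data layout, the order in which the heuristics are tried, and the placeholder page titles "Page X" and "Page Y" are all assumptions made for illustration.

```python
from collections import defaultdict


def disambiguate(unlinked, page_links, linked, inventory):
    """Resolve unlinked arguments with three fallback heuristics.

    unlinked:   list of (argument, source_page) pairs that carry no link
    page_links: dict source_page -> {argument: target} for links found
                elsewhere on that same page
    linked:     list of (argument, target) pairs already linked for this
                lexical predicate
    inventory:  dict argument -> set of candidate target pages
    """
    targets_for = defaultdict(set)
    for argument, target in linked:
        targets_for[argument].add(target)

    resolved = {}
    for argument, page in unlinked:
        if argument in page_links.get(page, {}):
            # Same argument is linked elsewhere on the same page.
            resolved[(argument, page)] = page_links[page][argument]
        elif len(targets_for[argument]) == 1:
            # All linked occurrences for this predicate agree on one target.
            resolved[(argument, page)] = next(iter(targets_for[argument]))
        elif len(inventory.get(argument, ())) == 1:
            # Only one sense exists, so there is nothing to choose between.
            resolved[(argument, page)] = next(iter(inventory[argument]))
    return resolved


resolved = disambiguate(
    unlinked=[
        ("coffee", "Energy drink"),
        ("coffee", "Page X"),
        ("gatorade", "Jimmy Clausen"),
        ("tea", "Page Y"),
    ],
    page_links={"Energy drink": {"coffee": "Coffee"}},
    linked=[("coffee", "Coffee"), ("coffee", "Coffee")],
    inventory={"gatorade": {"Gatorade"}, "tea": {"Tea", "Tea (meal)"}},
)
```

Arguments that no heuristic covers (here, tea) simply stay unlinked.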

34 3 Generalization to Semantic Classes Our final objective is to generalize the annotated arguments to semantic classes picked out from a fixed set C of classes. [sent-194, score-0.626]

35 We perform this in two substeps: we first link all our disambiguated arguments to WordNet (Section 3. [sent-196, score-0.339]

36 1) and then leverage the WordNet taxonomy to populate the semantic classes in C (Section 3. [sent-198, score-0.452]

37 While it is true that attached to each Wikipedia page there are one or more categories, these categories just provide shallow information about the class the page belongs to. [sent-205, score-0.431]

38 Indeed, categories are not ideal for representing the semantic classes of a Wikipedia page for at least three reasons: i) many categories do not express taxonomic information (e. [sent-206, score-0.56]

39 , 2009; Erk and McCarthy, 2009; Huang and Riloff, 2010), we pick out our semantic classes C from WordNet and leverage its manually-curated taxonomy to associate our arguments with the most suitable class. [sent-211, score-0.763]

40 This way we avoid building a new taxonomy and shift the problem to that of projecting the Wikipedia pages associated with annotated filling arguments to synsets in WordNet. [sent-212, score-0.573]

41 For instance, the mapping provided by BabelNet does not provide any link for the page Peter Spence; thanks to WCL, though, we are able to set the page Journalist as its hypernym, and link it to the WordNet synset journalistn1. [sent-229, score-0.37]

42 2 Populating the Semantic Classes We now proceed to populating the semantic classes in C with the annotated arguments obtained for the lexical predicate π. [sent-235, score-1.036]

43 The semantic class for a WordNet synset S is the class c among those in C which is the most specific hypernym of S according to the WordNet taxonomy. [sent-237, score-0.597]

44 For instance, given the synset tap watern1, its semantic class is watern1 (while the other more general subsumers in C are not considered, e. [sent-238, score-0.43]

45 For each argument a ∈ A for which a Wikipedia-to-WordNet mapping µ(sense(a)) could be established as a result of the linking procedure described above, we associate a with the semantic class of µ(sense(a)). [sent-241, score-0.414]

46 For example, consider the case in which a is equal to tap water and sense(a) is equal to the Wikipedia page Tap water, in turn mapped to tap watern1 via µ; we thus associate tap water with its semantic class watern1. [sent-242, score-0.706]

47 Ultimately, for each class c ∈ C, we obtain a set support(c) made up of all the arguments a ∈ A associated with c. [sent-244, score-0.417]
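
A minimal sketch of this population step follows, assuming a toy taxonomy in which every synset has a single hypernym (WordNet allows several) and assuming a synset that is itself in C counts as its own class. The names `semantic_class`, `build_support`, and the toy synsets are hypothetical.

```python
from collections import defaultdict


def semantic_class(synset, hypernym_of, classes):
    """Walk up the taxonomy and return the first (most specific) node in `classes`."""
    node = synset
    while node is not None:
        if node in classes:
            return node
        node = hypernym_of.get(node)  # None once the root is passed
    return None


def build_support(argument_synsets, hypernym_of, classes):
    """Group arguments by semantic class: class -> set of arguments."""
    support = defaultdict(set)
    for argument, synset in argument_synsets.items():
        cls = semantic_class(synset, hypernym_of, classes)
        if cls is not None:
            support[cls].add(argument)
    return dict(support)


# Toy single-parent taxonomy (illustrative only, not real WordNet data).
hypernym_of = {
    "tap_water": "water",
    "water": "liquid",
    "liquid": "substance",
    "coffee": "beverage",
    "beverage": "liquid",
}
classes = {"water", "beverage", "substance"}

support = build_support(
    {"tap water": "tap_water", "coffee": "coffee"}, hypernym_of, classes
)
```

Because the walk stops at the first class it meets, tap water lands in water and never reaches the more general substance, mirroring the example in the text.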

48 1), the support of a class can also contain arguments not covered in WordNet (e. [sent-259, score-0.417]

49 Since not all classes are equally relevant to the lexical predicate π, we estimate the conditional probability of each class c ∈ C given π on the basis of the number of sentences which contain an argument in that class. [sent-267, score-0.922]
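
As a rough illustration of this estimate, the sketch below uses a plain relative frequency over sentence counts. The paper's Formula 1 is not reproduced in this extract, so any smoothing or other terms it contains are not modelled here; the function name and the toy labels are assumptions.

```python
from collections import Counter


def class_probabilities(sentence_classes):
    """Relative-frequency estimate of P(c | predicate).

    sentence_classes holds one class label per sentence that contains an
    argument of that class. Classes are returned from most to least probable.
    """
    counts = Counter(sentence_classes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


# Toy counts for a predicate such as "cup of *" (invented for illustration).
probs = class_probabilities(["BEVERAGE", "BEVERAGE", "BEVERAGE", "FOOD"])
```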

50 […]-probability classes for the lexical predicate cup of ∗. [sent-269, score-0.573]

51 ciation of each semantic class c with a target lexical predicate w1 w2 . [sent-271, score-0.718]

52 4 Classification of new arguments Once the semantic predicates for the input lexical predicate π have been learned, we can classify a new filling argument a of π. [sent-285, score-1.307]

53 Next, we create a distributional vector for each class c ∈ C as follows: $\vec{c} = \sum_{S \in desc(c)} \vec{S}$, where desc(c) is the set of all synsets which are descendants of the semantic class c in WordNet. [sent-290, score-0.575]

54 As a result we obtain a predicate-independent distributional description for each semantic class in C. [sent-291, score-0.345]

55 Now, given an argument a of a lexical predicate π, we create a distributional vector by summing the noun occurrences of all the sentences s such that (a, s, l) ∈ Lπ (cf. [sent-293, score-0.589]

56 Let Ca be the set of candidate semantic classes for argument a, i. [sent-297, score-0.472]

57 , Ca contains the semantic classes for the WordNet synsets of a as well as the semantic classes associated with µ(p) for all Wikipedia pages p whose lemma is a. [sent-299, score-0.846]

58 Then, we determine the most suitable semantic class c ∈ Ca of argument a as the class with the highest distributional probability, estimated as: $P_{distr}(c \mid \pi, a) = \frac{sim(\vec{c}, \vec{a})}{\sum_{c' \in C_a} sim(\vec{c}', \vec{a})}$. [sent-302, score-0.659]
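
The distributional estimate can be sketched with bag-of-words vectors. The similarity function is not specified in this extract, so cosine similarity is assumed here; the function names and toy vectors are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed choice of sim)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def class_vector(synset_vectors):
    """Sum the vectors of all descendant synsets of a class."""
    total = Counter()
    for vec in synset_vectors:
        total.update(vec)
    return total


def p_distr(arg_vec, class_vecs):
    """Normalise similarities over the candidate classes of the argument."""
    sims = {cls: cosine(vec, arg_vec) for cls, vec in class_vecs.items()}
    z = sum(sims.values())
    return {cls: (s / z if z else 0.0) for cls, s in sims.items()}


# Toy vectors of co-occurring nouns (invented for illustration).
class_vecs = {
    "BEVERAGE": class_vector([Counter(drink=2, cup=1), Counter(drink=1)]),
    "PRIZE": Counter(team=2, final=1),
}
scores = p_distr(Counter(drink=1, cup=1), class_vecs)
best = max(scores, key=scores.get)
```

Only the candidate classes in Ca enter the normalisation, so the scores sum to one over that set whenever at least one similarity is non-zero.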

59 Given a textual expression such as virus replicate, we: (i) extract all the filling arguments of the lexical predicate * replicate; (ii) link and disambiguate the extracted filling arguments; (iii) query our system for the available virus semantic classes (i. [sent-305, score-1.602]

60 e,c {tovrirsu sfor 1226 the candidate semantic classes and the given input argument; (v) calculate the probability mixture. [sent-308, score-0.366]

61 For both evaluations, we use a lexical predicate dataset built from the Oxford Advanced Learner’s Dictionary (Crowther, 1998). [sent-314, score-0.41]

62 1 Set of Semantic Classes The selection of which semantic classes to include in the set C is of great importance. [sent-316, score-0.366]

63 In fact, having too many classes will end up in an overly fine-grained inventory of meanings, whereas an excessively small number of classes will provide little discriminatory power. [sent-317, score-0.475]

64 As our set C of semantic classes we selected the standard set of 3,299 core nominal synsets available in WordNet. [sent-318, score-0.439]

65 2 Datasets The Oxford Advanced Learner’s Dictionary provides usage notes that contain typical predicates in various semantic domains in English, e. [sent-321, score-0.385]

66 The splitting procedure generated 6,220 instantiated lexical predicate items overall. [sent-332, score-0.476]

67 For instance, the three items bacteria/microbes/viruses spread were generalized into the lexical predicate * spread. [sent-344, score-0.527]

68 The total number of different lexical predicates obtained was 1,446, totaling 1,429 distinct verbs (note that the dataset might contain the lexical predicate * spread as well as spread *). [sent-345, score-0.866]

69 For each lexical predicate we calculated the conditional probability of each semantic class using Formula 1, resulting in a ranking of semantic classes. [sent-347, score-0.903]

70 , locationn1 is a valid semantic class for travel to * while emotionn1 is not. [sent-350, score-0.308]

71 We note that high performance, attaining above 80%, can be achieved [Footnote 10: The low number of items per predicate is due to the original Oxford resource.] [sent-358, score-0.39]

72 any semantic class by focusing up to the first 7 classes output by our system, with a 94% precision@1. [sent-360, score-0.523]

73 Starting from the lexical predicate items obtained as described in Section 4. [sent-363, score-0.476]

74 , virus in viruses spread) with the most suitable semantic class (e. [sent-367, score-0.435]

75 In this second evaluation we measure the accuracy of our method at assigning the most suitable semantic class to the argument of a lexical predicate item in our gold standard. [sent-373, score-0.909]

76 Precision is the number of items which are assigned the correct class (as evaluated by a human) over the number of items which are assigned a class by the system. [sent-377, score-0.446]
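
The precision definition above translates directly into code. The sketch below assumes the human judgments are available as a dictionary of booleans; the function name, item names, and class labels are made up for illustration.

```python
def precision(assignments, is_correct):
    """Correctly classified items over items that received any class.

    assignments: dict item -> assigned class, or None if the system abstained
    is_correct:  dict item -> bool, the human judgment for assigned items
    """
    assigned = [item for item, cls in assignments.items() if cls is not None]
    if not assigned:
        return 0.0
    return sum(1 for item in assigned if is_correct[item]) / len(assigned)


# Toy evaluation: three items classified, one left without a class.
assignments = {"item1": "CLASS_A", "item2": "CLASS_B", "item3": "CLASS_A", "item4": None}
judgments = {"item1": True, "item2": True, "item3": False}
score = precision(assignments, judgments)
```

Items the system leaves unclassified do not lower precision; they would only affect a recall-style measure.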

77 For tuning α we used a held-out set of 8 verbs, randomly sampled from the lexical predicates not used in the dataset. [sent-384, score-0.32]

78 We created a tuning set using the annotated arguments in Wikipedia for these verbs: we trained the model on 80% of the annotated lexical predicate arguments (i. [sent-385, score-0.93]

79 We also compared against a random baseline that randomly selects one out of all the candidate semantic classes for each item, achieving only moderate results. [sent-404, score-0.366]

80 Starting from the entire set of 1,446 lexical predicates from the Oxford dictionary (see Section 4. [sent-417, score-0.32]

81 We note that, while the amount of originally linked arguments is very low (about 2. [sent-425, score-0.339]

82 , 68 out of almost 74 million) remain unlinked, the ratio of distinct arguments which we linked to WordNet is considerably higher, calculated as 3,723,979 linked arguments over 12,431,564 distinct arguments, i. [sent-432, score-0.712]

83 The most similar approach is that of Kozareva and Hovy (2010, K&H) who assign supertypes to the arguments of arbitrary relations, a task which resembles our semantic predicate ranking. [sent-436, score-0.819]

84 We calculated precision@k of the semantic classes obtained for each relation in the dataset of K&H. [sent-456, score-0.4]

85 Although we cannot report recall, we list the number of Wikipedia arguments and associated classes in Table 7, which provides an estimate of the extraction capability of SPred. [sent-465, score-0.513]

86 The large number of classes found for the arguments demonstrates the ability of our method to generalize to a variety of semantic classes. [sent-466, score-0.626]

87 [Table columns: Predicate, Number of args, Number of classes] classes for the 12 most frequent lexical predicates of Kozareva and Hovy (2010) extracted by SPred from Wikipedia. [sent-467, score-0.75]

88 However, these resources often operate purely at the lexical level, providing no information on the semantics of their arguments or relations. [sent-473, score-0.346]

89 Several studies have examined adding semantics through grouping relations into sets (Yates and Etzioni, 2009), ontologizing the arguments (Chklovski and Pantel, 2004), or ontologizing the relations themselves (Moro and Navigli, 2013). [sent-474, score-0.5]

90 Our method for identifying the different semantic classes of predicate arguments is closely related to the task of identifying selectional preferences. [sent-483, score-0.709]

91 The most similar approaches to it are taxonomy-based ones, which leverage the semantic types of the relation arguments (Resnik, 1996; Li and Abe, 1998; Clark and Weir, 2002; Pennacchiotti and Pantel, 2006). [sent-484, score-0.488]

92 As a result, alternative approaches have been proposed that eschew taxonomies in favor of rating the quality of potential relation arguments (Erk, 2007; Chambers and Jurafsky, 2010) or generating probability distributions over the arguments (Rooth et al. [sent-486, score-0.52]

93 Another closely related work is that of Hanks (2013) concerning the Theory of Norms and Exploitations, where norms (exploitations) represent expected (unexpected) classes for a given lexical predicate. [sent-497, score-0.34]

94 The closest technical approach to ours is that of Kozareva and Hovy (2010), who use recursive patterns to induce semantic classes for the arguments of relational patterns. [sent-499, score-0.626]

95 Whereas their approach requires both a relation pattern and one or more seeds, which bias the types of semantic classes that are learned, our proposed method requires only the pattern itself, and as a result is capable of learning an unbounded number of different semantic classes. [sent-500, score-0.517]

96 In order to semantify lexical predicates we exploit the wide coverage of Wikipedia to extract and disambiguate lexical predicate occurrences, and leverage WordNet to populate the semantic classes with suitable predicate arguments. [sent-502, score-1.579]

97 As a result, we are able to ontologize lexical predicate instances like those available in existing dictionaries (e. [sent-503, score-0.41]

98 , break a toe) into semantic predicates (such as break a BODY PART). [sent-505, score-0.631]

99 , Wikipedias in other languages), provided lexical predicates can be extracted with associated semantic classes. [sent-514, score-0.471]

100 In order to support future efforts we are releasing our semantic predicates as a freely available resource. [sent-515, score-0.385]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('predicate', 0.324), ('arguments', 0.26), ('cup', 0.249), ('predicates', 0.234), ('classes', 0.215), ('wikipedia', 0.196), ('class', 0.157), ('semantic', 0.151), ('filling', 0.146), ('navigli', 0.143), ('coffee', 0.128), ('break', 0.123), ('wordnet', 0.122), ('page', 0.118), ('spred', 0.118), ('kozareva', 0.116), ('babelnet', 0.111), ('oxford', 0.108), ('argument', 0.106), ('milk', 0.101), ('moro', 0.095), ('lexical', 0.086), ('supertypes', 0.084), ('selectional', 0.083), ('hypernym', 0.08), ('roberto', 0.08), ('linked', 0.079), ('ontologizing', 0.076), ('virus', 0.076), ('synsets', 0.073), ('faralli', 0.073), ('sense', 0.07), ('formula', 0.07), ('tap', 0.07), ('spend', 0.067), ('items', 0.066), ('yates', 0.06), ('hovy', 0.059), ('ontologized', 0.057), ('stefano', 0.056), ('velardi', 0.056), ('ponzetto', 0.054), ('etzioni', 0.053), ('taxonomy', 0.053), ('synset', 0.052), ('spread', 0.051), ('suitable', 0.051), ('exploitations', 0.051), ('wi', 0.049), ('triples', 0.048), ('stroudsburg', 0.047), ('uppsala', 0.047), ('disambiguation', 0.047), ('inventory', 0.045), ('textual', 0.044), ('relations', 0.044), ('chklovski', 0.044), ('pa', 0.042), ('link', 0.041), ('pages', 0.041), ('roma', 0.04), ('oren', 0.039), ('norms', 0.039), ('riloff', 0.039), ('stern', 0.039), ('categories', 0.038), ('crowther', 0.038), ('gatorade', 0.038), ('igo', 0.038), ('kcorrecttotal', 0.038), ('kprec', 0.038), ('menta', 0.038), ('pclass', 0.038), ('pdistr', 0.038), ('semantify', 0.038), ('thelen', 0.038), ('wisenet', 0.038), ('disambiguated', 0.038), ('extraction', 0.038), ('disambiguate', 0.037), ('energy', 0.037), ('distributional', 0.037), ('fader', 0.036), ('occurrences', 0.036), ('water', 0.035), ('wn', 0.035), ('calculated', 0.034), ('carlson', 0.034), ('weld', 0.034), ('item', 0.034), ('tithye', 0.034), ('supertype', 0.034), ('totaling', 0.034), ('izquierdo', 0.034), ('bottle', 0.034), ('cocoa', 0.034), ('desc', 0.034), ('yakushiji', 0.034), ('leverage', 0.033), ('harvesting', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

Author: Tiziano Flati ; Roberto Navigli

2 0.24602647 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

Author: Zhenhua Tian ; Hengheng Xiang ; Ziqi Liu ; Qinghua Zheng

Abstract: This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences. Based on the measure of preferences between predicates and arguments, the model aggregates all the transitions from a given predicate to its nearby predicates, and propagates their argument preferences as the given predicate’s smoothed preferences. Experimental results show that this approach outperforms several state-of-the-art methods on the pseudo-disambiguation task, and it better correlates with human plausibility judgements.

3 0.22859861 314 acl-2013-Semantic Roles for String to Tree Machine Translation

Author: Marzieh Bazrafshan ; Daniel Gildea

Abstract: We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument structure. Our results demonstrate that the second approach is effective in increasing the quality of translations.

4 0.2127644 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling

Author: Egoitz Laparra ; German Rigau

Abstract: This paper presents a novel deterministic algorithm for implicit Semantic Role Labeling. The system exploits a very simple but relevant discursive property, the argument coherence over different instances of a predicate. The algorithm solves the implicit arguments sequentially, exploiting not only explicit but also the implicit arguments previously solved. In addition, we empirically demonstrate that the algorithm obtains very competitive and robust performances with respect to supervised approaches that require large amounts of costly training data.

5 0.2084866 6 acl-2013-A Java Framework for Multilingual Definition and Hypernym Extraction

Author: Stefano Faralli ; Roberto Navigli

Abstract: In this paper we present a demonstration of a multilingual generalization of Word-Class Lattices (WCLs), a supervised lattice-based model used to identify textual definitions and extract hypernyms from them. Lattices are learned from a dataset of automatically-annotated definitions from Wikipedia. We release a Java API for the programmatic use of multilingual WCLs in three languages (English, French and Italian), as well as a Web application for definition and hypernym extraction from user-provided sentences.

6 0.19875579 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

7 0.18196177 267 acl-2013-PARMA: A Predicate Argument Aligner

8 0.17618927 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

9 0.17049129 152 acl-2013-Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines

10 0.14709897 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data

11 0.13829979 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

12 0.13806038 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

13 0.11989705 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

14 0.11568705 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

15 0.10845632 234 acl-2013-Linking and Extending an Open Multilingual Wordnet

16 0.10836863 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web

17 0.10494874 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

18 0.098868676 242 acl-2013-Mining Equivalent Relations from Linked Data

19 0.097004168 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition

20 0.092445157 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.246), (1, 0.102), (2, -0.003), (3, -0.242), (4, -0.099), (5, 0.047), (6, -0.159), (7, 0.114), (8, 0.048), (9, 0.038), (10, 0.03), (11, 0.075), (12, 0.012), (13, 0.029), (14, 0.106), (15, -0.049), (16, 0.111), (17, 0.021), (18, 0.166), (19, 0.051), (20, 0.003), (21, 0.088), (22, -0.128), (23, 0.114), (24, 0.241), (25, 0.047), (26, 0.103), (27, -0.082), (28, 0.093), (29, 0.097), (30, -0.077), (31, 0.043), (32, 0.051), (33, 0.082), (34, 0.13), (35, -0.029), (36, -0.035), (37, 0.022), (38, -0.031), (39, 0.033), (40, -0.007), (41, -0.035), (42, -0.004), (43, 0.014), (44, 0.022), (45, 0.008), (46, -0.004), (47, -0.065), (48, 0.056), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96453506 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

Author: Tiziano Flati ; Roberto Navigli

2 0.75752831 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling

Author: Egoitz Laparra ; German Rigau

3 0.71517199 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

Author: Zhenhua Tian ; Hengheng Xiang ; Ziqi Liu ; Qinghua Zheng

4 0.64035118 6 acl-2013-A Java Framework for Multilingual Definition and Hypernym Extraction

Author: Stefano Faralli ; Roberto Navigli

5 0.63321465 267 acl-2013-PARMA: A Predicate Argument Aligner

Author: Travis Wolfe ; Benjamin Van Durme ; Mark Dredze ; Nicholas Andrews ; Charley Beller ; Chris Callison-Burch ; Jay DeYoung ; Justin Snyder ; Jonathan Weese ; Tan Xu ; Xuchen Yao

Abstract: We introduce PARMA, a system for cross-document, semantic predicate and argument alignment. Our system combines a number of linguistic resources familiar to researchers in areas such as recognizing textual entailment and question answering, integrating them into a simple discriminative model. PARMA achieves state of the art results on an existing and a new dataset. We suggest that previous efforts have focussed on data that is biased and too easy, and we provide a more difficult dataset based on translation data with a low baseline which we beat by 17% F1.

6 0.61503989 152 acl-2013-Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines

7 0.60523176 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection

8 0.57699078 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data

9 0.57042003 314 acl-2013-Semantic Roles for String to Tree Machine Translation

10 0.5622741 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web

11 0.55272257 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages

12 0.5311408 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

13 0.50851333 242 acl-2013-Mining Equivalent Relations from Linked Data

14 0.49853221 61 acl-2013-Automatic Interpretation of the English Possessive

15 0.49834242 234 acl-2013-Linking and Extending an Open Multilingual Wordnet

16 0.48977405 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

17 0.48964795 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures

18 0.48919907 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)

19 0.48117077 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing

20 0.48071584 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.054), (6, 0.026), (11, 0.086), (24, 0.028), (26, 0.027), (28, 0.013), (35, 0.081), (42, 0.033), (48, 0.388), (64, 0.027), (70, 0.032), (71, 0.012), (88, 0.03), (90, 0.016), (95, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97954482 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint

Author: Jun Suzuki ; Masaaki Nagata

Abstract: This paper proposes a framework of supervised model learning that realizes feature grouping to obtain lower complexity models. The main idea of our method is to integrate a discrete constraint into model learning with the help of the dual decomposition technique. Experiments on two well-studied NLP tasks, dependency parsing and NER, demonstrate that our method can provide state-of-the-art performance even if the degrees of freedom in trained models are surprisingly small, i.e., 8 or even 2. This significant benefit enables us to provide compact model representation, which is especially useful in actual use.

2 0.96783209 39 acl-2013-Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors

Author: Volkan Cirik

Abstract: We study substitute vectors to solve the part-of-speech ambiguity problem in an unsupervised setting. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task. We claim that part-of-speech ambiguity can be solved using substitute vectors. A substitute vector is constructed with possible substitutes of a target word. This study is built on previous work which has proven that word substitutes are very fruitful for part-of-speech induction. Experiments show that our methodology works for words with high ambiguity.

3 0.96741384 54 acl-2013-Are School-of-thought Words Characterizable?

Author: Xiaorui Jiang ; Xiaoping Sun ; Hai Zhuge

Abstract: School of thought analysis is an important yet not-well-elaborated scientific knowledge discovery task. This paper makes the first attempt at this problem. We focus on one aspect of the problem: do characteristic school-of-thought words exist, and are they characterizable? To answer these questions, we propose a probabilistic generative School-Of-Thought (SOT) model to simulate the scientific authoring process based on several assumptions. SOT defines a school of thought as a distribution of topics and assumes that authors determine the school of thought for each sentence before choosing words to deliver scientific ideas. SOT distinguishes between two types of school-of-thought words for either the general background of a school of thought or the original ideas each paper contributes to its school of thought. Narrative and quantitative experiments show positive and promising results to the questions raised above.

same-paper 4 0.94009519 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

Author: Tiziano Flati ; Roberto Navigli

5 0.91957933 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics

Author: Angeliki Lazaridou ; Marco Marelli ; Roberto Zamparelli ; Marco Baroni

Abstract: Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex words from their parts. Semantic representations constructed in this way beat a strong baseline and can be of higher quality than representations directly constructed from corpus data. Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.

6 0.90977061 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment

7 0.71959406 237 acl-2013-Margin-based Decomposed Amortized Inference

8 0.71822184 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

9 0.71161276 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit

10 0.6910938 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

11 0.68608493 62 acl-2013-Automatic Term Ambiguity Detection

12 0.68024093 294 acl-2013-Re-embedding words

13 0.67331845 275 acl-2013-Parsing with Compositional Vector Grammars

14 0.66774762 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

15 0.65939683 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

16 0.65613914 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis

17 0.65375376 109 acl-2013-Decipherment Complexity in 1:1 Substitution Ciphers

18 0.6509167 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

19 0.64862406 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

20 0.64730269 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri