acl acl2013 acl2013-113 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sebastian Pado ; Jan Snajder ; Britta Zeller
Abstract: Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). However, they are also sparse, with resulting reliability and coverage problems. We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity estimates. We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). [sent-5, score-0.267]
2 However, they are also sparse, with resulting reliability and coverage problems. [sent-6, score-0.122]
3 We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity estimates. [sent-7, score-0.811]
4 We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. [sent-8, score-0.988]
5 Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably. [sent-9, score-1.074]
6 1 Introduction Distributional semantics (Turney and Pantel, 2010) builds on the assumption that the semantic similarity of words is strongly correlated to the overlap between their linguistic contexts. [sent-10, score-0.161]
7 This hypothesis can be used to construct context vectors for words directly from large text corpora in an unsupervised manner. [sent-11, score-0.046]
8 Such vector spaces have been applied successfully to many problems in NLP (see Turney and Pantel (2010) or Erk (2012) for current overviews). [sent-12, score-0.116]
9 Most distributional models in computational lexical semantics are either (a) bag-of-words models, where the context features are words within a surface window around the target word, or (b) syntactic models, where context features are typically pairs of dependency relations and context words. [sent-13, score-0.216]
10 The advantage of syntactic models is that they incorporate a richer, structured notion of context. [sent-14, score-0.033]
11 It is also able, at least in principle, to capture more fine-grained types of semantic similarity such as predicate-argument plausibility (Erk et al. [sent-16, score-0.125]
12 At the same time, syntactic spaces are much more prone to sparsity problems, as their contexts are sparser. [sent-18, score-0.139]
13 In this paper, we propose a novel strategy for combating sparsity in syntactic vector spaces, derivational smoothing. [sent-20, score-0.674]
14 It follows the intuition that derivationally related words (feed – feeder, blocked – blockage) are, as a rule, semantically highly similar. [sent-21, score-0.201]
15 Consequently, knowledge about derivationally related words can be used as a “back off” for sparse vectors in syntactic spaces. [sent-22, score-0.291]
16 For example, the pair oldish – ancient should receive a high semantic similarity, but in practice, the vector for oldish will be very sparse, which makes this result uncertain. [sent-23, score-0.319]
17 Knowing that oldish is derivationally related to old allows us to use the much less sparse vector for old as a proxy for oldish. [sent-24, score-0.439]
18 We present a set of general methods for smoothing vector similarity computations given a resource that groups words into derivational families (equivalence classes) and evaluate these methods on German for two distributional tasks (similarity prediction and synonym choice). [sent-25, score-1.477]
19 We find that even for syntactic models built from very large corpora, a simple derivational resource that groups words on morphological grounds can improve the results. [sent-26, score-0.639]
20 Query expansion methods in Information Retrieval are also prominent cases of smoothing that addresses the lexical mismatch between query and document (Voorhees, 1994; Gonzalo et al. [sent-30, score-0.5]
21 In lexical semantics, smoothing is often achieved by backing off from words to semantic classes, either adopted from a resource such as WordNet (Resnik, 1996) or induced from data (Pantel and Lin, 2002; Wang et al. [sent-32, score-0.478] [sent-34, score-0.096]
23 Similarly, distributional features support generalization in Named Entity Recognition (Finkel et al. [sent-37, score-0.147]
24 Although distributional information is often used for smoothing, to our knowledge there is little work on smoothing distributional models themselves. [sent-39, score-0.729]
25 Bergsma et al. (2008) build models of selectional preferences that include morphological features such as capitalization and the presence of digits. [sent-42, score-0.086]
26 Allan and Kumaran (2003) make use of morphology by building language models for stemming-based equivalence classes. [sent-44, score-0.048]
27 Our approach also uses morphological processing, albeit more precise than stemming. [sent-45, score-0.069]
28 3 A Resource for German Derivation Derivational morphology describes the process of building new (derived) words from other (basis) words. [sent-46, score-0.048]
29 Derived words can, but do not have to, share the part-of-speech (POS) with their basis (oldA → oldishA vs. [sent-47, score-0.033]
30 Words can be grouped into derivational families by forming the transitive closure over individual derivation relations. [sent-49, score-0.701]
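As a rough illustrative sketch (not code from the paper or the resource), forming families over pairwise derivation relations amounts to computing connected components, e.g. with a union-find pass; the lemma pairs below are invented English stand-ins.

```python
# Illustrative sketch: derivational families as connected components
# (transitive closure) over pairwise derivation relations.

def build_families(derivation_pairs):
    """Union-find over (basis, derived) lemma pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for basis, derived in derivation_pairs:
        union(basis, derived)

    families = {}
    for lemma in parent:
        families.setdefault(find(lemma), set()).add(lemma)
    return list(families.values())

# Invented example pairs (English stand-ins for the German data):
pairs = [("old", "oldish"), ("feed", "feeder"),
         ("block", "blocked"), ("block", "blockage")]
print(build_families(pairs))
# e.g. [{'old', 'oldish'}, {'feed', 'feeder'}, {'block', 'blocked', 'blockage'}]
```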
31 The words in these families are typically semantically similar, although the exact degree depends on the type of relation and idiosyncratic factors (bookN → bookishA, Lieber (2009)). [sent-50, score-0.122]
32 For German, there are several resources with derivational information. [sent-51, score-0.517]
33 We use DERIVBASE (Zeller et al., 2013),1 a freely available resource that groups over 280,000 verbs, nouns, and adjectives into more than 17,000 non-singleton derivational families. [sent-54, score-0.571]
34 Its higher coverage compared to CELEX (Baayen et al. [sent-56, score-0.122]
35 , 1996) and IMSLEX (Fitschen, 2004) makes it particularly suitable for use in smoothing, where the resource should include low-frequency lemmas. [sent-57, score-0.054]
36 The following example illustrates a family that covers three POSes as well as a word with a predominant metaphorical reading (to kneel → to beg): knieenV (to kneelV), beknieenV (to begV), KniendeN (kneeling personN), kniendA (kneelingA), KnieN (kneeN). (1: downloadable from http://goo.gl/7KG2U) [sent-58, score-0.134]
37 Using derivational knowledge for smoothing raises the question of how semantically similar the lemmas within a family really are. [sent-59, score-1.165]
38 It was constructed with hand-written derivation rules, employing string transformation functions that map basis lemmas onto derived lemmas. [sent-61, score-0.166]
39 For example, a suffixation rule using the affix “heit” generates the derivation dunkel → Dunkelheit (darkA → darknessN). [sent-62, score-0.064]
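As an illustrative sketch (not the actual DERIVBASE rule formalism), such a suffixation rule can be viewed as a string transformation function mapping a basis lemma onto a derived lemma; real rules handle stem alternations that this toy version ignores.

```python
# Sketch: a suffixation rule as a string transformation function,
# e.g. the "heit" rule deriving a noun from an adjective (dunkel -> Dunkelheit).

def suffixation_rule(suffix):
    def derive(basis_lemma):
        return basis_lemma.capitalize() + suffix   # German nouns are capitalized
    return derive

heit_rule = suffixation_rule("heit")
print(heit_rule("dunkel"))  # Dunkelheit
```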
40 Since derivational families are defined as transitive closures, each pair of words in a family is connected by a derivation path. [sent-63, score-0.776]
41 Because the rules do not have perfect precision, our confidence in pairs of words decreases the longer the derivation path between them. [sent-64, score-0.064]
42 For example, bekleiden (enrobeV) is connected to Verkleidung (disguiseN) through three steps via the lemmas kleiden (dressV) and verkleiden (disguiseV) and is assigned the confidence 1/3. [sent-66, score-0.069]
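The path-length confidence can be read off the family's derivation graph with a breadth-first search; this is only a sketch, and the graph below is a hypothetical stand-in for the bekleiden example.

```python
from collections import deque

# Sketch: confidence between two lemmas as 1 / (length of the shortest
# derivation path connecting them), following the bekleiden example.

def path_confidence(graph, src, dst):
    """graph: dict mapping a lemma to the set of directly derivationally related lemmas."""
    if src == dst:
        return 1.0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == dst:
                return 1.0 / (dist + 1)
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return 0.0  # not connected (should not happen within one family)

# Hypothetical derivation graph for the example family:
graph = {
    "bekleiden": {"kleiden"},
    "kleiden": {"bekleiden", "verkleiden"},
    "verkleiden": {"kleiden", "Verkleidung"},
    "Verkleidung": {"verkleiden"},
}
print(path_confidence(graph, "bekleiden", "Verkleidung"))  # 1/3
```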
43 4 Models for Derivational Smoothing Derivational smoothing exploits the fact that derivationally related words are also semantically related, by combining and/or comparing distributional representations of derivationally related words. [sent-67, score-0.952]
44 The definition of a derivational smoothing algorithm consists of two parts: a trigger and a scheme. [sent-68, score-0.995]
45 Given a word w, we use w to denote its distributional vector and D(w) to denote the set of vectors for the derivational family of w. [sent-70, score-0.309]
46 As discussed above, there is no guarantee for high semantic similarity within a derivational family. [sent-75, score-0.642]
47 For this reason, smoothing may also drown out information. [sent-76, score-0.435]
48 In this paper, we report on two triggers: smooth always, which always performs smoothing, and smooth if sim=0, which smooths only when the unsmoothed similarity sim(w1, w2) is zero or unknown (due to w1 or w2 not being in the model). [sent-77, score-0.379]
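As a sketch, the two triggers can be written as a thin wrapper around any smoothing scheme; here unsmoothed_sim and scheme_sim are assumed to be supplied elsewhere (e.g. one of the schemes described below), and None marks out-of-vocabulary words.

```python
# Sketch of the two smoothing triggers: "smooth always" and "smooth if sim = 0".

def smoothed_similarity(w1, w2, unsmoothed_sim, scheme_sim, trigger="if_zero"):
    raw = unsmoothed_sim(w1, w2)          # None if w1 or w2 is unknown
    if trigger == "always":
        return scheme_sim(w1, w2)         # smooth always
    if trigger == "if_zero":              # smooth if sim = 0 (or unknown)
        if raw is None or raw == 0.0:
            return scheme_sim(w1, w2)
        return raw
    raise ValueError(f"unknown trigger: {trigger}")
```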
49 We present three smoothing schemes, all of which apply at the level of complete families. [sent-79, score-0.435]
50 The first two schemes are exemplar-based schemes, which define the smoothed similarity for a word pair as a function of the pairwise similarities between all words of the two derivational families. [sent-80, score-0.726]
51 It computes a centroid vector for each derivational family, which can be thought of as a representation for the concept(s) that it expresses: centSim(w1, w2) = sim(c(D(w1)), c(D(w2))), where c(D(w)) denotes the centroid of the vectors in D(w). [sent-82, score-0.674]
52 It is more efficient to calculate and effectively introduces a kind of regularization, where outliers in either family have less impact on the overall result. [sent-88, score-0.075]
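A minimal NumPy sketch of the three schemes under the standard reading of their names (avgSim averages all pairwise similarities, maxSim takes the best pair, centSim compares family centroids); the exact weighting used in the paper, e.g. by derivation-path confidence, is not reproduced here.

```python
import numpy as np
from itertools import product

def cos(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v) / (nu * nv)

# D1, D2: lists of vectors for the derivational families of w1 and w2.

def avg_sim(D1, D2):
    """Exemplar-based: mean of all pairwise similarities."""
    return float(np.mean([cos(u, v) for u, v in product(D1, D2)]))

def max_sim(D1, D2):
    """Exemplar-based: best pairwise similarity."""
    return max(cos(u, v) for u, v in product(D1, D2))

def cent_sim(D1, D2):
    """Prototype-based: similarity of the two family centroids."""
    return cos(np.mean(D1, axis=0), np.mean(D2, axis=0))
```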
53 These models represent only a sample of possible derivational smoothing methods. [sent-89, score-0.952]
54 We performed a number of additional experiments (POS-restricted smoothing, word-based, and pair-based smoothing triggers), but they did not yield any improvements over the simpler models we present here. [sent-90, score-0.435]
55 The syntactic distributional model that we use represents target words by pairs of dependency relations and context words. [sent-92, score-0.18]
56 DM.DE was created on the basis of the 884M-token SDEWAC web corpus (Faaß et al. [sent-96, score-0.033]
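A toy sketch of how such a syntactic space can be assembled from dependency triples, with each context feature being a (relation, context lemma) pair; the triples, the inverse-relation convention, and the function names are illustrative assumptions, not the paper's pipeline.

```python
from collections import Counter, defaultdict

# Sketch: syntactic vector space where a target lemma is represented by
# counts over (dependency relation, context lemma) features.

def build_syntactic_space(dependency_triples):
    """dependency_triples: iterable of (head_lemma, relation, dependent_lemma)."""
    space = defaultdict(Counter)
    for head, rel, dep in dependency_triples:
        space[head][(rel, dep)] += 1          # head sees the dependent
        space[dep][(rel + "^-1", head)] += 1  # inverse link for the dependent
    return space

triples = [("eat", "obj", "apple"), ("eat", "subj", "child"),
           ("peel", "obj", "apple")]
space = build_syntactic_space(triples)
print(space["apple"])
# Counter({('obj^-1', 'eat'): 1, ('obj^-1', 'peel'): 1})
```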
57 We evaluate the impact of smoothing on two standard tasks from lexical semantics. [sent-99, score-0.435]
58 We lemmatized and POS-tagged the German GUR350 dataset (Zesch et al. [sent-101, score-0.053]
59 , 2007), a set of 350 word pairs with human similarity judgments, created analogously to the well-known Rubenstein and Goodenough (1965) dataset for English. [sent-102, score-0.083]
60 We predict semantic similarity as the cosine between the two lemma vectors. [sent-103, score-0.059]
61 We make a prediction for a word pair if both words are represented in the semantic space and their vectors have a non-zero similarity. [sent-105, score-0.088]
62 The second task is synonym choice on the German version of the Reader’s Digest WordPower dataset (Wallace and Wallace, 2005). [sent-106, score-0.146]
63 This dataset, which we also lemmatized and POS-tagged, consists of 984 target words with four synonym candidates each (including phrases), one of which is correct. [sent-107, score-0.163]
64 Again, we compute semantic similarity as the cosine between target and a candidate vector and pick the highest-similarity candidate as synonym. [sent-108, score-0.166]
65 For phrases, we compute the maximum pairwise word similarity. [sent-109, score-0.037]
66 We make a prediction for an item if the target as well as at least one candidate are represented in the semantic space and their vectors have a non-zero similarity. [sent-110, score-0.088]
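Under the prediction rules just described, the synonym-choice step could look roughly as follows; this is a sketch, and sim is assumed to return None when a word is missing from the space.

```python
# Sketch: synonym choice — pick the candidate with the highest similarity
# to the target; phrase candidates score as the maximum pairwise word
# similarity. An item is only covered if some candidate has non-zero similarity.

def score_candidate(target, candidate_words, sim):
    scores = [s for w in candidate_words if (s := sim(target, w)) is not None]
    return max(scores) if scores else None

def choose_synonym(target, candidates, sim):
    """candidates: list of lists of words (a phrase has more than one word)."""
    scored = [(score_candidate(target, c, sim), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None and s > 0.0]
    if not scored:
        return None   # item not covered: no prediction made
    return max(scored, key=lambda x: x[0])[1]
```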
67 We expect differences between the two tasks with regard to derivational smoothing, since the words within derivational families are generally related but often not synonymous (cf. [sent-111, score-1.124]
68 Thus, semantic similarity judgments should profit more easily from derivational smoothing than synonym choice. [sent-113, score-1.266]
69 Our baseline is a standard bag-of-words vector space (BOW), which represents target words by the words occurring in their context. [sent-115, score-0.041]
70 We also applied derivational smoothing to this model, but did not obtain improvements. [sent-120, score-0.952]
71 To analyze the impact of smoothing, we evaluate the coverage of models and the quality of their predictions separately. [sent-122, score-0.122]
72 In both tasks, coverage is the percentage of items for which we make a prediction. [sent-123, score-0.122]
73 We measure quality of the semantic similarity task as the Pearson correlation between the model predictions and the human judgments for covered items (Zesch et al. [sent-124, score-0.256]
74 For synonym choice, we follow the method established by Mohammad et al. [sent-126, score-0.11]
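For the similarity task, coverage and quality could then be computed along these lines; a sketch only, assuming a predict function that returns None for uncovered pairs.

```python
import numpy as np

# Sketch: coverage = share of items with a prediction; quality = Pearson r
# between predictions and human judgments, computed on covered items only.

def evaluate_similarity(pairs, gold, predict):
    preds = [predict(w1, w2) for w1, w2 in pairs]
    covered = [(p, g) for p, g in zip(preds, gold) if p is not None]
    coverage = len(covered) / len(pairs)
    if len(covered) < 2:
        return coverage, float("nan")
    p = np.array([x for x, _ in covered])
    g = np.array([y for _, y in covered])
    r = float(np.corrcoef(p, g)[0, 1])   # Pearson correlation
    return coverage, r
```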
75 Additionally, conservative, prototype-based smoothing (if sim = 0) increases correlation somewhat over the unsmoothed model.
76 Table 1: Results on the semantic similarity task (r: Pearson correlation, Cov: Coverage), broken down by smoothing trigger and smoothing scheme.
77 The difference to the unsmoothed model is not significant at p = 0.05 according to Fisher's (1925) method of comparing correlation coefficients.
79 The bag-of-words baseline (BOW) has a greater coverage than DM.DE models, but at the cost of lower correlation across the board. [sent-150, score-0.122] [sent-151, score-0.062]
81 We attribute this weak performance to the presence of many pairwise zero similarities in the data, which makes the avgSim predictions unreliable. [sent-154, score-0.066]
82 To our knowledge, there are no previous published papers on distributional approaches to modeling this dataset. [sent-155, score-0.147]
83 Smoothing increases the coverage by almost 6% to 86.6% (for example, a question item for inferior becomes covered after backing off from the target to Inferiorität (inferiority)). [sent-166, score-0.122] [sent-167, score-0.073]
85 All smoothed models show a loss in accuracy, albeit small. [sent-168, score-0.074]
86 The best model is again a conservative smoothing model (sim = 0) with a loss of 1. [sent-169, score-0.479]
87 Using bootstrap resampling (Efron and Tibshirani, 1993), we established that the difference to the unsmoothed DM. [sent-171, score-0.209]
88 This time, the avgSim (average similarity) smoothing scheme performs best, with the prototype-based scheme in second place. [sent-174, score-0.521]
89 Thus, the results for synonym choice are less clear-cut: derivational smoothing can trade accuracy against coverage but does not lead to a clear improvement.
90 Table 2: Results on the synonym choice task (Acc: Accuracy, Cov: Coverage), broken down by smoothing trigger and smoothing scheme, including the BOW baseline.
92 What is more, the BOW “baseline” significantly outperforms all syntactic models, smoothed and unsmoothed, with an almost perfect coverage combined with a higher accuracy. [sent-187, score-0.195]
93 6 Conclusions and Outlook In this paper, we have introduced derivational smoothing, a novel strategy for combating sparsity in syntactic vector spaces by comparing and combining the vectors of morphologically related lemmas. [sent-188, score-0.795]
94 The only information strictly necessary for the methods we propose is a grouping of lemmas into derivationally related classes. [sent-189, score-0.238]
95 We have demonstrated that derivational smoothing improves two tasks, increasing coverage substantially and also leading to a numerically higher correlation in the semantic similarity task, even for vectors created from a very large corpus. [sent-190, score-1.307]
96 We obtained the best results for a conservative approach, smoothing only zero similarities. [sent-191, score-0.508]
97 This also explains our failure to improve less sparse word-based models, where very few pairs are assigned a similarity of zero. [sent-192, score-0.126]
98 A comparison of prototype- and exemplar-based schemes did not yield a clear winner. [sent-193, score-0.049]
99 The estimation of generic semantic similarity can profit more from derivational smoothing than the induction of specific lexical relations. [sent-194, score-1.117]
100 In future work, we plan to work on other evaluation tasks, application to other languages, and more sophisticated smoothing schemes. [sent-195, score-0.435]
wordName wordTfidf (topN-words)
[('derivational', 0.517), ('smoothing', 0.435), ('unsmoothed', 0.209), ('derivationally', 0.169), ('avgsim', 0.148), ('distributional', 0.147), ('coverage', 0.122), ('cmaevnagxst', 0.118), ('oldish', 0.118), ('simim', 0.118), ('sim', 0.116), ('bow', 0.112), ('synonym', 0.11), ('utt', 0.104), ('derivbase', 0.104), ('cov', 0.096), ('families', 0.09), ('pad', 0.087), ('similarity', 0.083), ('zeller', 0.078), ('zesch', 0.077), ('family', 0.075), ('spaces', 0.075), ('wallace', 0.072), ('lemmas', 0.069), ('derivation', 0.064), ('correlation', 0.062), ('baroni', 0.06), ('german', 0.06), ('beste', 0.059), ('centsim', 0.059), ('goo', 0.059), ('maxsim', 0.059), ('pado', 0.058), ('erk', 0.057), ('lenci', 0.056), ('resource', 0.054), ('lemmatized', 0.053), ('combating', 0.052), ('britta', 0.052), ('faa', 0.052), ('selectional', 0.051), ('pantel', 0.049), ('schemes', 0.049), ('celex', 0.048), ('morphology', 0.048), ('vectors', 0.046), ('gonzalo', 0.045), ('rubenstein', 0.045), ('digest', 0.045), ('conservative', 0.044), ('trigger', 0.043), ('efron', 0.043), ('backing', 0.043), ('zagreb', 0.043), ('baayen', 0.043), ('scheme', 0.043), ('sparse', 0.043), ('semantic', 0.042), ('vector', 0.041), ('najder', 0.041), ('smoothed', 0.04), ('sebastian', 0.04), ('xd', 0.04), ('profit', 0.04), ('lw', 0.04), ('versatile', 0.04), ('memory', 0.04), ('judgments', 0.039), ('pairwise', 0.037), ('gl', 0.037), ('choice', 0.036), ('semantics', 0.036), ('turney', 0.035), ('dekang', 0.035), ('torsten', 0.035), ('morphological', 0.035), ('heidelberg', 0.034), ('acc', 0.034), ('triggers', 0.034), ('old', 0.034), ('query', 0.034), ('albeit', 0.034), ('bergsma', 0.034), ('basis', 0.033), ('syntactic', 0.033), ('allan', 0.033), ('finkel', 0.032), ('semantically', 0.032), ('navigli', 0.032), ('pearson', 0.032), ('gurevych', 0.032), ('sparsity', 0.031), ('expansion', 0.031), ('mohammad', 0.03), ('transitive', 0.03), ('covered', 0.03), ('iryna', 0.029), ('smooth', 0.029), ('zero', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
Author: Sebastian Pado ; Jan Snajder ; Britta Zeller
Abstract: Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). However, they are also sparse, with resulting reliability and coverage problems. We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity estimates. We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably.
2 0.50584602 102 acl-2013-DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German
Author: Britta Zeller ; Jan Snajder ; Sebastian Pado
Abstract: Derivational models are still an underresearched area in computational morphology. Even for German, a rather resourcerich language, there is a lack of largecoverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a highcoverage German resource, DERIVBASE, mapping over 280k lemmas into more than 17k non-singleton clusters. We focus on the rule component and a qualitative and quantitative evaluation. Our approach achieves up to 93% precision and 71% recall. We attribute the high precision to the fact that our rules are based on information from grammar books.
3 0.1562366 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
Author: Jan Snajder ; Sebastian Pado ; Zeljko Agic
Abstract: We report on the first structured distributional semantic model for Croatian, DM.HR. It is constructed after the model of the English Distributional Memory (Baroni and Lenci, 2010), from a dependencyparsed Croatian web corpus, and covers about 2M lemmas. We give details on the linguistic processing and the design principles. An evaluation shows state-of-the-art performance on a semantic similarity task with particularly good performance on nouns. The resource is freely available.
4 0.13704562 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
Author: Angeliki Lazaridou ; Marco Marelli ; Roberto Zamparelli ; Marco Baroni
Abstract: Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex words from their parts. Semantic representations constructed in this way beat a strong baseline and can be of higher quality than representations directly constructed from corpus data. Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.
5 0.11508482 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
Author: Koichi Tanigaki ; Mitsuteru Shiba ; Tatsuji Munaka ; Yoshinori Sagisaka
Abstract: This paper proposes a novel smoothing model with a combinatorial optimization scheme for all-words word sense disambiguation from untagged corpora. By generalizing discrete senses to a continuum, we introduce a smoothing in context-sense space to cope with data-sparsity resulting from a large variety of linguistic context and sense, as well as to exploit senseinterdependency among the words in the same text string. Through the smoothing, all the optimal senses are obtained at one time under maximum marginal likelihood criterion, by competitive probabilistic kernels made to reinforce one another among nearby words, and to suppress conflicting sense hypotheses within the same word. Experimental results confirmed the superiority of the proposed method over conventional ones by showing the better performances beyond most-frequent-sense baseline performance where none of SemEval2 unsupervised systems reached.
6 0.11213962 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
7 0.10620268 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
8 0.10601839 325 acl-2013-Smoothed marginal distribution constraints for language modeling
9 0.095681906 238 acl-2013-Measuring semantic content in distributional vectors
10 0.090116821 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
11 0.088471219 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
12 0.08294297 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
13 0.080675855 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
14 0.078080378 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
15 0.069629207 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
16 0.068516344 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
17 0.068465821 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
18 0.067733034 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
19 0.066838086 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
20 0.064040996 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
topicId topicWeight
[(0, 0.168), (1, 0.036), (2, 0.024), (3, -0.151), (4, -0.1), (5, -0.086), (6, -0.059), (7, 0.063), (8, 0.006), (9, 0.015), (10, -0.039), (11, 0.061), (12, 0.154), (13, -0.095), (14, -0.039), (15, 0.04), (16, -0.052), (17, -0.141), (18, 0.075), (19, 0.02), (20, -0.11), (21, 0.141), (22, 0.083), (23, 0.03), (24, -0.093), (25, -0.013), (26, 0.024), (27, -0.005), (28, -0.066), (29, -0.02), (30, 0.094), (31, 0.075), (32, -0.069), (33, 0.102), (34, -0.101), (35, -0.162), (36, 0.086), (37, -0.044), (38, -0.033), (39, -0.169), (40, -0.066), (41, 0.107), (42, -0.199), (43, -0.078), (44, 0.019), (45, -0.157), (46, -0.15), (47, -0.02), (48, 0.118), (49, -0.268)]
simIndex simValue paperId paperTitle
same-paper 1 0.93075854 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
Author: Sebastian Pado ; Jan Snajder ; Britta Zeller
Abstract: Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). However, they are also sparse, with resulting reliability and coverage problems. We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity estimates. We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably.
2 0.8725968 102 acl-2013-DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German
Author: Britta Zeller ; Jan Snajder ; Sebastian Pado
Abstract: Derivational models are still an underresearched area in computational morphology. Even for German, a rather resourcerich language, there is a lack of largecoverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a highcoverage German resource, DERIVBASE, mapping over 280k lemmas into more than 17k non-singleton clusters. We focus on the rule component and a qualitative and quantitative evaluation. Our approach achieves up to 93% precision and 71% recall. We attribute the high precision to the fact that our rules are based on information from grammar books.
3 0.59529024 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
Author: Jan Snajder ; Sebastian Pado ; Zeljko Agic
Abstract: We report on the first structured distributional semantic model for Croatian, DM.HR. It is constructed after the model of the English Distributional Memory (Baroni and Lenci, 2010), from a dependencyparsed Croatian web corpus, and covers about 2M lemmas. We give details on the linguistic processing and the design principles. An evaluation shows state-of-the-art performance on a semantic similarity task with particularly good performance on nouns. The resource is freely available.
4 0.47521523 325 acl-2013-Smoothed marginal distribution constraints for language modeling
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of wellknown Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff ngram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.1
5 0.46945384 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
Author: Angeliki Lazaridou ; Marco Marelli ; Roberto Zamparelli ; Marco Baroni
Abstract: Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex words from their parts. Semantic representations constructed in this way beat a strong baseline and can be of higher quality than representations directly constructed from corpus data. Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.
7 0.42373854 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
8 0.41396534 238 acl-2013-Measuring semantic content in distributional vectors
9 0.40990767 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
10 0.36474243 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
11 0.33767191 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
12 0.33112058 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
13 0.32608876 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance
14 0.32599628 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
15 0.31628311 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
16 0.31006214 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts
17 0.3094725 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
18 0.30859324 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
19 0.30710605 242 acl-2013-Mining Equivalent Relations from Linked Data
20 0.30535468 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
topicId topicWeight
[(0, 0.078), (6, 0.019), (11, 0.068), (15, 0.012), (24, 0.023), (26, 0.034), (28, 0.016), (35, 0.183), (42, 0.032), (48, 0.072), (52, 0.222), (64, 0.011), (70, 0.047), (88, 0.021), (90, 0.015), (95, 0.072)]
simIndex simValue paperId paperTitle
1 0.92131335 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
Author: Pieter Wellens ; Remi van Trijp ; Katrien Beuls ; Luc Steels
Abstract: Fluid Construction Grammar (FCG) is an open-source computational grammar formalism that is becoming increasingly popular for studying the history and evolution of language. This demonstration shows how FCG can be used to operationalise the cultural processes and cognitive mechanisms that underly language evolution and change.
same-paper 2 0.86585486 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
Author: Sebastian Pado ; Jan Snajder ; Britta Zeller
Abstract: Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). However, they are also sparse, with resulting reliability and coverage problems. We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity est→imates. We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably.
3 0.82879817 339 acl-2013-Temporal Signals Help Label Temporal Relations
Author: Leon Derczynski ; Robert Gaizauskas
Abstract: Automatically determining the temporal order of events and times in a text is difficult, though humans can readily perform this task. Sometimes events and times are related through use of an explicit co-ordination which gives information about the temporal relation: expressions like “before ” and “as soon as”. We investigate the r oˆle that these co-ordinating temporal signals have in determining the type of temporal relations in discourse. Using machine learning, we improve upon prior approaches to the problem, achieving over 80% accuracy at labelling the types of temporal relation between events and times that are related by temporal signals.
4 0.75584745 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
Author: Burak Kerim Akku� ; Ruket Cakici
Abstract: Morphologically rich languages such as Turkish may benefit from morphological analysis in natural language tasks. In this study, we examine the effects of morphological analysis on text categorization task in Turkish. We use stems and word categories that are extracted with morphological analysis as main features and compare them with fixed length stemmers in a bag of words approach with several learning algorithms. We aim to show the effects of using varying degrees of morphological information.
5 0.70814121 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
6 0.70755702 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
7 0.70550919 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
8 0.70437163 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
9 0.70027816 122 acl-2013-Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners
10 0.69993758 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
11 0.69939506 102 acl-2013-DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German
12 0.6980179 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
13 0.6971783 238 acl-2013-Measuring semantic content in distributional vectors
14 0.69695771 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
15 0.69494033 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
16 0.69408274 311 acl-2013-Semantic Neighborhoods as Hypergraphs
17 0.69217789 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
18 0.69143879 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
19 0.6891858 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
20 0.6890592 121 acl-2013-Discovering User Interactions in Ideological Discussions