acl acl2013 acl2013-376 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oren Melamud ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. [sent-7, score-0.972]
2 In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. [sent-8, score-0.715]
3 To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. [sent-9, score-0.776]
4 Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines. [sent-10, score-0.419]
5 1 Introduction The benefit of utilizing template-based inference rules between predicates was demonstrated in NLP tasks such as Question Answering (QA) (Ravichandran and Hovy, 2002) and Information Extraction (IE) (Shinyama and Sekine, 2006). [sent-11, score-0.23]
6 For example, the inference rule ‘X treat Y → X relieve Y’, between the templates ‘X treat Y’ and ‘X relieve Y’, may be useful to identify the answer to “Which drugs relieve stomach ache? [sent-12, score-0.671]
7 The predominant unsupervised approach for learning inference rules between templates is via distributional similarity (Lin and Pantel, 2001 ; Ravichandran and Hovy, 2002; Szpektor and Dagan, 2008). [sent-14, score-0.642]
8 Specifically, each argument slot in a template is represented by an argument vector, containing the words (or terms) that instantiate this slot in all of the occurrences of the template in a learning corpus. [sent-15, score-1.45]
9 Two templates are then deemed semantically similar if the argument vectors of their corresponding slots are similar. [sent-16, score-0.987]
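The construction just described can be sketched in a few lines. The tuples and templates below are invented for illustration, and cosine similarity is used as a simple stand-in for the Lin measure discussed later:

```python
from collections import Counter
from math import sqrt

# Invented (template, slot, argument) occurrences standing in for a parsed corpus.
occurrences = [
    ("X treat Y", "X", "drug"), ("X treat Y", "X", "aspirin"),
    ("X treat Y", "Y", "ache"), ("X treat Y", "Y", "pain"),
    ("X relieve Y", "X", "drug"), ("X relieve Y", "X", "aspirin"),
    ("X relieve Y", "Y", "pain"), ("X relieve Y", "Y", "stress"),
]

def argument_vector(template, slot):
    """Counts of the words that instantiate one slot of a template."""
    return Counter(w for t, s, w in occurrences if t == template and s == slot)

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)  # Counter returns 0 for missing keys
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Two templates are deemed similar if their corresponding slots have similar vectors.
sim_x = cosine(argument_vector("X treat Y", "X"), argument_vector("X relieve Y", "X"))
sim_y = cosine(argument_vector("X treat Y", "Y"), argument_vector("X relieve Y", "Y"))
```

A rule score would then combine the X-slot and Y-slot similarities, e.g. by their geometric mean as in DIRT.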
10 Ideally, inference rules should be learned for all templates that occur in the learning corpus. [sent-17, score-0.408]
11 However, many templates are rare and occur only a few times in the corpus. [sent-18, score-0.342]
12 Due to these few occurrences, the slots of rare templates are represented by very sparse argument vectors, which in turn leads to low reliability in distributional similarity scores. [sent-20, score-1.154]
13 A common practice in prior work for learning predicate inference rules is to simply disregard templates below a minimal frequency threshold (Lin and Pantel, 2001 ; Kotlerman et al. [sent-21, score-0.503]
14 Yet, acquiring rules for rare templates may be beneficial both in terms of coverage and in terms of more accurate rule application, since rare templates are less ambiguous than frequent ones. [sent-24, score-0.918]
15 We propose to improve the learning of rules between infrequent templates by expanding their argument vectors. [sent-25, score-0.406]
16 This is done via a “dual” distributional similarity approach, in which we consider two words to be similar if they instantiate similar sets of templates. [sent-26, score-0.287]
17 We then use these similarities to expand the argument vector of each slot with words that were identified as similar to the original arguments in the vector. [sent-27, score-0.7]
18 Finally, similarities between templates are computed using the expanded vectors, resulting in a ‘smoothed’ version of the original similarity measure. [sent-28, score-0.542]
19 Evaluations on a rule application task show that our lexical expansion approach significantly improves the performance of the state-of-the-art DIRT algorithm (Lin and Pantel, 2001). [sent-29, score-0.604]
20 In addition, our approach outperforms a similarity measure based on vectors of latent topics instead of word vectors, a common way to avoid sparseness issues by means of dimensionality reduction. [sent-30, score-0.497]
21 ©2013 Association for Computational Linguistics, pages 283–28 , 2 Technical Background The distributional similarity score for an inference rule between two predicate templates, e. [sent-33, score-0.554]
22 ‘X resign Y → X quit Y’, is typically computed by measuring the similarity between the argument vectors of the corresponding X slots and Y slots of the two templates. [sent-35, score-1.017]
23 To this end, first the argument vectors should be constructed and then a similarity measure between two vectors should be provided. [sent-36, score-0.864]
24 We note that we focus here on binary templates with two slots each, but this approach can be applied to any template. [sent-37, score-0.438]
25 M’s rows correspond to the template slots and the columns correspond to the various terms that instantiate the slots. [sent-39, score-0.425]
26 An entry Mi,j, for example MX:quit,John, contains a count of the number of times the term j instantiated the template slot i in the corpus. [sent-42, score-0.443]
27 Thus, each row Mi,∗ corresponds to an argument vector for slot i. [sent-43, score-0.53]
28 Finally, rules are assessed using some similarity measure between corresponding argument vectors. [sent-46, score-0.611]
29 While the original DIRT algorithm utilizes the Lin measure, one can replace it with any other vector similarity measure. [sent-48, score-0.209]
30 A separate line of research for word similarity introduced directional similarity measures that have a bias for identifying generalization/specification relations, i. [sent-49, score-0.367]
31 relations between predicates with narrow (or specific) semantic meanings to predicates with broader meanings inferred by them (unlike the symmetric Lin). [sent-51, score-0.146]
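The Cover measure used later in the paper is not defined in this excerpt; as a hedged illustration of a directional measure of this general kind, the following Weeds-precision-style inclusion score (all feature weights invented) scores a specific word against a more general one higher than the reverse:

```python
def inclusion(narrow, broad):
    """Directional score: the fraction of the narrow word's feature weight
    that is covered by features the broad word also has. Asymmetric by design."""
    shared = sum(weight for feat, weight in narrow.items() if feat in broad)
    total = sum(narrow.values())
    return shared / total if total else 0.0

# Invented PMI-like feature weights for a specific and a general word.
car = {"drive": 3.0, "park": 2.0, "repair": 1.0}
vehicle = {"drive": 2.0, "park": 1.0, "transport": 3.0}

# car -> vehicle (generalization) scores higher than vehicle -> car.
```

This asymmetry is what lets such measures prefer inferences from narrow predicates to broader ones, unlike the symmetric Lin measure.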
32 Therefore, when the argument vectors are sparse, containing very few non-zero features, these scores become unreliable and volatile, changing greatly with every inclusion or exclusion of a single shared argument. [sent-54, score-0.508]
33 3 Lexical Expansion Scheme We wish to overcome the sparseness issues in rare feature vectors, especially in cases where argument vectors of semantically similar predicates comprise similar but not exactly identical arguments. [sent-55, score-0.805]
34 First, we learn lexical expansion sets for argument words, such as the set {euros, money} for the word dollars. [sent-57, score-0.759]
35 Then we use these sets to expand the argument word vectors of predicate templates. [sent-58, score-0.659]
36 For example, given the template ‘X can be exchanged for Y’, with the argument words {dollars, gold} instantiating slot X, and the expansion set above, we would expand the argument word vector to include all the following words: {dollars, euros, money, gold}. [sent-59, score-1.609]
37 Finally, we use the expanded argument word vectors [sent-60, score-0.432]
38 to compute the scores for predicate inference rules with a given similarity measure. [sent-61, score-0.387]
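The dollars/gold example above amounts to the following small sketch (the expansion sets are hypothetical here; Section 3.1 describes how they would be learned):

```python
# Hypothetical expansion sets of degree N = 2, learned separately.
expansion_sets = {"dollars": ["euros", "money"]}

def expand(arg_words, expansion_sets):
    """Union of a slot's argument words with each observed word's expansion set."""
    expanded = set(arg_words)
    for w in arg_words:
        expanded.update(expansion_sets.get(w, []))
    return expanded

# Arguments observed in slot X of 'X can be exchanged for Y'.
slot_x = {"dollars", "gold"}
expanded_x = expand(slot_x, expansion_sets)  # {dollars, euros, money, gold}
```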
39 When a template is instantiated with an observed word, we expect it to also be instantiated with semantically similar words such as the ones in the expansion set of the observed word. [sent-62, score-0.794]
40 We “blame” the lack of such template occurrences only on the size of the corpus and the sparseness phenomenon in natural languages. [sent-63, score-0.322]
41 Thus, we utilize our lexical expansion scheme to synthetically add these expected but missing occurrences, effectively smoothing or generalizing over the explicitly observed argument occurrences. [sent-64, score-0.914]
42 Our approach is inspired by query expansion (Voorhees, 1994) in Information Retrieval (IR), as well as by the recent lexical expansion framework proposed in (Biemann and Riedl, 2013), and the work by Miller et al. [sent-65, score-0.789]
43 Yet, to the best of our knowledge, this is the first work that applies lexical expansion to distributional similarity feature vectors. [sent-67, score-0.653]
44 1 Learning Lexical Expansions We start by constructing the co-occurrence matrix M (Section 2), where each entry Mt:s,w indicates the number of times that word w instantiates slot s of template t in the learning corpus, denoted by ’t:s ’, where s can be either X or Y. [sent-70, score-0.412]
45 In traditional distributional similarity, the rows Mt:s,∗ serve as argument vectors of template slots. [sent-71, score-0.792]
46 However, to learn expansion sets we take a “dual” view and consider each matrix column M∗:∗,w (denoted vw) as a feature vector for the argument word w. [sent-72, score-0.749]
47 Under this view, templates (or more specifically, template slots) are the features. [sent-73, score-0.436]
48 For instance, for the word dollars the respective feature vector may include entries such as ‘X can be exchanged for’ , ‘can be exchanged for Y’ , ‘purchase Y’ and ‘sell Y’. [sent-74, score-0.228]
49 We next learn an expansion set per each word w by computing the distributional similarity between the vectors of w and any other argument word w0, sim(vw, vw0). [sent-75, score-1.112]
50 Then we take the N most similar words as the expansion set of degree N, denoted by LwN = {w′1, . . . , w′N}. [sent-76, score-0.485]
51 Any similarity measure could be used, but as our experiments show, different measures generate sets with different properties, and some may be better suited for argument vector expansion than others. [sent-80, score-0.974]
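A minimal sketch of this dual view, with an invented count matrix and Dice similarity standing in for whichever measure is chosen:

```python
# M[template_slot][word]: invented co-occurrence counts.
M = {
    "X can-be-exchanged-for": {"dollars": 3, "euros": 2, "gold": 1},
    "purchase Y":             {"dollars": 2, "euros": 2},
    "sell Y":                 {"dollars": 1, "euros": 1, "gold": 2},
}

def word_vector(word):
    """Dual view: the column of M for a word, with template slots as features."""
    return {slot: counts[word] for slot, counts in M.items() if word in counts}

def dice(u, v):
    """Any vector similarity could be plugged in; Dice over shared features here."""
    return 2 * len(set(u) & set(v)) / (len(u) + len(v)) if u and v else 0.0

def expansion_set(word, vocab, n):
    """L_w^N: the N words most similar to `word` under the dual representation."""
    scored = sorted(((dice(word_vector(word), word_vector(w)), w)
                     for w in vocab if w != word), reverse=True)
    return [w for _, w in scored[:n]]
```

For example, `expansion_set("dollars", vocab, 1)` returns the single word whose template-slot features overlap most with those of dollars.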
52 2 Expanding Argument Vectors Given a row count vector Mt:s,∗ for slot s of template t, we enrich it with expansion sets as follows. [sent-82, score-0.782]
53 For each w in Mt:s,∗, the original count in vt:s (w) is redistributed equally between itself and all words in w’s expansion set, i. [sent-83, score-0.495]
54 Specifically, the new count that is assigned to each word w is its remaining original count after it has been redistributed (or zero if no original count), plus all the counts that were distributed to it from other words. [sent-86, score-0.225]
55 Next, PMI weights are recomputed according to the new counts, yielding the expanded vector. Similarity between template slots is now computed over the expanded vectors instead of the original ones, e. [sent-87, score-0.874]
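The count-redistribution step can be sketched as follows (counts and expansion sets invented; the PMI reweighting, omitted here, would then be applied to the smoothed counts):

```python
def redistribute(counts, expansion_sets):
    """Split each word's count equally between itself and the words in its
    expansion set; expansion words accumulate shares from other words."""
    new = {}
    for w, c in counts.items():
        share = c / (1 + len(expansion_sets.get(w, [])))
        for target in [w] + expansion_sets.get(w, []):
            new[target] = new.get(target, 0.0) + share
    return new

counts = {"dollars": 2.0, "gold": 1.0}  # slot X of some template
smoothed = redistribute(counts, {"dollars": ["euros", "money"]})
# dollars keeps 2/3; euros and money each receive 2/3; gold is untouched.
```

Note that the total mass is preserved: the redistribution smooths the vector without inflating it.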
56 , 2011), comprising tuple extractions of predicate templates with their argument instantiations. [sent-93, score-0.671]
57 We applied some clean-up preprocessing to these extractions, discarding stop words, rare words and non-alphabetical words that instantiated either the X or the Y argument slots. [sent-94, score-0.501]
58 In addition, we discarded templates that co-occur with fewer than 5 unique argument words in either of their slots, assuming that so few arguments cannot convey reliable semantic information, even with expansion. [sent-95, score-0.67]
59 In this corpus around one third of the extractions refer to templates that co-occur with at most 35 unique arguments in both their slots. [sent-97, score-0.41]
60 We evaluated the quality of inference rules using the dataset constructed by Zeichner et al. [sent-98, score-0.157]
61 (2012)2, which contains about 6,500 manually annotated template rule applications, each labeled as correct or not. [sent-99, score-0.336]
62 We derived two subsets of Zeichner et al.’s dataset, denoted DS-5-35 and DS-5-50, which consist of all rule applications whose templates are present in our learning corpus and co-occurred with at least 5 and at most 35 and 50 unique argument words in both their slots, respectively. [sent-102, score-0.847]
63 DS-5-35 includes 311 rule applications (104 correct and 207 incorrect) and DS-5-50 includes 502 rule applications (190 correct and 312 incorrect). [sent-103, score-0.302]
64 Our evaluation task is to rank all rule applications in each test set based on the similarity scores of the applied rules. [sent-104, score-0.286]
65 Optimal performance would rank all correct rule applications above the incorrect ones. [sent-105, score-0.186]
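The metric reported later (MAP) reduces, for a single ranking, to average precision over the correct/incorrect labels; a small sketch with an invented ranking:

```python
def average_precision(ranked_labels):
    """AP of a ranking in which True marks a correct rule application,
    ordered by descending similarity score."""
    hits, precisions = 0, []
    for rank, correct in enumerate(ranked_labels, start=1):
        if correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Invented labels, ordered by an invented similarity score.
ap = average_precision([True, True, False, True, False])
```

A ranking that places every correct application above every incorrect one attains the optimal score of 1.0.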
66 As a baseline for rule scoring we used DIRT. 1http://reverb [sent-106, score-0.151]
67 We then compared the performance of this baseline and its expanded versions, testing two similarity measures for generating the expansion sets of arguments: Lin and Cover. [sent-115, score-0.634]
68 We denote these expanded methods DIRT-LE-SIM-N, where SIM is the similarity measure used to generate the expansion sets and N is the lexical expansion degree, e. [sent-116, score-1.069]
69 We remind the reader that our scheme utilizes two similarity measures. [sent-119, score-0.221]
70 The first measure assesses the similarity between the argument vectors of the two templates in the rule. [sent-120, score-1.0]
71 This measure is kept constant in our experiments and is identical to DIRT’s similarity measure (Lin). [sent-121, score-0.241]
72 3 The second measure assesses the similarity between words and is used for the lexical expansion of argument vectors. [sent-122, score-1.0]
73 Since this is the research goal of this paper, we experimented with two different measures for lexical expansion: a symmetric measure (Lin) and an asymmetric measure (Cover). [sent-123, score-0.192]
74 To this end we evaluated their effect on DIRT’s rule ranking performance and compared them to a vanilla version of DIRT without lexical expansion. [sent-124, score-0.2]
75 As another baseline, we follow Dinu and Lapata (2010), inducing LDA topic vectors for template slots and computing predicate template inference rule scores based on the similarity between these vectors. [sent-125, score-1.18]
76 This method is denoted LDA-K, where K is the number of topics in the model. [sent-127, score-0.125]
77 , 2008) of the rule application ranking computed by this method. [sent-129, score-0.185]
78 We varied the degree of the lexical expansion in our model and the number of topics in the topic model baseline to analyze their effect on the performance of these methods on our datasets. [sent-132, score-0.549]
79 We note that in our model a greater degree of lexical expansion cor… (3: Experiments with Cosine as the template similarity measure instead of Lin for both DIRT and its expanded versions yielded similar results.) [sent-133, score-0.923]
80 This is indeed the dataset where we expected expansion to have the greatest effect, due to the extreme sparseness of argument vectors. [sent-140, score-0.802]
81 The above shows that expansion is effective for improving rule learning between infrequent templates. [sent-147, score-0.564]
82 Furthermore, the fact that DIRT-LE-Cover-N outperforms DIRT-LE-Lin-N suggests that using directional expansions, which are biased toward generalizations of the observed argument words, e. [sent-148, score-0.429]
83 vehicle as an expansion for car, is more effective than using symmetrically related words, such as bicycle or automobile. [sent-150, score-0.519]
84 These results indicate that LDA is less effective than our expansion approach. [sent-160, score-0.37]
85 286 Figure 1: MAP scores on DS-5-35 and DS-5-50 for the original DIRT scheme, denoted DIRT-LE-None, and for the compared smoothing methods as follows. [sent-161, score-0.151]
86 DIRT with varied degrees of lexical expansion is denoted as DIRT-LE-Lin-N and DIRT-LE-Cover-N. [sent-162, score-0.537]
87 The topic model with varied number of topics is denoted as LDA-K. [sent-163, score-0.167]
88 Data labels indicate the expansion degree (N) or the number of LDA topics (K), depending on the tested method. [sent-164, score-0.486]
89 One reason may be that in our model every expansion set may be viewed as a cluster around a specific word, a notable difference from topics, which provide a global partition over all words. [sent-165, score-0.37]
90 In order to further illustrate our lexical expansion scheme we focus on the rule application ‘Captain Cook sail to Australia → Captain Cook depart for Australia’, which is labeled as correct in our test set and corresponds to the rule ‘X sail to Y → X depart for Y’. [sent-167, score-0.971]
91 On the other hand, there are 18 words instantiating the X slot of the predicate ‘depart for’, including {Amanda, Jerry, Michael, mother, queen}. [sent-170, score-0.16]
92 The following are descriptions of some of the argument word expansions performed by DIRT-LE-Cover-2 (using the notation LwN defined in Section 3. [sent-173, score-0.432]
93 For instance, in this case L2mother = {father, sarah}, which does not identify people as a shared argument for the rule. [sent-180, score-0.34]
94 6 Conclusions We propose to improve the learning of inference rules between infrequent predicate templates with sparse argument vectors by utilizing a novel scheme that lexically expands argument vectors with semantically similar words. [sent-181, score-1.777]
95 Similarities between argument words are discovered using a dual distributional representation, in which templates are the features. [sent-182, score-0.732]
96 We tested the performance of our expansion approach on rule application datasets that were biased towards rare templates. [sent-183, score-0.674]
97 Our evaluation showed that rule learning with expanded vectors outperformed the baseline learning with original vectors. [sent-184, score-0.479]
98 It also outperformed an LDA-based similarity model that overcomes sparseness via dimensionality reduction. [sent-185, score-0.26]
99 In future work we plan to investigate how our scheme performs when integrated with manually constructed resources for lexical expansion, such as WordNet (Fellbaum, 1998). [sent-186, score-0.135]
100 Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation. [sent-236, score-0.653]
Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki
Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.
2 0.97314286 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
Author: Flavio De Benedictis ; Stefano Faralli ; Roberto Navigli
Abstract: We present GlossBoot, an effective minimally-supervised approach to acquiring wide-coverage domain glossaries for many languages. For each language of interest, given a small number of hypernymy relation seeds concerning a target domain, we bootstrap a glossary from the Web for that domain by means of iteratively acquired term/gloss extraction patterns. Our experiments show high performance in the acquisition of domain terminologies and glossaries for three different languages.
Author: Tirthankar Dasgupta
Abstract: In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or organized along morphological lines. For this, we have conducted a series of psycholinguistic experiments to build up hypotheses about the possible organizational structure of the mental lexicon. Next, we develop computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing, and that compositionality between the stem and the suffix plays an important role in the decomposition process. We observed the same phenomena for Bangla verb sequences, where experiments showed that non-compositional verb sequences are in general stored as a whole in the ML and low traces of compositional verbs are found in the mental lexicon.

1 Introduction

The mental lexicon is the representation of words in the human mind and of the associations that help fast retrieval and comprehension (Aitchison, 1987). Words are known to be associated with each other in terms of orthography, phonology, morphology and semantics. However, the precise nature of these relations is unknown. An important issue that has been a subject of study for a long time is to identify the fundamental units in terms of which the mental lexicon is organized. That is, whether lexical representations in the mental lexicon are word based, or organized along morphological lines. For example, whether a word such as "unimaginable" is stored in the mental lexicon as a whole word, or do we break it up into "un-", "imagine" and "-able", understand the meaning of each of these constituents, and then recombine the units to comprehend the whole word.
Such questions are typically answered by designing appropriate priming experiments (Marslen-Wilson et al., 1994) or other lexical decision tasks. The reaction time of the subjects for recognizing various lexical items under appropriate conditions reveals important facts about their organization in the brain (see Sec. 2 for models of morphological organization and access and related experiments). A clear understanding of the structure and the processing mechanism of the mental lexicon will further our knowledge of how the human brain processes language. Further, these linguistically important and interesting questions are also highly significant for computational linguistics (CL) and natural language processing (NLP) applications. Their computational significance arises from the issue of their storage in lexical resources like WordNet (Fellbaum, 1998), and raises questions such as how to store morphologically complex words in a lexical resource like WordNet while keeping in mind storage and access efficiency. There is a rich literature on the organization and lexical access of morphologically complex words, where experiments have been conducted mainly for derivationally suffixed words of English, Hebrew, Italian, French, Dutch, and a few other languages (Marslen-Wilson et al., 2008; Frost et al., 1997; Grainger et al., 1991; Drews and Zwitserlood, 1995). However, we do not know of any such investigations for Indian languages, which are morphologically richer than many of their Indo-European cousins. Moreover, Indian languages show some distinct phenomena, like compound and composite verbs, for which no such investigations have been conducted yet. On the other hand, experiments indicate that the mental representation and processing of morphologically complex words are not quite language independent (Taft, 2004).
Therefore, the findings from experiments in one language cannot be generalized to all languages, making it important to conduct similar experiments in other languages. This work aims to design cognitively motivated computational models that can explain the organization and processing of Bangla morphologically complex words in the mental lexicon. Presently we will concentrate on the following two aspects.

Organization and processing of Bangla polymorphemic words: our objective here is to determine whether the mental lexicon decomposes morphologically complex words into their constituent morphemes or represents the unanalyzed surface form of a word.

Organization and processing of Bangla compound verbs (CV): compound verbs are the subject of much debate in linguistic theory. No consensus has been reached yet on whether to consider them as unitary lexical units or as syntactically assembled combinations of two independent lexical units. As linguistic arguments have so far not led to a consensus, we here use cognitive experiments to probe the brain signatures of verb-verb combinations and propose cognitive as well as computational models regarding the possible organization and processing of Bangla CVs in the mental lexicon (ML). With respect to this, we apply the different priming and other lexical decision experiments described in the literature (Marslen-Wilson et al., 1994; Bentin and Feldman, 1990), specifically for derivationally suffixed polymorphemic words and compound verbs of Bangla. Our cross-modal and masked priming experiment on Bangla derivationally suffixed words shows that morphological relatedness between lexical items triggers a significant priming effect, even when the forms are phonologically/orthographically unrelated.
These observations are similar to those reported for English and indicate that derivationally suffixed words in Bangla are in general accessed through decomposition of the word into its constituent morphemes. Further, based on the experimental data we have developed a series of computational models that can be used to predict the decomposition of Bangla polymorphemic words. Our evaluation results show that the decomposition of a polymorphemic word depends on several factors like frequency, the productivity of the suffix, and the compositionality between the stem and the suffix. The organization of the paper is as follows: Sec. 2 presents related work; Sec. 3 describes the experiment design and procedure; Sec. 4 presents the processing of CVs; and finally, Sec. 5 concludes the paper by presenting future directions of the work.

2 Related Works

2.1 Representation of polymorphemic words

Over the last few decades many studies have attempted to understand the representation and processing of morphologically complex words in the brain for various languages. Most of the studies are designed to support one of two mutually exclusive paradigms: the full-listing and the morphemic model. The full-listing model claims that polymorphemic words are represented as a whole in the human mental lexicon (Bradley, 1980; Butterworth, 1983). On the other hand, the morphemic model argues that morphologically complex words are decomposed and represented in terms of smaller morphemic units. The affixes are stripped away from the root form, which in turn is used to access the mental lexicon (Taft and Forster, 1975; Taft, 1981; MacKay, 1978). Intermediate to these two paradigms is the partial decomposition model, which argues that different types of morphological forms are processed separately.
For instance, the derived morphological forms are believed to be represented as a whole, whereas the representation of the inflected forms follows the morphemic model (Caramazza et al., 1988). Traditionally, priming experiments have been used to study the effects of morphology in language processing. Priming is a process that results in an increase in the speed or accuracy of response to a stimulus, called the target, based on a prior exposure to another stimulus, called the prime (Tulving et al., 1982). Here, subjects are exposed to a prime word for a short duration, and are subsequently shown a target word. The prime and target words may be morphologically, phonologically or semantically related. An analysis of the reaction time of subjects reveals the actual organization and representation of the lexicon at the relevant level. See Pulvermüller (2002) for a detailed account of such phenomena. It has been argued that the frequency of a word influences the speed of lexical processing and thus can serve as a diagnostic tool to observe the nature and organization of lexical representations. Taft (1975), with his experiment on English inflected words, argued that lexical decision responses to polymorphemic words depend upon the base word frequency. A similar observation for surface word frequency was made by (Bertram et al., 2000; Bradley, 1980; Burani et al., 1987; Burani et al., 1984; Schreuder et al., 1997; Taft, 1975; Taft, 2004), where it has been claimed that words having low surface frequency tend to decompose. Later, Baayen (2000) proposed the dual processing race model: if the frequency of a specific morphologically complex form is above a certain threshold, the direct route will win and the word will be accessed as a whole; if it is below that threshold, the parsing route will win and the word will be accessed via its parts.
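The dual processing race model just described reduces to a simple frequency threshold. The sketch below is only an illustration of that idea; the function name, the threshold value and the example frequencies are assumptions for demonstration, not values from the literature.

```python
# Toy sketch of Baayen's (2000) dual processing race model: a complex form
# is accessed as a whole when its frequency exceeds a threshold; otherwise
# the parsing route wins and the form is accessed via its parts.
# The threshold of 100 is an arbitrary illustrative value.

def access_route(surface_freq, threshold=100):
    """Return which route wins for a morphologically complex word."""
    return "whole-word" if surface_freq >= threshold else "decomposition"

# Frequent forms are retrieved directly; rare forms are parsed.
print(access_route(2500))
print(access_route(3))
```

In a real model the threshold itself would be estimated from lexical decision data rather than fixed by hand.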
2.2 Representation of Compound Verbs

A compound verb (CV) is a sequence of two verbs (V1 and V2) acting as a single verb and expressing a single unit of meaning. For example, in the sentence রুটিগুল ো খেল খেল ো (/ruTigulo kheYe phela/) "bread-plural-the eat and drop-pres. Imp" ("Eat the breads"), the verb sequence "খেল খেল ো (eat drop)" is an example of a CV. Compound verbs are a special phenomenon abundantly found in Indo-European languages like the Indian languages. A plethora of work has been done to provide linguistic explanations for the formation of such words, yet none so far has led to any consensus. Hook (1981) considers the second verb V2 as an aspectual complex comparable to the auxiliaries. Butt (1993) argues that CV formations in Hindi and Urdu are either morphological or syntactical and that their formation takes place at the argument structure. Bashir (1993) tried to construct a semantic analysis based on "prepared" and "unprepared mind". Similar findings have been proposed by Pandharipande (1993), who points out that V1 and V2 are paired on the basis of their semantic compatibility, which is subject to syntactic constraints. Paul (2004) tried to represent Bangla CVs in terms of the HPSG formalism. She proposes that the selection of a V2 by a V1 is determined at the semantic level, because the two verbs will unify if and only if they are semantically compatible. Since none of the linguistic formalisms could satisfactorily explain the unique phenomenon of CV formation, we here for the first time turn our attention towards psycholinguistic and neurolinguistic studies to model the processing of verb-verb combinations in the ML and compare these responses with those of the existing models.

3 The Proposed Approaches
3.1 The psycholinguistic experiments

We apply two different priming experiments, namely the cross-modal priming and masked priming experiments discussed in (Forster and Davis, 1984; Rastle et al., 2000; Marslen-Wilson et al., 1994; Marslen-Wilson et al., 2008), to Bangla morphologically complex words. Here, the prime is a morphologically derived form of the target, presented auditorily (for cross-modal priming) or visually (for masked priming). The subjects were asked to make a lexical decision on whether the given target is a valid word in that language. The same target word is again probed but with a different audio or visual probe called the control word. The control shows no relationship with the target. For example, baYaska (aged) and baYasa (age) is a prime-target pair, for which the corresponding control-target pair could be naYana (eye) and baYasa (age). Similar to (Marslen-Wilson et al., 2008), the masked priming has been conducted for three different SOAs (Stimulus Onset Asynchrony): 48ms, 72ms and 120ms. The SOA is measured as the amount of time between the start of the first stimulus and the start of the next stimulus. [Table 1, garbled in extraction, listed the five prime-target condition classes in terms of morphological (M), semantic (S) and orthographic (O) relatedness, where + implies related and - implies unrelated.] There were 500 prime-target and control-target pairs classified into five classes. Depending on the class, the prime is related to the target either in terms of morphology, semantics, orthography and/or phonology (see Table 1). The experiments were conducted on 24 highly educated native Bangla speakers. Nineteen of them have a graduate degree and five hold a postgraduate degree. The age of the subjects varies between 22 and 35 years.

Results: The RTs with extreme values and incorrect decisions were excluded from the data.
The data have been analyzed using a two-way ANOVA with three factors: priming (prime and control), condition (five classes) and prime duration (three different SOAs). We observe strong priming effects (p<0.05) when the target word is morphologically derived, has a recognizable suffix, and is semantically and orthographically related to the prime; no priming effects are observed when the prime and target words are orthographically related but share no morphological or semantic relationship; weak priming, although not statistically significant (p>0.07), is observed for prime-target pairs that are only semantically related. We see no significant difference between the prime and control RTs for the other classes. We also looked at the RTs for each of the 500 target words. We observe that maximum priming occurs for words in [M+S+O+] (69%), some priming is evident in [M+S+O-] (51%) and [M'+S-O+] (48%), but for most of the words in [M-S+O-] (86%) and [M-S-O+] (92%) no priming effect was observed.

3.2 Frequency Distribution Models of Morphological Processing

From the above results we saw that not all polymorphemic words tend to decompose during processing; thus we need to further investigate the processing phenomena of Bangla derived words. One notable means is to identify whether the stem or suffix frequency is involved in the processing stage of that word. For this, we apply different frequency-based models to the Bangla polymorphemic words and try to evaluate their performance by comparing their predicted results with the results obtained through the priming experiment.

Model 1: Base and surface word frequency effect - It states that the probability of decomposition of a Bangla polymorphemic word depends upon the frequency of its base word.
Thus, if the stem frequency of a polymorphemic word crosses a given threshold value, then the word will be decomposed into its constituent morphemes. A similar claim has been made for the surface word frequency model, where decomposition depends upon the frequency of the surface word itself. We have evaluated both models with the 500 words used in the priming experiments discussed above. We have achieved accuracies of 62% and 49% respectively for the base and surface word frequency models.

Model 2: Combining the base and surface word frequency - In pursuit of an extended model, we combine the base and surface word frequency models. We took the log frequencies of both the base and the derived words and plotted the best-fit regression curve over the given dataset. The evaluation of this model over the same set of 500 target words returns an accuracy of 68%, which is better than the base and surface word frequency models. However, the proposed model still fails to predict the processing of around 32% of words. This led us to further enhance the model. For this, we analyze the role of suffixes in morphological processing.

Model 3: Degree of Affixation and Suffix Productivity - We examine whether the regression analysis between base and derived frequency of Bangla words varies between suffixes and how these variations affect morphological decomposition. With respect to this, we try to compute the degree of affixation between the suffix and the base word. For this, we perform regression analysis on sixteen different Bangla suffixes with varying degrees of type and token frequencies. For each suffix, we choose 100 different derived words. We observe that those suffixes having a high intercept value form derived words whose base frequencies are substantially high compared to their derived forms.
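The best-fit regression over log frequencies used in Models 2 and 3 can be sketched with an ordinary least-squares fit. Everything below (the toy frequency ratios, the labels, and the 0.5 decision threshold) is an invented illustration, not the paper's data or fitted coefficients.

```python
# Sketch of a log-frequency regression for decomposition: fit a line over
# log(base_freq / surface_freq), where the label is 1 if the priming data
# showed the word being decomposed. The slope/intercept of such a fit is
# exactly the "intercept" quantity Model 3 compares across suffixes.
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Invented data: high base/surface ratios tend to go with decomposition.
ratios = [math.log(r) for r in (8.0, 5.0, 4.0, 0.5, 0.4, 0.25)]
decomposed = [1, 1, 1, 0, 0, 0]
a, b = fit_line(ratios, decomposed)

def predict(base_freq, surface_freq):
    """Predict decomposition when the fitted score exceeds 0.5."""
    return a * math.log(base_freq / surface_freq) + b > 0.5

print(predict(800, 100))  # high ratio: predicted to decompose
print(predict(50, 200))   # low ratio: predicted whole-word access
```

A production model would of course be fit on the full 500-word dataset and validated against held-out priming results.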
Moreover, we also observe that a high intercept value for a given suffix indicates a higher inclination towards decomposition. Next, we try to analyze the role of the suffix type/token ratio and compare it with the base/derived frequency ratio model. This has been done by regression analysis between the suffix type-token ratios and the base-surface frequency ratios. We further tried to observe the role of suffix productivity in morphological processing. For this, we computed the three components of productivity P, P* and V as discussed in (Hay and Plag, 2004). P is the "conditioned degree of productivity" and is the probability that we are encountering a word with an affix and it is representing a new type. P* is the "hapax-conditioned degree of productivity"; it expresses the probability that when an entirely new word is encountered it will contain the suffix. V is the "type frequency". Finally, we computed the productivity of a suffix through its P, P* and V values. We found that the decomposition of a Bangla polymorphemic word is directly proportional to the productivity of the suffix. Therefore, words that are composed of productive suffixes (P values ranging between 0.6 and 0.9) like "-oYAlA", "-giri", "-tba" and "-panA" are more decomposable than words with low-productivity suffixes like "-Ani", "-lA", "-k", and "-tama". The evaluation of the proposed model returns an accuracy of 76%, which is 8% better than the preceding models.

Combining Model 2 and Model 3: One important observation that can be made from the above results is that Model 3 performs best in determining the true negatives. It also possesses a high recall of 85% but a low precision of 50%. In other words, the model can predict those words for which decomposition will not take place. On the other hand, the results of Model 2 possess a high precision of 70%.
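The productivity components P, P* and V described above have standard corpus definitions (hapaxes with the affix over tokens with the affix; hapaxes with the affix over all hapaxes; type count). This sketch computes them over a tiny made-up word list; the "wala"-suffixed items and their counts are hypothetical, not the paper's corpus.

```python
# Minimal computation of Hay and Plag's (2004) productivity measures for
# one suffix over a toy frequency list.
from collections import Counter

def productivity(token_counts, suffix, total_hapaxes):
    """token_counts: Counter mapping word -> corpus frequency."""
    with_suffix = {w: c for w, c in token_counts.items() if w.endswith(suffix)}
    V = len(with_suffix)                                  # type frequency
    N = sum(with_suffix.values())                         # token frequency
    n1 = sum(1 for c in with_suffix.values() if c == 1)   # hapaxes with suffix
    P = n1 / N if N else 0.0            # conditioned degree of productivity
    P_star = n1 / total_hapaxes if total_hapaxes else 0.0 # hapax-conditioned
    return P, P_star, V

# Invented counts: two hapaxes and one repeated type carry the suffix.
counts = Counter({"khelowala": 1, "dokanwala": 1, "rickshawala": 2,
                  "boro": 40, "chhoto": 35})
P, P_star, V = productivity(counts, "wala", total_hapaxes=2)
print(P, P_star, V)
```

Ranking suffixes by P in this way is what separates the highly productive "-oYAlA"-type suffixes from the low-productivity "-tama"-type ones in Model 3.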
Thus, we argue that combining the above two models can better predict the decomposition of Bangla polymorphemic words. Hence, we combine the two models together and finally achieve an overall accuracy of 80%, with a precision of 87% and a recall of 78%. This surpasses the performance of the other models discussed earlier. However, around 22% of the test words were wrongly classified, which the model fails to justify. Thus, a more rigorous set of experiments and data analyses is required to predict the access mechanisms of such Bangla polymorphemic words.

3.3 Stem-Suffix Compositionality

Compositionality refers to the fact that the meaning of a complex expression is inferred from the meaning of its constituents. Therefore, the cost of retrieving a word from the secondary memory is directly proportional to the cost of retrieving the individual parts (i.e. the stem and the suffix). Thus, following the work of (Milin et al., 2009), we define the compositionality of a morphologically complex word (We) as:

C(We) = α1 H(W) + α2 H(e) + α3 H(W|e) + α4 H(e|W)

where H(x) is the entropy of an expression x, H(W|e) is the conditional entropy between the stem W and the suffix e, and the αi are proportionality factors whose values are computed through regression analysis. Next, we tried to compute the compositionality of the stems and suffixes in terms of relative entropy D(W||e) and pointwise mutual information (PMI). The relative entropy is a measure of the distance between the probability distributions of the stem W and the suffix e. The PMI measures the amount of information that one random variable (the stem) contains about the other (the suffix). We have compared the above three techniques with the actual reaction time data collected through the priming and lexical decision experiments.
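The three information-theoretic quantities used in this section have direct definitions over relative frequencies. The sketch below implements them; the probability values in the example calls are toy numbers, not estimates from the paper's corpus.

```python
# Entropy, pointwise mutual information, and relative entropy (KL
# divergence), the building blocks of the compositionality measures above.
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pmi(p_stem_suffix, p_stem, p_suffix):
    """Pointwise mutual information between a stem and a suffix."""
    return math.log2(p_stem_suffix / (p_stem * p_suffix))

def kl_divergence(p, q):
    """Relative entropy D(p || q) between two distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A stem-suffix pair attested together more often than chance has positive
# PMI, suggesting a compositional, decomposable combination.
print(pmi(0.02, 0.1, 0.05))
print(entropy([0.5, 0.5]))
```

In practice the probabilities would be relative frequencies of stems, suffixes, and stem+suffix forms estimated from the same corpus used for the frequency models.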
We observed that all three information-theoretic models perform much better than the frequency-based models discussed in the earlier section at predicting the decomposability of Bangla polymorphemic words. However, we think it is still premature to claim anything concrete at this stage of our work. We believe many more rigorous experiments need to be performed in order to validate our proposed models. Further, the present paper does not consider factors related to age of acquisition and word familiarity effects, which play an important role in the processing of morphologically complex words. Moreover, it is also very interesting to see how the stacking of multiple suffixes in a word is processed by the human brain.

4 Organization and Processing of Compound Verbs in the Mental Lexicon

Compound verbs, as discussed above, are a special type of verb sequence consisting of two or more verbs acting as a single verb and expressing a single unit of meaning. The verb V1 is known as the pole and V2 is called the vector. For example, "ওঠে পড়া" (getting up) is a compound verb where the individual words do not entirely reflect the meaning of the whole expression. However, not all V1+V2 combinations are CVs. For example, expressions like "নিঠে য়াও" (take and then go) and "নিঠে আঠ ়া" (return back) are examples of verb sequences where the meaning of the whole expression can be derived from the meaning of the individual components, and thus these verb sequences are not considered CVs. The key question linguists have been trying to answer for a long time, and debating a lot, is whether to consider CVs as single lexical units or as two separate units. Since linguistic rules fail to explain the process, we for the first time tried to perform cognitive experiments to understand the organization and processing of such verb sequences in the human mind.
A clear understanding of these phenomena may help us to classify or extract actual CVs from other verb sequences. In order to do so, we have presently applied three different techniques to collect user data. In the first technique, we annotated 4500 V1+V2 sequences, along with their example sentences, using a group of three linguists (the expert subjects). We asked the experts to classify the verb sequences into three classes, namely CV, not a CV, and not sure. Each linguist received 2000 verb pairs along with their respective example sentences. Out of these, 1500 verb sequences are unique to each of them and the remaining 500 overlap. We measure the inter-annotator agreement using the Fleiss kappa measure (κ) (Fleiss et al., 1981), where the agreement lies around 0.79. Next, out of the 500 common verb sequences that were annotated by all three linguists, we randomly chose 300 V1+V2 pairs and presented them to 36 native Bangla speakers. We asked each subject to give a compositionality score for each verb sequence on a 1-10 point scale, 10 being highly compositional and 1 non-compositional. We found an agreement of κ=0.69 among the subjects. We also observe a continuum of compositionality scores among the verb sequences. This reflects that it is difficult to classify Bangla verb sequences discretely into the classes of CV and not a CV. We then compare the compositionality scores with the expert users' annotations. We found a significant correlation between the expert annotations and the compositionality scores. We observe that verb sequences annotated as CVs (like খেঠে খিল, কঠে খি, ওঠে পড) have low compositionality scores (average scores ranging between 1-4); on the other hand, verb sequences with high compositionality values are in general tagged as not a CV (নিঠে য়া (come and get), নিঠে আে (return back), তুঠল খেঠেনি (kept), গনিঠে পিল (roll on floor)). This reflects that verb sequences which are not CVs show a high degree of compositionality.
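The Fleiss kappa used for the three-annotator agreement above can be computed directly from a rating matrix. The matrix in this sketch is invented for illustration: each row is one verb sequence, and the columns count how many of the three annotators chose each class (CV, not a CV, not sure).

```python
# Minimal Fleiss' kappa over an n_items x n_categories count matrix where
# each row sums to the number of raters.

def fleiss_kappa(matrix):
    n_items = len(matrix)
    n_raters = sum(matrix[0])
    # Observed agreement per item.
    P_i = [(sum(c * c for c in row) - n_raters) /
           (n_raters * (n_raters - 1)) for row in matrix]
    P_bar = sum(P_i) / n_items
    # Chance agreement from the marginal category proportions.
    n_cats = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cats)]
    p_j = [t / (n_items * n_raters) for t in totals]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Five hypothetical verb sequences rated by three annotators.
ratings = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [3, 0, 0], [0, 2, 1]]
print(round(fleiss_kappa(ratings), 3))
```

On real data the 4500 annotated sequences would each contribute one row, yielding the κ ≈ 0.79 reported above.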
In other words, non-CV sequences can be directly interpreted from their constituent verbs. This leads us to the possibility that compositional verb sequences require the individual verbs to be recognized separately, and thus the time to recognize such expressions must be greater than for non-compositional verbs, which map to a single expression of meaning. In order to validate this claim, we performed a lexical decision experiment using 32 native Bangla speakers with 92 different verb sequences. We followed the same experimental procedure as discussed in (Taft, 2004) for English polymorphemic words. However, rather than derived words, the subjects were shown a verb sequence and asked whether they recognize it as a valid combination. The reaction time (RT) of each subject is recorded. Our preliminary observation from the RT analysis shows that, as per our claim, the RT of verb sequences having high compositionality values is significantly higher than the RTs for low or non-compositional verbs. This supports our hypothesis that Bangla compound verbs that show less compositionality are stored as a whole in the mental lexicon and thus follow the full-listing model, whereas compositional verb phrases are individually parsed. However, we do believe that our experiment is composed of a very small set of data, and it is premature to conclude anything concrete based only on the current experimental results.

5 Future Directions

In the next phase of our work we will focus on the following aspects of Bangla morphologically complex words.

The Word Familiarity Effect: Here, our aim is to study the role of familiarity of a word during its processing. We define the familiarity of a word in terms of corpus frequency, age of acquisition, the level of language exposure of a person, the RT of the word, etc.
Role of suffix types in morphological decomposition: For native Bangla speakers, which morphological suffixes are internalized and which are just learnt in school but never internalized? We can compare the representation of native, Sanskrit-derived and foreign suffixes in Bangla words.

Computational models of organization and processing of Bangla compound verbs: Presently we have performed a small set of experiments to study the processing of compound verbs in the mental lexicon. In the next phase of our work we will extend the existing experiments and also apply further techniques like crowdsourcing and language games to collect more relevant RT and compositionality data. Finally, based on the collected data we will develop computational models that can explain the possible organizational structure and processing mechanism of morphologically complex Bangla words in the mental lexicon.

References

Aitchison, J. (1987). Words in the Mind: An Introduction to the Mental Lexicon. Wiley-Blackwell.
Baayen, R. H. (2000). On frequency, transparency and productivity. In G. Booij and J. van Marle (eds.), Yearbook of Morphology, pages 181-208.
Baayen, R. H. (2003). Probabilistic approaches to morphology. Probabilistic Linguistics, pages 229-287.
Baayen, R. H., Dijkstra, T., and Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1):94-117.
Bashir, E. (1993). Causal chains and compound verbs. In M. K. Verma (ed.) (1993).
Bentin, S. and Feldman, L. B. (1990). The contribution of morphological and semantic relatedness to repetition priming at short and long lags: Evidence from Hebrew. Quarterly Journal of Experimental Psychology, 42, pp. 693-711.
Bradley, D. (1980). Lexical representation of derivational relation. Juncture, Saratoga, CA: Anma Libri, pp. 37-55.
Butt, M.
(1993). Conscious choice and some light verbs in Urdu. In M. K. Verma (ed.) (1993).
Butterworth, B. (1983). Lexical representation. Language Production, Vol. 2, pp. 257-294. San Diego, CA: Academic Press.
Caramazza, A., Laudanna, A. and Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, pp. 297-332.
Drews, E., and Zwitserlood, P. (1995). Morphological and orthographic similarity in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 21, 1098-1116.
Fellbaum, C. (ed.). (1998). WordNet: An Electronic Lexical Database. MIT Press.
Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-698.
Frost, R., Forster, K. I., & Deutsch, A. (1997). What can we learn from the morphology of Hebrew? A masked-priming investigation of morphological representation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 829-856.
Grainger, J., Cole, P., & Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370-384.
Hook, P. E. (1981). Hindi Structures: Intermediate Level. Michigan Papers on South and Southeast Asia, The University of Michigan Center for South and Southeast Studies, Ann Arbor, Michigan.
Fleiss, J. L., Levin, B., and Paik, M. C. (1981). The measurement of interrater agreement. Statistical Methods for Rates and Proportions, 2:212-236.
MacKay, D. G. (1978). Derivational rules and the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 17, pp. 61-71.
Marslen-Wilson, W. D., & Tyler, L. K. (1997). Dissociating types of mental computation. Nature, 387, pp. 592-594.
Marslen-Wilson, W. D., & Tyler, L. K. (1998). Rules, representations, and the English past tense. Trends in Cognitive Sciences, 2, pp. 428-435.
Marslen-Wilson, W. D., Tyler, L. K., Waksler, R., & Older, L. (1994).
Morphology and meaning in the English mental lexicon. Psychological Review, 101, pp. 3–33. Marslen-Wilson,W.D. and Zhou,X.( 1999). Abstractness, allomorphy, and lexical architecture. Language and Cognitive Processes, 14, 321–352. Milin, P., Kuperman, V., Kosti´, A. and Harald R., H. (2009). Paradigms bit by bit: an information- theoretic approach to the processing of paradigmatic structure in inflection and derivation, Analogy in grammar: Form and acquisition, pp: 214— 252. Pandharipande, R. (1993). ―Serial verb construction in Marathi.‖ In M. K. Verma ed. (1993). Paul, S. (2004). An HPSG Account of Bangla Compound Verbs with LKB Implementation, Ph.D. Dissertation. CALT, University of Hyderabad. Pulvermüller, F. (2002). The Neuroscience guage. Cambridge University Press. of Lan- Stolz, J.A., and Feldman, L.B. (1995). The role of orthographic and semantic transparency of the base morpheme in morphological processing. In L.B. Feldman (Ed.) Morphological aspects of language processing. Hillsdale, NJ: Lawrence Erlbaum Associates Inc. Taft, M., and Forster, K.I.(1975). Lexical storage and retrieval of prefix words. Journal of Verbal Learning and Verbal Behavior, Vol.14, pp. 638-647. Taft, M.(1988). A morphological decomposition model of lexical access. Linguistics, 26, pp. 657667. Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, pp. 745-765 Tulving, E., Schacter D. L., and Heather A.(1982). Priming Effects in Word Fragment Completion are independent of Recognition Memory. Journal of Experimental Psychology: Learning, Memory and Cognition, vol.8 (4). 129
same-paper 4 0.95679563 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
Author: Oren Melamud ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines.
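To illustrate the idea the abstract describes — expanding a sparse argument-word vector with semantically similar words before computing distributional similarity — here is a minimal sketch. All names, the damping weight, and the toy similarity dictionary are hypothetical illustrations, not the paper's actual method or data:

```python
from collections import Counter
import math

def expand_vector(vec, sim, weight=0.5):
    """Add semantically similar words to a sparse count vector.

    vec:    Counter mapping argument words to counts
    sim:    dict mapping a word to a list of (similar_word, score) pairs
    weight: damping factor for expansion terms (an assumed choice)
    """
    expanded = Counter(vec)
    for word, count in vec.items():
        for neighbor, score in sim.get(word, []):
            expanded[neighbor] += weight * score * count
    return expanded

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy sparse argument vectors for two predicates ('treat Y', 'relieve Y').
treat = Counter({"headache": 2})
relieve = Counter({"migraine": 1})
sim = {"headache": [("migraine", 0.8)], "migraine": [("headache", 0.8)]}

# Without expansion the vectors share no words; with expansion they overlap.
before = cosine(treat, relieve)
after = cosine(expand_vector(treat, sim), expand_vector(relieve, sim))
```

The point of the sketch is only the mechanism: a very sparse vector that shares no argument words with another predicate can still receive a nonzero similarity score once both vectors are lexically expanded.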
5 0.95076722 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
Author: Ruey-Cheng Chen
Abstract: We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
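The abstract states that the regularized-compression objective can be approximated by negative empirical pointwise mutual information. A minimal sketch of estimating empirical PMI from adjacent-symbol counts follows; the function and the toy data are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def empirical_pmi(bigrams):
    """Empirical pointwise mutual information for each observed pair.

    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), with all probabilities
    estimated by relative frequency from the list of bigrams.
    """
    pair_counts = Counter(bigrams)
    left = Counter(x for x, _ in bigrams)
    right = Counter(y for _, y in bigrams)
    n = len(bigrams)
    pmi = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / n
        p_x = left[x] / n
        p_y = right[y] / n
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi

# Toy corpus of adjacent symbol pairs.
bigrams = [("a", "b"), ("a", "b"), ("a", "c"), ("d", "c")]
scores = empirical_pmi(bigrams)
```

Pairs that co-occur more often than their marginal frequencies predict get positive PMI, so taking the negative gives a quantity that (per the abstract) approximates the compression objective being minimized.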
6 0.95064837 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
7 0.91315764 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
8 0.87902939 61 acl-2013-Automatic Interpretation of the English Possessive
9 0.82097012 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
10 0.72512972 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
11 0.71554214 242 acl-2013-Mining Equivalent Relations from Linked Data
12 0.71422482 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
13 0.70728493 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
14 0.64762843 154 acl-2013-Extracting bilingual terminologies from comparable corpora
15 0.64265001 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
16 0.62651145 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web
17 0.62253457 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
18 0.62208152 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
19 0.60838151 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
20 0.60765898 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars