acl acl2013 acl2013-213 knowledge-graph by maker-knowledge-mining

213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple


Source: pdf

Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov

Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring the high computational costs associated with HBMs.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. [sent-10, score-0.412]

2 However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. [sent-11, score-0.085]

3 This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. [sent-13, score-0.239]

4 a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). [sent-15, score-0.304]

5 1 Introduction In recent years, with advances in probability and estimation theory, there has been much interest in Bayesian models (BMs) (Chater, Tenenbaum, and Yuille, 2006; Jones and Love, 2011) and their application to child language acquisition with its challenging com. [sent-17, score-0.313]

6 They can readily handle the evident noise and ambiguity of acquisition input, while at the same time providing efficiency via priors that mirror known pre-existing language biases. [sent-20, score-0.113]

7 dative alternations in English (Hsu and Chater, 2010; Perfors, Tenenbaum, and Wonnacott, 2010), and verb frames in a controlled artificial language (Wonnacott, Newport, and Tanenhaus, 2008). [sent-23, score-0.399]

8 HBMs can thus be viewed as providing a “rational” upper bound on language learnability, yielding optimal models that account for observed data while minimizing any required prior information. [sent-24, score-0.07]

9 In addition, the clustering implicit in HBM modeling introduces additional parameters that can be tuned to specific data patterns. [sent-25, score-0.07]

10 For instance, in terms of verb learning, this could [sent-28, score-0.148]

11 The focus should be on finding the simplest statistical models consistent with a given behavior, particularly one that aligns with known cognitive limitations. [sent-36, score-0.127]

12 In the case of many language acquisition tasks this behavior often takes the form of overgeneralization, but with eventual convergence to some target language given exposure to more data. [sent-37, score-0.199]

13 In terms of Marr’s hierarchy (Marr, 1982), learning verb alternations is an abstract computational problem (Marr’s type I), solvable by many type II methods combining representations (models, viz. [sent-40, score-0.197]

14 Any algorithm that approximates an HBM can be viewed as implementing a somewhat different underlying model; if it replicates HBM prediction performance but is simpler and less computationally complex then we assume it is preferable. [sent-45, score-0.088]

15 This paper is organized as follows: we start with a discussion of formalizations of language acquisition tasks, §2. [sent-46, score-0.147]

16 2 Evidence in Language Acquisition A familiar problem for language acquisition is how children learn which verbs participate in so-called dative alternations, exemplified by the child-produced sentences 1 to 3, from the Brown (1973) corpus in CHILDES (MacWhinney, 1995). [sent-51, score-0.302]

17 you took me three scrambled eggs (a direct object dative (DOD) from Adam at age 3;6) 2. [sent-53, score-0.126]

18 (a prepositional dative (PD) from Adam at age 4;7) 3. [sent-55, score-0.126]

19 For example, in sentence (1), the child Adam uses take as a DOD before any recorded occurrence of a similar use of take in adult speech to Adam. [sent-57, score-0.131]

20 Such verbs alternate because they can also occur with a prepositional form, as in sentence (2). [sent-58, score-0.14]

21 However, sometimes a child’s use of verbs like these amounts to an overgeneralization, that is, their productive use of a verb in a pattern that does not occur in the adult grammar, as in sentence (3), above. [sent-59, score-0.692]

22 Faced with these two verb frames, the task for the learner is to decide for a particular verb whether it is a non-alternating DOD-only verb, a PD-only verb, or an alternating verb that allows both forms. [sent-60, score-0.567]

23 On the assumption that children only receive positive examples of verb forms, it is not clear how they might recover from the overgeneralization exhibited in sentence (3) above, because they will never receive positive sentences from adults like (3), using fix in a DOD form. [sent-62, score-0.511]

24 As has long been noted, if negative examples were systematically available to learners, then this problem would be solved, since the child would be given evidence that the DOD form is not possible in the adult grammar. [sent-63, score-0.297]

25 However, although parental correction could be considered to be a source of negative evidence, it is neither systematic nor generally available to all children (Marcus, 1993). [sent-64, score-0.072]

26 One alternative solution to Baker’s paradox that has been widely discussed at least since Chomsky (1981) is the use of indirect negative evidence. [sent-67, score-0.162]

27 On the indirect negative evidence model, if a verb is not found where it would be expected to occur, the learner may conclude it is not part of the adult grammar. [sent-68, score-0.468]

28 Different formalizations of indirect negative evidence have been incorporated in several computational learning models for learning e. [sent-70, score-0.3]

29 , 2010); dative verbs (Perfors, Tenenbaum, and Wonnacott, 2010; Hsu and Chater, 2010); and multiword verbs (Nematzadeh, Fazly, and Stevenson, 2013). [sent-73, score-0.406]

30 Since a number of closely related models can all implement the indirect negative evidence approach, the decision of which one to choose for a given task may not be entirely clear. [sent-74, score-0.266]

31 3.1 Dative Corpora To emulate a child language acquisition environment, we use naturalistic longitudinal child-directed data from the Brown corpus in CHILDES, for one child (Adam), for a subset of 19 verbs in the DOD and PD verb frames, figure 1. [sent-77, score-0.596]

32 Model comparison requires a gold standard database for acquisition, reporting which frames have been learned for which verbs at each stage, and how likely a child is to make creative uses of a particular verb in a new frame. [sent-79, score-0.449]

33 Absent this, a first step is demonstrating that simpler alternative models can replicate HBM performance on their own terms. [sent-84, score-0.137]

34 Further, since it has been argued that very low frequency verbs may not yet be firmly placed in a child’s lexicon (Yang, 2010; Gropen et al. [sent-87, score-0.218]

35 , 1989), at each epoch we also impose a low-frequency threshold of 5 occurrences, considering only verbs that the learner has seen at least 5 times. [sent-88, score-0.293]
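
As a rough illustration of this thresholding step, the sketch below filters a verb set by total frequency; the verb names and counts are hypothetical, not taken from the Adam data:

```python
def eligible_verbs(counts, threshold=5):
    """Keep only verbs the learner has seen at least `threshold` times in the current epoch."""
    return {verb: (dod, pd)
            for verb, (dod, pd) in counts.items()
            if dod + pd >= threshold}

# Hypothetical cumulative (DOD, PD) counts at one epoch.
epoch_counts = {"give": (14, 9), "take": (3, 1), "fix": (0, 6), "show": (2, 2)}
print(eligible_verbs(epoch_counts))  # {'give': (14, 9), 'fix': (0, 6)}
```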

36 3.2 The learners We selected a set of representative statistical models that are capable in principle of solving this classification task, ranging from what is perhaps the simplest possible, a simple binomial, all the way to multi-level hierarchical Bayesian approaches. [sent-94, score-0.072]

37 A Binomial distribution serves as the simplest model for capturing the behavior of a verb occurring in either DOD or PD frame. [sent-95, score-0.239]

38 Representing the probability of DOD as θ, after n occurrences of the verb the probability that y of them are DOD is: p(y | θ, n) = [n! / (y! (n − y)!)] θ^y (1 − θ)^(n−y). [sent-96, score-0.237]
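
A minimal sketch of this binomial model with hypothetical counts:

```python
from math import comb

def binomial_likelihood(y, n, theta):
    """p(y | theta, n) = C(n, y) * theta^y * (1 - theta)^(n - y)."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

def theta_mle(y, n):
    """Maximum likelihood estimate of the DOD probability."""
    return y / n

# Hypothetical verb: 7 DOD uses out of 10 dative occurrences.
print(binomial_likelihood(7, 10, 0.7))  # ~0.267
print(theta_mle(7, 10))                 # 0.7
```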

39 Ruling out events on the grounds that they did not occur in a finite data set early in learning may be too strong, though it should be noted that this is simply one (overly strong) version of the indirect negative evidence position. [sent-99, score-0.227]

40 Again, as is familiar, to overcome the zero-count problem, models adopt one or another method of smoothing to assign a small probability mass to unseen events. [sent-100, score-0.155]

41 In a Bayesian formulation, this amounts to assigning non-zero probability mass to some set of priors; smoothing also captures the notion of generalization, making predictions about data that has never been seen by the learner. [sent-101, score-0.185]

42 As we discuss below, class-based estimates arise from one or another clustering method, and can produce more accurate estimates for less frequent verbs based on patterns already learned for more frequent verbs in the same class; see (Perfors, Tenenbaum, and Wonnacott, 2010). [sent-104, score-0.35]

43 In this case, smoothing is a side effect of the behavior of a class as a whole. [sent-105, score-0.151]

44 When learning begins, the prior probability is the only source of information for a learner and, as such, dominates the value of the posterior probability. [sent-106, score-0.111]

45 However, in the large sample limit, it is the likelihood that dominates the posterior distribution regardless of the prior. [sent-107, score-0.102]

46 Then there are classes of different αk, βk, and the probabilities for the DOD frame for the different verbs (θik) are drawn according to the classes k assigned to them. [sent-111, score-0.238]
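
The hierarchical structure can be sketched roughly as follows; the Beta parameterization (βk as the class mean, αk as its strength) and the hyperparameter values are assumptions for illustration, not the paper's exact priors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class-level hyperparameters: alpha_k = strength, beta_k = mean
# DOD probability of class k (an assumed parameterization consistent with how
# alpha and beta are used in the text).
classes = {"alternating": (2.0, 0.5), "DOD-only": (10.0, 0.95)}

def sample_verb(class_name, n_occurrences):
    alpha_k, beta_k = classes[class_name]
    theta_ik = rng.beta(alpha_k * beta_k, alpha_k * (1 - beta_k))  # theta_ik drawn from its class
    y = rng.binomial(n_occurrences, theta_ik)                      # observed DOD uses
    return theta_ik, y

print(sample_verb("alternating", 20))
print(sample_verb("DOD-only", 20))
```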

47 This complexity also calls into question the cognitive fidelity of such approaches. [sent-119, score-0.083]

48 Equation 3 is particularly interesting because by fixing α and β (instead of averaging over them) it is possible to deduce simpler (and classical) models: MLE corresponds to α = 0; the so-called “add-one” smoothing (referred to in this paper as L1) corresponds to α = 2 and β = 1/2. [sent-121, score-0.121]

49 From Eq. 3 it is also clear that if α and β (or their distributions) are unchanged, as the evidence for a verb grows, the HBM estimate approaches the MLE’s (n → ∞, θHBM → θMLE). [sent-123, score-0.278]

50 On the other hand, when α >> n, θHBM ∼ β, so that β can be interpreted as a prior value for θ in the low-frequency limit. [sent-124, score-0.109]
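
Taken together, the estimators discussed here share the form θ = (y + αβ) / (n + α) (cf. Eq. 4 below). A small sketch with hypothetical counts showing the limiting behaviors just described:

```python
def smoothed_estimate(y, n, alpha, beta):
    """theta = (y + alpha * beta) / (n + alpha).

    alpha = 0 recovers the MLE; alpha = 2, beta = 1/2 is add-one (L1) smoothing.
    """
    return (y + alpha * beta) / (n + alpha)

y, n = 3, 3  # a verb seen 3 times, always in the DOD frame (hypothetical)
print(smoothed_estimate(y, n, alpha=0, beta=0.5))      # MLE: 1.0, rules out the PD frame
print(smoothed_estimate(y, n, alpha=2, beta=0.5))      # add-one (L1): 0.8
print(smoothed_estimate(y, n, alpha=100, beta=0.5))    # alpha >> n: close to beta
print(smoothed_estimate(300, 300, alpha=2, beta=0.5))  # large n: approaches the MLE
```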

51 Following this reasoning, we propose an alternative approach, a linear competition learner (LCL), that explicitly models the behavior of a given verb as the linear competition between the evidence for the verb, and the average behavior of verbs of the same class. [sent-125, score-0.755]

52 As clustering is defined independently from parameter estimation, the advantages of the proposed approach are twofold. [sent-126, score-0.07]

53 Second, differently from HBMs, where the same attributes are used for clustering and parameter estimation (in this case the DOD and PD counts for each verb), in LCL clustering may be done using more general contexts that employ a variety of linguistic and environmental attributes. [sent-128, score-0.118]

54 For LCL the prior and class-based information are incorporated as: θLCL = (yi + αC βC) / (ni + αC) (4), where αC and βC are defined via justifiable heuristic expressions dependent solely on the statistics of the class attributed to each verb i. [sent-129, score-0.214]

55 The strength of the prior (αC) is a monotonic function of the number of elements (mC) in the class C, excluding the target verb vi. [sent-130, score-0.214]

56 [Equation (5) omitted], with the strength of the prior for the LCL model depending on the number of verbs in the class, not on their frequency. [sent-132, score-0.171]
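
A minimal sketch of the LCL estimate of Eq. (4); since Eq. (5) did not survive extraction, the particular choices below of αC (class size plus one) and βC (pooled class proportion) are illustrative assumptions, guided only by the constraints stated in the text:

```python
def lcl_estimate(y_i, n_i, class_counts):
    """theta_LCL = (y_i + alpha_C * beta_C) / (n_i + alpha_C) for verb i.

    class_counts: (DOD count, total count) pairs for the other verbs in the
    same cluster. alpha_C and beta_C below are illustrative heuristics only.
    """
    m_C = len(class_counts)                      # class size, excluding verb i
    alpha_C = m_C + 1                            # assumed monotonic function of m_C
    total_dod = sum(y for y, _ in class_counts)
    total_all = sum(n for _, n in class_counts)
    beta_C = total_dod / total_all if total_all else 0.5
    return (y_i + alpha_C * beta_C) / (n_i + alpha_C)

# A low-frequency verb (1 DOD use out of 1) in a hypothetical cluster of alternating verbs.
others = [(8, 15), (4, 9), (6, 14)]
print(lcl_estimate(1, 1, others))  # pulled toward the class mean rather than 1.0
```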

57 For comparative purposes, in this paper we examine alternative models for (a) probability estimation and (b) clustering. [sent-139, score-0.15]

58 The models are the following: two models without clusters, MLE and L1; two models where clusters are computed independently, LCL and MLEαβ; and the full HBM described before. [sent-140, score-0.161]

59 With clustering, same-cluster verbs share some parameters, influencing one another. [sent-145, score-0.14]

60 In HBMs, clustering and probability estimation are calculated jointly. [sent-147, score-0.146]

61 In the other models these two estimates are calculated separately, permitting ’plug-and-play’ use of external clustering methods, like X-means (Pelleg and Moore, 2000). [sent-148, score-0.109]
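
A sketch of such a plug-and-play clustering step over hypothetical (DOD, PD) count pairs; plain k-means with a fixed k stands in here for X-means, which selects the number of clusters automatically:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (DOD count, PD count) pairs for a handful of verbs.
counts = np.array([[12, 9], [7, 11], [0, 6], [15, 0], [1, 1], [0, 3]])

# X-means (Pelleg and Moore, 2000) picks the number of clusters itself;
# ordinary k-means with k = 3 is used only as a readily available stand-in.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(counts)
for row, label in zip(counts, labels):
    print(row, "-> cluster", label)
```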

62 However, to further assess the impact of cluster assignment on alternative model performance, we also used the clusters that maximize the evidence of the HBM for the DOD and PD counts of the target verbs, and we refer to these as Maximum Evidence (ME) clusters. [sent-149, score-0.209]

63 In ME clusters, verbs are separated into 3 classes: one if they have counts for both frames; another for only the DOD frame; and a final one for only the PD frame. [sent-150, score-0.14]

64 In this context, overgeneralization can be viewed as the model’s predictions that a given verb seen only in one frame (say, a PD) can also occur in the other (say, a DOD) as well, and it decreases as the learner receives more data. [sent-152, score-0.606]

65 The other 3 models fall somewhere in between, overgeneralizing beyond the observed data, using the prior and class-based smoothing to assign some (low) probability mass to an unseen verb-frame pair. [sent-154, score-0.186]

66 The predictions for each of the target verbs in the DOD frame, given the full corpus, are in figure 3. [sent-160, score-0.178]

67 At either end of the figure are the verbs that were attested in only one of the frames (PD only at the left-hand end, and DOD only at the right-hand end). [sent-161, score-0.216]

68 Figure 4: Probability of verbs in DOD frame, Low Frequency Threshold. [sent-164, score-0.14]

69 To examine how overgeneralization progresses during the course of learning as the models were exposed to increasing amounts of data, we used the corpus divided by cumulative epochs, as described in §3. [sent-165, score-0.423]

70 The verbs for each of the frames were divided into 5 frequency bins, and the models were assessed as to how much overgeneralization they displayed for each of these verbs. [sent-169, score-0.418]

71 Following Perfors, Tenenbaum, and Wonnacott (2010), overgeneralization is calculated as the absolute difference between the model’s predicted θ and θMLE, for each of the epochs, figure 5, and for comparative purposes their alternating/non-alternating classification is also adopted. [sent-170, score-0.366]

72 For non-alternating verbs, overgeneralization reflects the degree of smoothing of each model. [sent-171, score-0.385]
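
A minimal sketch of this measure with hypothetical counts, using add-one (L1) smoothing as the model whose prediction is compared against θMLE:

```python
def overgeneralization(theta_model, y, n):
    """|theta_model - theta_MLE|, with theta_MLE = y / n (Perfors et al., 2010)."""
    return abs(theta_model - y / n)

# A non-alternating, DOD-only verb seen 4 times (hypothetical counts).
y, n = 4, 4
theta_l1 = (y + 1) / (n + 2)               # add-one (L1) smoothed estimate
print(overgeneralization(theta_l1, y, n))  # ~0.167 of probability mass reserved for the unseen PD frame
```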

73 As expected, the more frequent a verb is, the more confident the model is in the indirect negative evidence it has for that verb, and the less it overgeneralizes, as shown in the lighter bars in all epochs. [sent-172, score-0.375]

74 In addition, the overall effect of larger amounts of data is indicated by a reduction in overgeneralization epoch by epoch. [sent-173, score-0.464]

75 The effects of class-based smoothing can be assessed by comparing L1, a model without clustering that displays a constant degree of overgeneralization regardless of the epoch, with HBM, which uses a distribution over clusters, and the other models, which use X-means. [sent-174, score-0.538]

76 If a low-frequency threshold is applied, the differences between the models decrease significantly and so does the degree of overgeneralization in the models’ predictions, as shown in the 3 lighter bars in the figure. [sent-175, score-0.391]

77 To compare the models and provide an overall difference measure, we use the predictions of the more complex model, HBM, as a baseline and then calculate the difference between its predictions and those of the other models. [sent-179, score-0.115]

78 We used three different measures for comparing models: one for their standard difference; one that prioritizes agreement for high frequency verbs; and one that focuses more on low frequency verbs. [sent-180, score-0.16]

79 To focus on high frequency verbs, we also define the Weighted Difference (Dn) between two models [equation omitted]. Here we expect Dn < D since models tend to agree as the amount of evidence for each verb increases. [sent-184, score-0.408]

80 Conversely, our third measure, denoted Inverted (D1/n), prioritizes the agreement between two models on low frequency verbs [equation omitted]; D1/n captures the degree of similarity in overgeneralization between two models. [sent-185, score-0.474]
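
The exact definitions of D, Dn and D1/n are not reproduced in this summary; the sketch below assumes they are weighted mean absolute differences with per-verb weights 1, ni and 1/ni respectively, which matches the stated intent but not necessarily the paper's formulas:

```python
def model_difference(theta_a, theta_b, weights):
    """Weighted mean absolute difference between two models' per-verb predictions."""
    num = sum(w * abs(a - b) for w, a, b in zip(weights, theta_a, theta_b))
    return num / sum(weights)

theta_hbm = [0.92, 0.55, 0.10]   # hypothetical per-verb predictions
theta_lcl = [0.88, 0.60, 0.20]
n = [40, 12, 3]                  # hypothetical verb frequencies

print(model_difference(theta_hbm, theta_lcl, [1] * len(n)))        # D
print(model_difference(theta_hbm, theta_lcl, n))                   # Dn: emphasizes frequent verbs
print(model_difference(theta_hbm, theta_lcl, [1 / x for x in n]))  # D1/n: emphasizes rare verbs
```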

81 Comparing the prediction agreement, the strong influence of clustering is clear: the models that have compatible clusters have similar performances. [sent-190, score-0.153]

82 For instance, all the models that adopt the ME clusters for the data perform closest to HBMs. [sent-191, score-0.083]

83 The results for these measures become even closer in most cases when the low frequency threshold is adopted, figure 7, as the evidence reduces the influence of the prior. [sent-195, score-0.078]

84 Figure 8: DOD probability evolution for models with increase in evidence. [sent-197, score-0.327]

85 To examine the decay of overgeneralization with the increase in evidence for these models, two simulated scenarios are defined for a single generic verb: one where the evidence for DOD amounts to 75% of the data (dashed lines) and another where it amounts to 100% (solid lines), figures 9 and 8. [sent-198, score-0.618]
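
A sketch of this kind of simulation using add-one (L1) smoothing, whose closed form makes the decay easy to see; the sample sizes and the choice of L1 as the illustrative model are assumptions:

```python
def l1_estimate(y, n):
    """Add-one (L1) smoothed DOD probability."""
    return (y + 1) / (n + 2)

# Overgeneralization (distance from the MLE) for a single generic verb as the
# evidence grows, in the two scenarios described above: 100% and 75% DOD evidence.
for dod_share in (1.00, 0.75):
    print(f"DOD evidence = {dod_share:.0%}")
    for n in (1, 2, 5, 10, 20, 40):
        y = round(dod_share * n)
        print(f"  n = {n:2d}: overgeneralization = {abs(l1_estimate(y, n) - y / n):.3f}")
```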

86 Unsurprisingly, the performance of the models is dependent on the amount of evidence available. [sent-199, score-0.169]

87 Ultimately it is the evidence that dominates the posterior probability (Figure 9: overgeneralization reduction with increase in evidence; x-axis: number of examples). [sent-201, score-0.194]

88 These results suggest that while these models all differ slightly in the degree of overgeneralization for low frequency data and noise, these differences are small, and as evidence reaches approximately 10 examples per verb, the overall performance for all models approaches that of MLE. [sent-203, score-0.613]

89 5 Conclusions and Future Work HBMs have been successfully used for a number of language acquisition tasks, capturing both patterns of under- and overgeneralization found in child language acquisition. [sent-204, score-0.525]

90 Their (hyper)parameters provide robustness for dealing with low frequency events, noise, and uncertainty and a good fit to the data, but this fidelity comes at the cost of complex computation. [sent-205, score-0.106]

91 Here we have examined HBMs against computationally simpler approaches to dative alternation acquisition that implement the indirect negative evidence approach. [sent-206, score-0.311]

92 The results show that the proposed LCL model in particular, which combines class-based smoothing with maximum likelihood estimation, obtains results comparable to those of HBMs in a much simpler framework. [sent-208, score-0.159]

93 Moreover, when a cognitively-viable frequency threshold is adopted, differences in the performance of all models decrease, and quite rapidly approach the performance of MLE. [sent-209, score-0.091]

94 In this paper we used standard clustering techniques grounded solely on verb counts to enable comparison with previous work. [sent-210, score-0.218]

95 However, a variety of additional linguistic and distributional features could be used for clustering verbs into more semantically motivated classes, using a larger number of frames and verbs. [sent-211, score-0.286]

96 Derivational complexity and the order of acquisition of child’s speech. [sent-231, score-0.113]

97 The learnability and acquisition of the dative alternation in English. [sent-260, score-0.264]

98 Variability, negative evidence, and the acquisition of verb argument constructions. [sent-329, score-0.297]

99 Dynamics of Bayesian updating with dependent data and misspecified models. [sent-337, score-0.092]

100 Acquiring and processing verb argument structure: Distributional learning in a miniature language. [sent-353, score-0.148]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hbms', 0.396), ('hbm', 0.344), ('overgeneralization', 0.327), ('dod', 0.32), ('perfors', 0.207), ('wonnacott', 0.207), ('lcl', 0.189), ('tenenbaum', 0.157), ('verb', 0.148), ('mle', 0.146), ('verbs', 0.14), ('evidence', 0.13), ('dative', 0.126), ('acquisition', 0.113), ('epoch', 0.106), ('bayesian', 0.092), ('pd', 0.088), ('chater', 0.086), ('child', 0.085), ('frames', 0.076), ('clustering', 0.07), ('marr', 0.069), ('simpler', 0.063), ('parisien', 0.061), ('fazly', 0.061), ('indirect', 0.061), ('behavior', 0.058), ('smoothing', 0.058), ('villavicencio', 0.056), ('cognitive', 0.055), ('frequency', 0.052), ('gropen', 0.052), ('kwisthout', 0.052), ('wareham', 0.052), ('competition', 0.05), ('alternations', 0.049), ('estimation', 0.048), ('learner', 0.047), ('adult', 0.046), ('childes', 0.046), ('frame', 0.046), ('clusters', 0.044), ('hsu', 0.042), ('models', 0.039), ('ik', 0.039), ('baker', 0.038), ('predictions', 0.038), ('likelihood', 0.038), ('epochs', 0.038), ('children', 0.036), ('binomial', 0.036), ('dominates', 0.036), ('negative', 0.036), ('class', 0.035), ('alternative', 0.035), ('berwick', 0.034), ('formalizations', 0.034), ('gallistel', 0.034), ('gelman', 0.034), ('grande', 0.034), ('ingram', 0.034), ('lcls', 0.034), ('mommy', 0.034), ('nematzadeh', 0.034), ('pelleg', 0.034), ('shalizi', 0.034), ('ufrgs', 0.034), ('simplest', 0.033), ('suzanne', 0.033), ('occurrences', 0.033), ('inverted', 0.032), ('prior', 0.031), ('amounts', 0.031), ('newport', 0.03), ('afsaneh', 0.03), ('prioritizes', 0.03), ('brazil', 0.03), ('csail', 0.03), ('incurring', 0.03), ('paradox', 0.03), ('rooij', 0.03), ('mass', 0.03), ('stevenson', 0.03), ('brown', 0.029), ('posterior', 0.028), ('eventual', 0.028), ('macwhinney', 0.028), ('fidelity', 0.028), ('sul', 0.028), ('adam', 0.028), ('probability', 0.028), ('monte', 0.027), ('exposed', 0.026), ('classes', 0.026), ('low', 0.026), ('computationally', 0.025), ('decrease', 0.025), ('federal', 0.025), ('learnability', 0.025), ('longitudinal', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple

Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov

Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring the high computational costs associated with HBMs.

2 0.11661205 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

3 0.10665427 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-DPP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each.

4 0.091212161 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

Author: Kavitha Rajan

Abstract: Natural language can be easily understood by everyone irrespective of their differences in age or region or qualification. The existence of a conceptual base that underlies all natural languages is an accepted claim as pointed out by Schank in his Conceptual Dependency (CD) theory. Inspired by the CD theory and theories in Indian grammatical tradition, we propose a new set of meaning primitives in this paper. We claim that this new set of primitives captures the meaning inherent in verbs and help in forming an inter-lingual and computable ontological classification of verbs. We have identified seven primitive overlapping verb senses which substantiate our claim. The percentage of coverage of these primitives is 100% for all verbs in Sanskrit and Hindi and 3750 verbs in English.

5 0.077352047 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL

Author: Yu Sawai ; Mamoru Komachi ; Yuji Matsumoto

Abstract: We propose a verb suggestion method which uses candidate sets and domain adaptation to incorporate error patterns produced by ESL learners. The candidate sets are constructed from a large scale learner corpus to cover various error patterns made by learners. Furthermore, the model is trained using both a native corpus and the learner corpus via a domain adaptation technique. Experiments on two learner corpora show that the candidate sets increase the coverage of error patterns and domain adaptation improves the performance for verb suggestion.

6 0.071834177 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners

7 0.067100927 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach

8 0.064797238 224 acl-2013-Learning to Extract International Relations from Political Context

9 0.055053454 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering

10 0.054533593 265 acl-2013-Outsourcing FrameNet to the Crowd

11 0.054358758 286 acl-2013-Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

12 0.051309526 310 acl-2013-Semantic Frames to Predict Stock Price Movement

13 0.051263977 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German

14 0.050406873 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features

15 0.049847778 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

16 0.04977135 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics

17 0.049037576 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

18 0.048014324 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

19 0.046750516 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

20 0.046470899 116 acl-2013-Detecting Metaphor by Contextual Analogy


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, 0.021), (2, 0.003), (3, -0.06), (4, -0.054), (5, -0.052), (6, -0.012), (7, 0.04), (8, -0.021), (9, -0.002), (10, -0.016), (11, -0.008), (12, -0.02), (13, -0.024), (14, -0.097), (15, -0.055), (16, -0.015), (17, 0.043), (18, 0.065), (19, -0.007), (20, 0.056), (21, 0.018), (22, 0.087), (23, -0.033), (24, 0.128), (25, -0.009), (26, -0.04), (27, 0.001), (28, 0.004), (29, 0.077), (30, 0.072), (31, -0.02), (32, -0.086), (33, -0.044), (34, -0.103), (35, 0.027), (36, -0.087), (37, -0.101), (38, -0.06), (39, 0.054), (40, 0.003), (41, -0.067), (42, -0.041), (43, -0.001), (44, -0.059), (45, 0.045), (46, -0.057), (47, -0.016), (48, -0.039), (49, 0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93728733 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple

Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov

Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring the high computational costs associated with HBMs.

2 0.84699076 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

3 0.80470902 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-DPP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each.

4 0.65191144 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

Author: Kavitha Rajan

Abstract: Natural language can be easily understood by everyone irrespective of their differences in age or region or qualification. The existence of a conceptual base that underlies all natural languages is an accepted claim as pointed out by Schank in his Conceptual Dependency (CD) theory. Inspired by the CD theory and theories in Indian grammatical tradition, we propose a new set of meaning primitives in this paper. We claim that this new set of primitives captures the meaning inherent in verbs and help in forming an inter-lingual and computable ontological classification of verbs. We have identified seven primitive overlapping verb senses which substantiate our claim. The percentage of coverage of these primitives is 100% for all verbs in Sanskrit and Hindi and 3750 verbs in English.

5 0.6002152 286 acl-2013-Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

Author: Tirthankar Dasgupta

Abstract: In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or are they organized along the morphological line. For this, we have conducted a series of psycholinguistic experiments to build up hypothesis on the possible organizational structure of the mental lexicon. Next, we develop computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing and compositionality between the stem . and the suffix plays an important role in the decomposition process. We observed the same phenomena for Bangla verb sequences where experiments showed noncompositional verb sequences are in general stored as a whole in the ML and low traces of compositional verbs are found in the mental lexicon. 1 IInnttrroodduuccttiioonn Mental lexicon is the representation of the words in the human mind and their associations that help fast retrieval and comprehension (Aitchison, 1987). Words are known to be associated with each other in terms of, orthography, phonology, morphology and semantics. However, the precise nature of these relations is unknown. An important issue that has been a subject of study for a long time is to identify the fundamental units in terms of which the mental lexicon is i itkgp .ernet . in organized. That is, whether lexical representations in the mental lexicon are word based or are they organized along morphological lines. For example, whether a word such as “unimaginable” is stored in the mental lexicon as a whole word or do we break it up “un-” , “imagine” and “able”, understand the meaning of each of these constituent and then recombine the units to comprehend the whole word. Such questions are typically answered by designing appropriate priming experiments (Marslen-Wilson et al., 1994) or other lexical decision tasks. The reaction time of the subjects for recognizing various lexical items under appropriate conditions reveals important facts about their organization in the brain. (See Sec. 2 for models of morphological organization and access and related experiments). A clear understanding of the structure and the processing mechanism of the mental lexicon will further our knowledge of how the human brain processes language. Further, these linguistically important and interesting questions are also highly significant for computational linguistics (CL) and natural language processing (NLP) applications. Their computational significance arises from the issue of their storage in lexical resources like WordNet (Fellbaum, 1998) and raises the questions like, how to store morphologically complex words, in a lexical resource like WordNet keeping in mind the storage and access efficiency. There is a rich literature on organization and lexical access of morphologically complex words where experiments have been conducted mainly for derivational suffixed words of English, Hebrew, Italian, French, Dutch, and few other languages (Marslen-Wilson et al., 2008; Frost et al., 1997; Grainger, et al., 1991 ; Drews and Zwitserlood, 1995). 
However, we do not know of any such investigations for Indian languages, which 123 Sofia, BuPrlgoacreiead, iAngusgu osft 4h-e9 A 2C01L3 S.tu ?c d2en0t1 3Re Ases aorc hiat Wio nrk fsohro Cp,om papguesta 1ti2o3n–a1l2 L9in,guistics are morphologically richer than many of their Indo-European cousins. Moreover, Indian languages show some distinct phenomena like, compound and composite verbs for which no such investigations have been conducted yet. On the other hand, experiments indicate that mental representation and processing of morphologically complex words are not quite language independent (Taft, 2004). Therefore, the findings from experiments in one language cannot be generalized to all languages making it important to conduct similar experimentations in other languages. This work aims to design cognitively motivated computational models that can explain the organization and processing of Bangla morphologically complex words in the mental lexicon. Presently we will concentrate on the following two aspects:   OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa PPo o l yy-mmoorrpphheemmiicc wwoorrddss:: our objective here is to determine whether the mental lexicon decomposes morphologically complex words into its constituent morphemes or does it represent the unanalyzed surface form of a word. OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa ccoomm-ppoouunndd vveerrbbss ((CCVV)) :: compound verbs are the subject of much debate in linguistic theory. No consensus has been reached yet with respect to the issue that whether to consider them as unitary lexical units or are they syntactically assembled combinations of two independent lexical units. As linguistic arguments have so far not led to a consensus, we here use cognitive experiments to probe the brain signatures of verb-verb combinations and propose cognitive as well as computational models regarding the possible organization and processing of Bangla CVs in the mental lexicon (ML). With respect to this, we apply the different priming and other lexical decision experiments, described in literature (Marslen-Wilson et al., 1994; Bentin, S. and Feldman, 1990) specifically for derivationally suffixed polymorphemic words and compound verbs of Bangla. Our cross-modal and masked priming experiment on Bangla derivationally suffixed words shows that morphological relatedness between lexical items triggers a significant priming effect, even when the forms are phonologically/orthographically unrelated. These observations are similar to those reported for English and indicate that derivationally suffixed words in Bangla are in general accessed through decomposition of the word into its constituent morphemes. Further, based on the experimental data we have developed a series of computational models that can be used to predict the decomposition of Bangla polymorphemic words. Our evaluation result shows that decom- position of a polymorphemic word depends on several factors like, frequency, productivity of the suffix and the compositionality between the stem and the suffix. The organization of the paper is as follows: Sec. 2 presents related works; Sec. 3 describes experiment design and procedure; Sec. 4 presents the processing of CVs; and finally, Sec. 5 concludes the paper by presenting the future direction of the work. 2 RReellaatteedd WWoorrkkss 2. . 
11 RReepprreesseennttaattiioonn ooff ppoollyymmoorrpphheemmiicc wwoorrddss Over the last few decades many studies have attempted to understand the representation and processing of morphologically complex words in the brain for various languages. Most of the studies are designed to support one of the two mutually exclusive paradigms: the full-listing and the morphemic model. The full-listing model claims that polymorphic words are represented as a whole in the human mental lexicon (Bradley, 1980; Butterworth, 1983). On the other hand, morphemic model argues that morphologically complex words are decomposed and represented in terms of the smaller morphemic units. The affixes are stripped away from the root form, which in turn are used to access the mental lexicon (Taft and Forster, 1975; Taft, 1981 ; MacKay, 1978). Intermediate to these two paradigms is the partial decomposition model that argues that different types of morphological forms are processed separately. For instance, the derived morphological forms are believed to be represented as a whole, whereas the representation of the inflected forms follows the morphemic model (Caramazza et al., 1988). Traditionally, priming experiments have been used to study the effects of morphology in language processing. Priming is a process that results in increase in speed or accuracy of response to a stimulus, called the target, based on the occurrence of a prior exposure of another stimulus, called the prime (Tulving et al., 1982). Here, subjects are exposed to a prime word for a short duration, and are subsequently shown a target word. The prime and target words may be morphologically, phonologically or semantically re124 lated. An analysis of the effect of the reaction time of subjects reveals the actual organization and representation of the lexicon at the relevant level. See Pulvermüller (2002) for a detailed account of such phenomena. It has been argued that frequency of a word influences the speed of lexical processing and thus, can serve as a diagnostic tool to observe the nature and organization of lexical representations. (Taft, 1975) with his experiment on English inflected words, argued that lexical decision responses of polymorphemic words depends upon the base word frequency. Similar observation for surface word frequency was also observed by (Bertram et al., 2000;Bradley, 1980;Burani et al., 1987;Burani et al., 1984;Schreuder et al., 1997; Taft 1975;Taft, 2004) where it has been claimed that words having low surface frequency tends to decompose. Later, Baayen(2000) proposed the dual processing race model that proposes that a specific morphologically complex form is accessed via its parts if the frequency of that word is above a certain threshold of frequency, then the direct route will win, and the word will be accessed as a whole. If it is below that same threshold of frequency, the parsing route will win, and the word will be accessed via its parts. 2. . 22 RReepprreesseennttaattiioonn ooff CCoommppoouunndd A compound verb (CV) consists of two verbs (V1 and V2) acting as and expresses a single expression For example, in the sentence VVeerrbbss a sequence of a single verb of meaning. রুটিগুল ো খেল খেল ো (/ruTigulo kheYe phela/) ―bread-plural-the eat and drop-pres. Imp‖ ―Eat the breads‖ the verb sequence “খেল খেল ো (eat drop)” is an example of CV. Compound verbs are a special phenomena that are abundantly found in IndoEuropean languages like Indian languages. 
A plethora of works has been done to provide linguistic explanations on the formation of such word, yet none so far has led to any consensus. Hook (1981) considers the second verb V2 as an aspectual complex comparable to the auxiliaries. Butt (1993) argues CV formations in Hindi and Urdu are either morphological or syntactical and their formation take place at the argument struc- ture. Bashir (1993) tried to construct a semantic analysis based on “prepared” and “unprepared mind”. Similar findings have been proposed by Pandharipande (1993) that points out V1 and V2 are paired on the basis of their semantic compatibility, which is subject to syntactic constraints. Paul (2004) tried to represent Bangla CVs in terms of HPSG formalism. She proposes that the selection of a V2 by a V1 is determined at the semantic level because the two verbs will unify if and only if they are semantically compatible. Since none of the linguistic formalism could satisfactorily explain the unique phenomena of CV formation, we here for the first time drew our attention towards psycholinguistic and neurolinguistic studies to model the processing of verb-verb combinations in the ML and compare these responses with that of the existing models. 3 TThhee PPrrooppoosseedd AApppprrooaacchheess 3. . 11 TThhee ppssyycchhoolliinngguuiissttiicc eexxppeerriimmeennttss We apply two different priming experiments namely, the cross modal priming and masked priming experiment discussed in (Forster and Davis, 1984; Rastle et al., 2000;Marslen-Wilson et al., 1994; Marslen-Wilson et al., 2008) for Bangla morphologically complex words. Here, the prime is morphologically derived form of the target presented auditorily (for cross modal priming) or visually (for masked priming). The subjects were asked to make a lexical decision whether the given target is a valid word in that language. The same target word is again probed but with a different audio or visual probe called the control word. The control shows no relationship with the target. For example, baYaska (aged) and baYasa (age) is a prime-target pair, for which the corresponding control-target pair could be naYana (eye) and baYasa (age). Similar to (Marslen-Wilson et al., 2008) the masked priming has been conducted for three different SOA (Stimulus Onset Asynchrony), 48ms, 72ms and 120ms. The SOA is measured as the amount of time between the start the first stimulus till the start of the next stimulus. TCM abl-’+ Sse-+ O1 +:-DatjdgnmAshielbatArDu)f(osiAMrawnteihmsgcdaoe)lEx-npgmAchebamr)iD-gnatmprhdiYlbeaA(n ftrTsli,ae(+gnrmdisc)phroielctn)osrelated, and - implies unrelated. There were 500 prime-target and controltarget pairs classified into five classes. Depending on the class, the prime is related to the target 125 either in terms of morphology, semantics, orthography and/or Phonology (See Table 1). The experiments were conducted on 24 highly educated native Bangla speakers. Nineteen of them have a graduate degree and five hold a post graduate degree. The age of the subjects varies between 22 to 35 years. RReessuullttss:: The RTs with extreme values and incorrect decisions were excluded from the data. The data has been analyzed using two ways ANOVA with three factors: priming (prime and control), conditions (five classes) and prime durations (three different SOA). 
We observe strong priming effects (p<0.05) when the target word is morphologically derived and has a recognizable suffix, semantically and orthographically related with respect to the prime; no priming effects are observed when the prime and target words are orthographically related but share no morphological or semantic relationship; although not statistically significant (p>0.07), but weak priming is observed for prime target pairs that are only semantically related. We see no significant difference between the prime and control RTs for other classes. We also looked at the RTs for each of the 500 target words. We observe that maximum priming occurs for words in [M+S+O+](69%), some priming is evident in [M+S+O-](51%) and [M'+S-O+](48%), but for most of the words in [M-S+O-](86%) and [M-S-O+](92%) no priming effect was observed. 3. . 22 FFrreeqquueennccyy DDiissttrriibbuuttiioonn MMooddeellss ooff MMoo rrpphhoo-llooggiiccaall PPrroocceessssiinngg From the above results we saw that not all polymorphemic words tend to decompose during processing, thus we need to further investigate the processing phenomena of Bangla derived words. One notable means is to identify whether the stem or suffix frequency is involved in the processing stage of that word. For this, we apply different frequency based models to the Bangla polymorphemic words and try to evaluate their performance by comparing their predicted results with the result obtained through the priming experiment. MMooddeell --11:: BBaassee aanndd SSuurrffaaccee wwoorrdd ffrreeqquueennccyy ee ff-ffeecctt -- It states that the probability of decomposition of a Bangla polymorphemic word depends upon the frequency of its base word. Thus, if the stem frequency of a polymorphemic word crosses a given threshold value, then the word will decomposed into its constituent morpheme. Similar claim has been made for surface word frequency model where decomposition depends upon the frequency of the surface word itself. We have evaluated both the models with the 500 words used in the priming experiments discussed above. We have achieved an accuracy of 62% and 49% respectively for base and surface word frequency models. MMooddeell --22:: CCoommbbiinniinngg tthhee bbaassee aanndd ssuurrffaaccee wwoorrdd ffrreeq quueennccyy -- In a pursuit towards an extended model, we combine model 1 and 2 together. We took the log frequencies of both the base and the derived words and plotted the best-fit regression curve over the given dataset. The evaluation of this model over the same set of 500 target words returns an accuracy of 68% which is better than the base and surface word frequency models. However, the proposed model still fails to predict processing of around 32% of words. This led us to further enhance the model. For this, we analyze the role of suffixes in morphological processing. MMooddeell -- 33:: DDeeggrreeee ooff AAffffiixxaattiioonn aanndd SSuuffffiixx PPrroodd-uuccttiivviittyy:: we examine whether the regression analysis between base and derived frequency of Bangla words varies between suffixes and how these variations affect morphological decomposition. With respect to this, we try to compute the degree of affixation between the suffix and the base word. For this, we perform regression analysis on sixteen different Bangla suffixes with varying degree of type and token frequencies. For each suffix, we choose 100 different derived words. 
We observe that those suffixes having high value of intercept are forming derived words whose base frequencies are substantially high as compared to their derived forms. Moreover we also observe that high intercept value for a given suffix indicates higher inclination towards decomposition. Next, we try to analyze the role of suffix type/token ratio and compare them with the base/derived frequency ratio model. This has been done by regression analysis between the suffix type-token ratios with the base-surface frequency ratio. We further tried to observe the role of suffix productivity in morphological processing. For this, we computed the three components of productivity P, P* and V as discussed in (Hay and Plag, 2004). P is the “conditioned degree of productivity” and is the probability that we are encountering a word with an affix and it is representing a new type. P* is the “hapaxedconditioned degree of productivity”. It expresses the probability that when an entirely new word is 126 encountered it will contain the suffix. V is the “type frequency”. Finally, we computed the productivity of a suffix through its P, P* and V values. We found that decomposition of Bangla polymorphemic word is directly proportional to the productivity of the suffix. Therefore, words that are composed of productive suffixes (P value ranges between 0.6 and 0.9) like “-oYAlA”, “-giri”, “-tba” and “-panA” are highly decomposable than low productive suffixes like “-Ani”, “-lA”, “-k”, and “-tama”. The evaluation of the proposed model returns an accuracy of 76% which comes to be 8% better than the preceding models. CCoommbbiinniinngg MMooddeell --22 aanndd MMooddeell -- 33:: One important observation that can be made from the above results is that, model-3 performs best in determining the true negative values. It also possesses a high recall value of (85%) but having a low precision of (50%). In other words, the model can predict those words for which decomposition will not take place. On the other hand, results of Model-2 posses a high precision of 70%. Thus, we argue that combining the above two models can better predict the decomposition of Bangla polymorphemic words. Hence, we combine the two models together and finally achieved an overall accuracy of 80% with a precision of 87% and a recall of 78%. This surpasses the performance of the other models discussed earlier. However, around 22% of the test words were wrongly classified which the model fails to justify. Thus, a more rigorous set of experiments and data analysis are required to predict access mechanisms of such Bangla polymorphemic words. 3. . 33 SStteemm- -SSuuffffiixx CCoommppoossiittiioonnaalliittyy Compositionality refers to the fact that meaning of a complex expression is inferred from the meaning of its constituents. Therefore, the cost of retrieving a word from the secondary memory is directly proportional to the cost of retrieving the individual parts (i.e the stem and the suffix). Thus, following the work of (Milin et al., 2009) we define the compositionality of a morphologically complex word (We) as: C(We)=α 1H(We)+α α2H(e)+α α3H(W|e)+ α4H(e|W) Where, H(x) is entropy of an expression x, H(W|e) is the conditional entropy between the stem W and suffix e and is the proportionality factor whose value is computed through regression analysis. Next, we tried to compute the compositionality of the stem and suffixes in terms of relative entropy D(W||e) and Point wise mutual information (PMI). 
The relative entropy is the measure of the distance between the probability distribution of the stem W and the suffix e. The PMI measures the amount of information that one random variable (the stem) contains about the other (the suffix). We have compared the above three techniques with the actual reaction time data collected through the priming and lexical decision experiment. We observed that all the three information theoretic models perform much better than the frequency based models discussed in the earlier section, for predicting the decomposability of Bangla polymorphemic words. However, we think it is still premature to claim anything concrete at this stage of our work. We believe much more rigorous experiments are needed to be per- formed in order to validate our proposed models. Further, the present paper does not consider factors related to age of acquisition, and word familiarity effects that plays important role in the processing of morphologically complex words. Moreover, it is also very interesting to see how stacking of multiple suffixes in a word are processed by the human brain. 44 OOrrggaanniizzaattiioonn aanndd PPrroocceessssiinngg ooff CCoomm-ppoouunndd VVeerrbbss iinn tthhee MMeennttaall LLeexxiiccoonn Compound verbs, as discussed above, are special type of verb sequences consisting of two or more verbs acting as a single verb and express a single expression of meaning. The verb V1 is known as pole and V2 is called as vector. For example, “ওঠে পড়া ” (getting up) is a compound verb where individual words do not entirely reflects the meaning of the whole expression. However, not all V1+V2 combinations are CVs. For example, expressions like, “নিঠে য়াও ”(take and then go) and “ নিঠে আঠ ়া” (return back) are the examples of verb sequences where meaning of the whole expression can be derived from the mean- ing of the individual component and thus, these verb sequences are not considered as CV. The key question linguists are trying to identify for a long time and debating a lot is whether to consider CVs as a single lexical units or consider them as two separate units. Since linguistic rules fails to explain the process, we for the first time tried to perform cognitive experiments to understand the organization and processing of such verb sequences in the human mind. A clear understanding about these phenomena may help us to classify or extract actual CVs from other verb 127 sequences. In order to do so, presently we have applied three different techniques to collect user data. In the first technique, we annotated 4500 V1+V2 sequences, along with their example sentences, using a group of three linguists (the expert subjects). We asked the experts to classify the verb sequences into three classes namely, CV, not a CV and not sure. Each linguist has received 2000 verb pairs along with their respective example sentences. Out of this, 1500 verb sequences are unique to each of them and rest 500 are overlapping. We measure the inter annotator agreement using the Fleiss Kappa (Fleiss et al., 1981) measure (κ) where the agreement lies around 0.79. Next, out of the 500 common verb sequences that were annotated by all the three linguists, we randomly choose 300 V1+V2 pairs and presented them to 36 native Bangla speakers. We ask each subjects to give a compositionality score of each verb sequences under 1-10 point scale, 10 being highly compositional and 1 for noncompositional. We found an agreement of κ=0.69 among the subjects. 
We also observe a continuum of compositionality scores among the verb sequences, which suggests that it is difficult to classify Bangla verb sequences discretely into CV and not-a-CV classes. We then compared the compositionality scores with the expert annotations and found a significant correlation between the two. Verb sequences annotated as CVs (such as খেঠে খিল, কঠে খি, ওঠে পড) received low compositionality scores (average scores between 1 and 4), whereas sequences with high compositionality values were generally tagged as not a CV (নিঠে য়া (come and get), নিঠে আে (return back), তুঠল খেঠেনি (kept), গনিঠে পিল (roll on floor)). This indicates that verb sequences which are not CVs show a high degree of compositionality; in other words, non-CV verb sequences can be interpreted directly from their constituent verbs. This suggests that compositional verb sequences require the individual verbs to be recognized separately, so the time to recognize such expressions should be greater than for non-compositional verbs, which map to a single unit of meaning. To test this claim we performed a lexical decision experiment with 32 native Bangla speakers and 92 different verb sequences. We followed the same experimental procedure as (Taft, 2004) for English polymorphemic words; however, rather than derived words, the subjects were shown a verb sequence and asked whether they recognized it as a valid combination. The reaction time (RT) of each subject was recorded. Our preliminary RT analysis shows that, consistent with our claim, RTs for verb sequences with high compositionality values are significantly higher than RTs for low- or non-compositional verbs. This supports our hypothesis that Bangla compound verbs showing low compositionality are stored as a whole in the mental lexicon, following the full-listing model, whereas compositional verb sequences are parsed into their individual components. However, our experiment uses a very small data set, and it is premature to draw firm conclusions based only on the current experimental results.

5 Future Directions

In the next phase of our work we will focus on the following aspects of Bangla morphologically complex words:

The Word Familiarity Effect: Here, our aim is to study the role of a word's familiarity during its processing. We define the familiarity of a word in terms of corpus frequency, age of acquisition, the level of language exposure of a person, the RT of the word, and so on.

Role of suffix types in morphological decomposition: For native Bangla speakers, which morphological suffixes are internalized and which are merely learnt in school but never internalized? We can compare the representation of native, Sanskrit-derived, and foreign suffixes in Bangla words.

Computational models of organization and processing of Bangla compound verbs: So far we have performed only a small set of experiments to study the processing of compound verbs in the mental lexicon. In the next phase of our work we will extend the existing experiments and also apply further techniques such as crowdsourcing and language games to collect more relevant RT and compositionality data. Finally, based on the collected data, we will develop computational models that can explain the possible organizational structure and processing mechanisms of morphologically complex Bangla words in the mental lexicon.
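As a minimal sketch of the kind of analysis that such additional RT and compositionality data would feed, the following assumes two hypothetical aligned arrays (per-item mean compositionality scores and mean lexical decision RTs) and checks their rank correlation with SciPy; the numbers and the choice of Spearman correlation are illustrative assumptions, not results reported above.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-item means: compositionality score (1-10 scale) and
# lexical decision reaction time in milliseconds.
compositionality = np.array([1.8, 2.4, 3.1, 5.0, 6.7, 7.9, 8.5, 9.2])
reaction_time_ms = np.array([612, 598, 631, 655, 690, 703, 721, 748])

rho, p_value = spearmanr(compositionality, reaction_time_ms)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
# A reliably positive rho would be consistent with the claim that more
# compositional verb sequences take longer to recognize than true CVs.
```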
References

Aitchison, J. (1987). Words in the Mind: An Introduction to the Mental Lexicon. Wiley-Blackwell.

Baayen, R. H. (2000). On frequency, transparency and productivity. In G. Booij and J. van Marle (eds.), Yearbook of Morphology, pages 181-208.

Baayen, R. H. (2003). Probabilistic approaches to morphology. Probabilistic Linguistics, pages 229-287.

Baayen, R. H., Dijkstra, T., and Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1):94-117.

Bashir, E. (1993). Causal chains and compound verbs. In M. K. Verma (ed.) (1993).

Bentin, S. and Feldman, L. B. (1990). The contribution of morphological and semantic relatedness to repetition priming at short and long lags: Evidence from Hebrew. Quarterly Journal of Experimental Psychology, 42, pp. 693-711.

Bradley, D. (1980). Lexical representation of derivational relation. Juncture, Saratoga, CA: Anma Libri, pp. 37-55.

Butt, M. (1993). Conscious choice and some light verbs in Urdu. In M. K. Verma (ed.) (1993).

Butterworth, B. (1983). Lexical representation. Language Production, Vol. 2, pp. 257-294. San Diego, CA: Academic Press.

Caramazza, A., Laudanna, A., and Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, pp. 297-332.

Drews, E., and Zwitserlood, P. (1995). Morphological and orthographic similarity in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 21, 1098-1116.

Fellbaum, C. (ed.). (1998). WordNet: An Electronic Lexical Database. MIT Press.

Fleiss, J. L., Levin, B., and Paik, M. C. (1981). The measurement of interrater agreement. Statistical Methods for Rates and Proportions, 2:212-236.

Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-698.

Frost, R., Forster, K. I., and Deutsch, A. (1997). What can we learn from the morphology of Hebrew? A masked-priming investigation of morphological representation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 829-856.

Grainger, J., Cole, P., and Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370-384.

Hook, P. E. (1981). Hindi Structures: Intermediate Level. Michigan Papers on South and Southeast Asia, The University of Michigan Center for South and Southeast Studies, Ann Arbor, Michigan.

MacKay, D. G. (1978). Derivational rules and the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 17, pp. 61-71.

Marslen-Wilson, W. D., and Tyler, L. K. (1997). Dissociating types of mental computation. Nature, 387, pp. 592-594.

Marslen-Wilson, W. D., and Tyler, L. K. (1998). Rules, representations, and the English past tense. Trends in Cognitive Sciences, 2, pp. 428-435.

Marslen-Wilson, W. D., Tyler, L. K., Waksler, R., and Older, L. (1994). Morphology and meaning in the English mental lexicon. Psychological Review, 101, pp. 3-33.

Marslen-Wilson, W. D., and Zhou, X. (1999). Abstractness, allomorphy, and lexical architecture. Language and Cognitive Processes, 14, 321-352.

Milin, P., Kuperman, V., Kostić, A., and Baayen, R. H. (2009). Paradigms bit by bit: an information-theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Analogy in Grammar: Form and Acquisition, pp. 214-252.
Pandharipande, R. (1993). Serial verb construction in Marathi. In M. K. Verma (ed.) (1993).

Paul, S. (2004). An HPSG Account of Bangla Compound Verbs with LKB Implementation. Ph.D. Dissertation, CALT, University of Hyderabad.

Pulvermüller, F. (2002). The Neuroscience of Language. Cambridge University Press.

Stolz, J. A., and Feldman, L. B. (1995). The role of orthographic and semantic transparency of the base morpheme in morphological processing. In L. B. Feldman (ed.), Morphological Aspects of Language Processing. Hillsdale, NJ: Lawrence Erlbaum Associates.

Taft, M., and Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, Vol. 14, pp. 638-647.

Taft, M. (1988). A morphological decomposition model of lexical access. Linguistics, 26, pp. 657-667.

Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, pp. 745-765.

Tulving, E., Schacter, D. L., and Stark, H. A. (1982). Priming effects in word fragment completion are independent of recognition memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 8(4).
