acl acl2011 acl2011-341 acl2011-341-reference knowledge-graph by maker-knowledge-mining

341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge


Source: pdf

Author: Kirill Kireyev ; Thomas K Landauer

Abstract: While computational estimation of difficulty of words in the lexicon is useful in many educational and assessment applications, the concept of scalar word difficulty and current corpus-based methods for its estimation are inadequate. We propose a new paradigm called word meaning maturity which tracks the degree of knowledge of each word at different stages of language learning. We present a computational algorithm for estimating word maturity, based on modeling language acquisition with Latent Semantic Analysis. We demonstrate that the resulting metric not only correlates well with external indicators, but captures deeper semantic effects in language. 1 Motivation It is no surprise that through stages of language learning, different words are learned at different times and are known to different extents. For example, a common word like “dog” is familiar to even a first-grader, whereas a more advanced word like “focal” does not usually enter learners’ vocabulary until much later. Although individual rates of learning words may vary between highand low-performing students, it has been observed that “children [… ] acquire word meanings in roughly the same sequence” (Biemiller, 2008). The aim of this work is to model the degree of knowledge of words at different learning stages. Such a metric would have extremely useful applications in personalized educational technologies, for the purposes of accurate assessment and personalized vocabulary instruction. … 299 .l andaue r } @pear s on .com 2 Rethinking Word Difficulty Previously, related work in education and psychometrics has been concerned with measuring word difficulty or classifying words into different difficulty categories. Examples of such approaches include creation of word lists for targeted vocabulary instruction at various grade levels that were compiled by educational experts, such as Nation (1993) or Biemiller (2008). Such word difficulty assignments are also implicitly present in some readability formulas that estimate difficulty of texts, such as Lexiles (Stenner, 1996), which include a lexical difficulty component based on the frequency of occurrence of words in a representative corpus, on the assumption that word difficulty is inversely correlated to corpus frequency. Additionally, research in psycholinguistics has attempted to outline and measure psycholinguistic dimensions of words such as age-of-acquisition and familiarity, which aim to track when certain words become known and how familiar they appear to an average person. Importantly, all such word difficulty measures can be thought of as functions that assign a single scalar value to each word w: !


reference text

Andrew Biemiller (2008). Words Worth Teaching. Columbus, OH: SRA/McGraw-Hill. John B. Carrol and M. N. White (1973). Age of acquisition norms for 220 picturable nouns. Journal of Verbal Learning & Verbal Behavior, 12, 563-576. Meri Coleman and T.L. Liau (1975). A computer readability formula designed for machine scoring, Journal of Applied Psychology, Vol. 60, pp. 283-284. Ken J. Gilhooly and R. H. Logie (1980). Age of acquisition, imagery, concreteness, familiarity and ambiguity measures for 1944 words. Behaviour Research Methods & Instrumentation, 12, 395-427. Wojtek J. Krzanowski (2000) Principles of Multivariate Analysis: A User’ ’s Perspective (Oxford Statistical Science Series). Oxford University Press, USA. Thomas K Landauer and Susan Dumais (1997). A solution to Plato's problem: The Latent Semantic Analysis Theory of the Acquisition, Induction, and Representation of Knowledge. Psychological Review, 104, pp 211-240. Thomas K Landauer (2002). On the Computation Basis of Learning and Cognition: Arguments from LSA. In N. Ross (Ed.), The Psychology of Learning and Motivation, 41, 43-84. Thomas K Landauer, Danielle S. McNamara, Simon Dennis, and Walter Kintsch (2007). Handbook of Latent Semantic Analysis. Lawrence Erlbaum. Paul Nation (1993). Measuring readiness for simplified material: a test of the first 1,000 words of English. In Simplification: Theory and Application M. L. Tickoo (ed.), RELC Anthology Series 3 1: 193-203. Hans Stadthagen-Gonzalez and C. J. Davis (2006). The Bristol Norms for Age of Acquisition, Imageability and Familiarity. Behavior Research Methods, 38, 598-605. A. Jackson Stenner (1996). Measuring Reading Comprehension with the Lexile Framework. Forth North American Conference on Adolescent/Adult Literacy. Brent Wolter (2001). Comparing the L1 and L2 Mental Lexicon. Studies in Second Language Acquisition. Cambridge University Press. 308