nips nips2009 nips2009-194 knowledge-graph by maker-knowledge-mining

194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory


Source: pdf

Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer

Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. [sent-8, score-0.627]

2 Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. [sent-9, score-0.449]

3 Appropriate spacing of study can double retention on educationally relevant time scales. [sent-10, score-0.517]

4 MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. [sent-12, score-0.272]

5 MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. [sent-17, score-0.263]

6 This advice is based on a memory phenomenon known as the distributed practice or spacing effect (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006). [sent-20, score-0.497]

7 The spacing effect is typically studied via a controlled experimental paradigm in which participants are asked to study unfamiliar paired associates (e.g., foreign language vocabulary pairs). [sent-21, score-0.49]

8 The lag between the second session and the test is known as the retention interval or RI. [sent-28, score-0.18]

9 The solid line of Figure 1a sketches this curve, which we will refer to as the spacing function. [sent-30, score-0.348]

10 The left edge of the graph corresponds to massed practice, when session two immediately follows session one. [sent-31, score-0.18]

11 [Truncated text and Figure 1 label residue; the recoverable labels are: panel (a) — % recall, with the forgetting function and spacing function curves; panel (b) — memory elements m2, m3, m4.] [sent-34, score-0.542]

12 Figure 1: (a) The spacing function (solid line) depicts recall at test following two study sessions separated by a given ISI; the forgetting function (dashed line) depicts recall as a function of the lag between study and test. [Panel (b) labels: pool 1 … pool N.] [sent-40, score-1.225]

13 For educationally relevant RIs on the order of weeks and months, the effect of spacing can be tremendous: optimal spacing can double retention over massed practice (Cepeda et al. [sent-44, score-0.808]

14 The spacing function is related to another observable measure of retention, the forgetting function, which characterizes recall accuracy following a single study session as a function of the lag between study and test. [sent-46, score-0.82]

15 For example, suppose participants in the experiment described above learned material in study session 1, and were then tested on the material immediately prior to study session 2. [sent-47, score-0.41]

16 Typical forgetting functions follow a generalized power-law decay, of the form P(recall) = A(1 + Bt)^(−C), where A, B, and C are constants, and t is the study-test lag (Wixted & Carpenter, 2007). [sent-50, score-0.208]
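
To make the fitting step concrete, here is a minimal sketch of recovering A, B, and C from single-session recall data by least squares. The data points and starting values are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law_forgetting(t, A, B, C):
    """Generalized power-law decay: P(recall) = A * (1 + B*t) ** (-C)."""
    return A * (1.0 + B * t) ** (-C)

# Hypothetical forgetting data: recall proportion at six study-test lags (days).
lags = np.array([1.0, 7.0, 14.0, 35.0, 70.0, 105.0])
recall = np.array([0.81, 0.54, 0.42, 0.30, 0.21, 0.17])

# Least-squares fit of the three constants.
(A, B, C), _ = curve_fit(power_law_forgetting, lags, recall, p0=[0.9, 0.5, 0.5])
print(f"A = {A:.3f}, B = {B:.3f}, C = {C:.3f}")
```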

17 Our goal is to develop a model of long-term memory that characterizes the memory-trace strength of items learned over two or more sessions. [sent-51, score-0.197]

18 The model predicts recall accuracy as a function of the RI, taking into account the study schedule—the ISI or set of ISIs determining the spacing of study sessions. [sent-52, score-0.561]

19 The spacing effect is among the best known phenomena in cognitive psychology, and many theoretical explanations have been suggested. [sent-54, score-0.351]

20 Two well-developed computational models of human memory have been elaborated to explain the spacing effect (Pavlik & Anderson, 2005; Raaijmakers, 2003). [sent-55, score-0.549]

21 These models are necessarily complex: the brain contains multiple, interacting memory systems whose decay and interference characteristics depend on the specific content being stored and its relationship to other content. [sent-56, score-0.23]

22 Consequently, these computational theories are fairly flexible and can provide reasonable post-hoc fits to spacing effect data, but we question their predictive value. [sent-57, score-0.412]

23 Rather than developing a general theory of memory, we introduce a model that specifically predicts the shape of the spacing function. [sent-58, score-0.358]

24 Because the spacing function depends not only on the RI, but also on the nature of the material being learned, and the manner and amount of study, the model requires empirical constraints. [sent-59, score-0.363]

25 We propose a novel approach to obtaining a predictive model: we collect behavioral data to determine the forgetting function for the specific material being learned. [sent-60, score-0.216]

26 We then use the forgetting function, which is based on a single study session, to predict the spacing function, which is based on two or more study sessions. [sent-61, score-0.643]

27 [Section 2: Accounts of the spacing effect.] We review two existing theories proposed to explain the spacing effect, and then propose a synthesis of these theories. [sent-65, score-0.793]

28 However, after introducing our model and showing its predictive power, we discuss an intriguingly similar Bayesian theory of memory adaptation (Kording et al. [sent-68, score-0.186]

29 [Section 2.1: Encoding-variability theories.] One class of theories proposed to explain the spacing effect focuses on the notion of encoding variability. [sent-72, score-0.503]

30 According to these theories, when an item is studied, a memory trace is formed that incorporates the current psychological context. [sent-73, score-0.378]

31 Retrieval of a stored item depends at least in part on the similarity of the contexts at study and at test. [sent-75, score-0.276]

32 If psychological context is assumed to fluctuate randomly over time, two study sessions close together in time will have similar contexts. [sent-76, score-0.23]

33 Consequently, at the time of a recall test, either both study contexts will match the test context or neither will. [sent-77, score-0.208]

34 Increasing the ISI can thus prove advantageous because the test context will have higher likelihood of matching one study context or the other. [sent-78, score-0.19]

35 Greater contextual variation enhances memory on this account by making for less redundancy in the underlying memory traces. [sent-79, score-0.349]

36 Raaijmakers (2003) developed an encoding variability theory by incorporating time-varying contextual drift into the well-known Search of Associative Memory (SAM) model (Raaijmakers & Shiffrin, 1981), and explained a range of data from the spacing literature. [sent-82, score-0.408]

37 The input layer to this neural net is a pool of binary valued neurons that represent the contextual state at the current time; the output layer consists of a set of memory elements, one per item to be stored. [sent-88, score-0.484]

38 To simplify notation throughout this paper, we’ll describe this model and all others in terms of a single-item memory, allowing us to avoid an explicit index term for the item being stored or retrieved. [sent-89, score-0.2]

39 The memory element for the item under consideration has an activation level, m, which is a linear function of the context unit activities: m = Σ_j w_j c_j, where c_j is the binary activation level of context unit j and w_j is the strength of the connection from context unit j. [sent-90, score-0.725]

40 The probability of retrieval of the item is assumed to be monotonically related to m. [sent-91, score-0.219]

41 When an item is studied, its connection strengths are adjusted according to a Hebbian learning rule with an upper limit on the connection strength: Δw_j = min(1 − w_j, c_j m̂) (Equation 2), where m̂ = 1 if the item was just presented for study, or 0 otherwise. [sent-92, score-0.392]

42 When an item is studied, the weights for all contextual features present at the time of study will be strengthened. [sent-93, score-0.331]

43 Later retrieval is more likely if the context at test matches the context at study: the memory element receives a contribution only when an input is active and its connection strength is nonzero. [sent-94, score-0.381]

44 When an item has been studied twice, retrieval will be more robust if the two study opportunities strengthen different weights, which occurs when the ISI is large and the contextual states do not overlap significantly. [sent-96, score-0.385]
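
As a concrete illustration of this encoding-variability mechanism, the following sketch simulates a drifting binary context with the Hebbian ceiling update of Equation 2. The pool size, drift probability, and interval lengths are placeholders of ours, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 200       # size of the binary context pool (placeholder)
drift_prob = 0.05   # per-step flip probability for each context unit (placeholder)

context = rng.integers(0, 2, n_units)   # current psychological context
w = np.zeros(n_units)                   # connection strengths for one item

def drift(context):
    """Random contextual drift: each unit flips independently."""
    flips = rng.random(n_units) < drift_prob
    return np.where(flips, 1 - context, context)

def study(w, context):
    """Hebbian update with a ceiling (Equation 2), with m_hat = 1 at study."""
    return w + np.minimum(1.0 - w, context.astype(float))

def trace_strength(w, context):
    """Activation m = sum_j w_j * c_j."""
    return float(w @ context)

w = study(w, context)
for _ in range(20):              # ISI: context drifts between the two sessions
    context = drift(context)
w = study(w, context)
for _ in range(50):              # retention interval before the test
    context = drift(context)
print("trace strength at test:", trace_strength(w, context))
```

With a longer ISI the two studies strengthen more distinct sets of weights, which is exactly the sense in which spacing buys robustness on this account.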

45 After an item has been studied at least once, SAM assumes that the memory trace resulting from further study is influenced by whether the item is accessible to retrieval at the time of study. [sent-98, score-0.701]

46 Other memory models similarly claim that memory traces are weaker if an item is inaccessible to retrieval at the time of study (e. [sent-100, score-0.611]

47 We have described the key components of SAM that explain the spacing effect, but the model has additional complexity, including a short-term memory store, inter-item interference, and additional context based on associativity and explicit cues. [sent-103, score-0.562]

48 A given parameterization of the model can explain spacing effects on one time scale (e.g., hours), but the same model cannot explain spacing effects on a different time scale (e.g., days). [sent-109, score-0.406]

49 [Section 2.2: Predictive-utility theories.] We now turn to another class of theories that has been proposed to explain the spacing effect. [sent-114, score-0.481]

50 When an item is studied multiple times with a given ISI, the rational analysis suggests that the need probability drops off rapidly following the last study once an interval of time greater than the ISI has passed. [sent-118, score-0.307]

51 In MTS, each item to be stored is represented by a dedicated cascade of N leaky integrators. [sent-123, score-0.284]

52 The activation of integrator i, x_i, decays over time according to: x_i(t + Δt) = x_i(t) exp(−Δt/τ_i) (Equation 3), where τ_i is the decay time constant. [sent-124, score-0.223]

53 The probability of retrieving the item is related to the total trace strength, s_N, where s_k = Σ_{j=1}^{k} x_j. [sent-125, score-0.203]

54 When an item is repeatedly presented for study with short ISIs, the trace can successfully be represented by the integrators with short time constants, and consequently, the trace will decay rapidly. [sent-131, score-0.537]

55 Increasing the spacing shifts the representation to integrators with slower decay rates. [sent-132, score-0.534]
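
A minimal sketch of the decay side of MTS, following Equation 3; the time constants and the boost applied at each study are illustrative placeholders, and MTS's actual presentation rule, which shifts the representation across time scales, is richer than this.

```python
import numpy as np

tau = np.array([1.0, 4.0, 16.0, 64.0, 256.0])   # decay time constants (illustrative spread)
x = np.ones_like(tau)                           # integrator activities after a first study (placeholder)

def decay(x, dt):
    """Equation 3: x_i(t + dt) = x_i(t) * exp(-dt / tau_i)."""
    return x * np.exp(-dt / tau)

def trace_strength(x):
    """Total trace strength s_N = sum of the integrator activities."""
    return float(x.sum())

x = decay(x, dt=10.0)    # ISI between the two study sessions
x = x + 1.0              # placeholder boost at the second study
x = decay(x, dt=30.0)    # retention interval before the test
print("trace strength at test:", trace_strength(x))
```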

56 Essentially, we take from SAM the notion of contextual drift and retrieval-dependent update, and from MTS the multiscale representation and the cascaded error-correction memory update, and we obtain a new model which we call the Multiscale Context Model or MCM. [sent-138, score-0.31]

57 MCM can also be described in terms of N leaky integrators, where integrator i has time constant τi and activity scaled by γi . [sent-143, score-0.197]

58 As the reader might infer from our description of SAM and MTS, these parameters characterize memory decay, extending Equation 3 such that the total trace strength at time t is defined as: s_N(t) = Σ_{i=1}^{N} γ_i exp(−t/τ_i) x_i(0). [sent-146, score-0.25]

59 If x_i(0) = 1 for all i—which is the integrator activity following the first study in MTS—the trace strength as a function of time is a mixture of exponentials. [sent-147, score-0.297]

60 To match the form of human forgetting (Figure 1), this mixture must approximate a power function. [sent-148, score-0.184]

61 Given N and human forgetting data collected in an experiment, we can search for the parameters {µ, ν, ω, ξ} that obtain a least squares fit to the data. [sent-157, score-0.184]

62 Given the human forgetting function, then, we can completely determine the {τ_i} and {γ_i}. [sent-158, score-0.184]
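
A sketch of how this constraint might be implemented. The mapping from {µ, ν, ω, ξ} to the {τ_i} and {γ_i} below—geometric progressions τ_i = µ·ν^i and γ_i = ω·ξ^i—is our assumption for illustration only, as are the data and starting values.

```python
import numpy as np
from scipy.optimize import minimize

def forgetting_curve(t, mu, nu, omega, xi, N=10):
    """Mixture of exponentials s_N(t) = sum_i gamma_i * exp(-t / tau_i),
    assuming tau_i = mu * nu**i and gamma_i = omega * xi**i (our assumption)."""
    i = np.arange(N)
    tau = mu * nu ** i
    gamma = omega * xi ** i
    return (gamma[None, :] * np.exp(-t[:, None] / tau[None, :])).sum(axis=1)

def sse(params, t, recall):
    return float(np.sum((forgetting_curve(t, *params) - recall) ** 2))

# Hypothetical single-session forgetting data (lags in days, recall proportions).
t = np.array([1.0, 7.0, 14.0, 35.0, 70.0, 105.0])
recall = np.array([0.81, 0.54, 0.42, 0.30, 0.21, 0.17])

res = minimize(sse, x0=[1.0, 3.0, 0.3, 0.8], args=(t, recall),
               bounds=[(1e-3, None), (1.0, None), (1e-3, 1.0), (1e-3, 1.0)])
print("fitted (mu, nu, omega, xi):", res.x)
```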

63 [Section 3.1: Casting MCM as a cascade of leaky integrators.] Assume that—as in MTS—a dedicated set of N leaky integrators holds the memory of each item to be learned. [sent-161, score-0.754]

64 Let x_i denote the activity of integrator i associated with the item, and let s_i be the average strength of the first i integrators, weighted by the {γ_j} terms: s_i = (1/Γ_i) Σ_{j=1}^{i} γ_j x_j, where Γ_i = Σ_{j=1}^{i} γ_j. [sent-162, score-0.228]

65 When an item is studied, its integrators receive a boost in activity. [sent-164, score-0.34]

66 Integrator i receives a boost that depends on how close the average strength of the first i integrators is to full strength, i.e., the boost is larger the further s_i falls below its ceiling of 1. [sent-165, score-0.217]

67 We adopt the retrieval-dependent update assumption of SAM, and fix ε = 1 for an item that is unsuccessfully recalled at the time of study, and ε = ε_r > 1 for an item that is successfully recalled. [sent-168, score-0.401]
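
One plausible reading of the resulting study update is sketched below; the proportional boost Δx_i = ε(1 − s_i) is our inference from the text, not a formula quoted from the paper.

```python
import numpy as np

def study_update(x, gamma, recalled, eps_r=2.0):
    """Assumed MCM study boost (illustrative, not quoted from the paper).

    Each integrator i is incremented in proportion to how far s_i, the
    gamma-weighted average strength of the first i integrators, falls short
    of full strength; eps = 1 for an unsuccessfully recalled item and
    eps = eps_r > 1 for a successfully recalled one.
    """
    eps = eps_r if recalled else 1.0
    Gamma = np.cumsum(gamma)
    s = np.cumsum(gamma * x) / Gamma   # s_i at each cascade stage
    return x + eps * (1.0 - s)         # assumed boost rule
```

Between studies, each x_i would simply decay as in Equation 3.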

68 (1) MTS weighs all integrators equally when combining the individual integrator activities. [sent-170, score-0.229]

69 (2) MTS provides no guidance in setting the τ and γ constants; MCM constrains these parameters based on the human forgetting function. [sent-172, score-0.184]

70 The context units are binary valued and units in pool i flip with time constant τi . [sent-180, score-0.231]

71 As depicted in Figure 1b, the model also includes a set of N memory elements for each item to be learned. [sent-183, score-0.32]

72 Activation of memory element i, denoted m_i, indicates strength of retrieval for the item based on context pools 1 through i. [sent-185, score-0.562]

73 The activation function is cascaded such that memory element i receives input from context units in pool i as well as memory element i − 1: m_i = m_{i−1} + Σ_j w_{ij} c_{ij} + b, where w_{ij} is the connection weight from context unit j to memory element i, m_0 ≡ 0, and b = −β/(1 − β) is a bias weight. [sent-189, score-0.901]

74 The bias simply serves to offset spurious activity reaching the memory elements, activity that is unrelated to the fact that the item was previously studied and stored. [sent-190, score-0.455]

75 The probability of recalling the item is related to the activity of memory element N: P(recall) = min(1, m_N). [sent-192, score-0.38]

76 When the item is studied, the weights from context units in pool i are adjusted according to an update rule that performs gradient descent in an error measure E_i = e_i², where e_i = 1 − m_i/Γ_i. [sent-193, score-0.472]

77 This error is minimized when the memory element i reaches activation level Γi (defined earlier as the proportion of units in the entire context pool that contributes to activity at stage i). [sent-194, score-0.412]
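
A sketch of this neural-net form, combining the cascaded activation with the error-correction weight update; pool sizes, γ values, and the learning rate are placeholders of ours, and the bias is set to zero for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4                                    # number of context pools / memory elements
pool_size = 50                           # units per pool (placeholder)
gamma = np.array([0.4, 0.3, 0.2, 0.1])   # per-pool contribution (placeholder values)
Gamma = np.cumsum(gamma)                 # target activation for memory element i

c = [rng.integers(0, 2, pool_size) for _ in range(N)]  # binary context pools
W = [np.zeros(pool_size) for _ in range(N)]            # weights to the memory elements

def activations(c, W, b=0.0):
    """Cascade: m_i = m_{i-1} + sum_j W_i[j] * c_i[j] + b (bias zeroed here)."""
    m, out = 0.0, []
    for i in range(N):
        m = m + W[i] @ c[i] + b
        out.append(m)
    return out

def study(c, W, lr=0.1):
    """One error-correction step on E_i = e_i**2 with e_i = 1 - m_i / Gamma_i."""
    m = activations(c, W)
    for i in range(N):
        e = 1.0 - m[i] / Gamma[i]
        W[i] = W[i] + lr * e * c[i].astype(float)  # gradient direction, up to constants
    return W

W = study(c, W)
print("P(recall) ~ min(1, m_N):", min(1.0, activations(c, W)[-1]))
```

In a full simulation the units in pool i would also flip stochastically with time constant τ_i between studies.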

78 (2) SAM’s memory update rule can be interpreted as Hebbian learning; MCM’s update can be interpreted as error-correction learning. [sent-203, score-0.204]

79 [Section 3.3: Relating leaky integrator and neural net characterizations of MCM.] To make contact with MTS, we have described MCM as a cascade of leaky integrators, and to make contact with SAM, we have described MCM as a neural net. [sent-205, score-0.251]

80 Cepeda et al. (in press) have recently conducted well-controlled experimental manipulations of spacing involving RIs on educationally relevant time scales of days to months. [sent-209, score-0.46]

81 Most research in the spacing literature involves brief RIs, on the scale of minutes to an hour, and methodological concerns have been raised with the few well-known studies involving longer RIs (Cepeda et al. [sent-210, score-0.349]

82 Recall accuracy at the start of the second session provides the basic forgetting function, and recall accuracy at test provides the spacing function. [sent-216, score-0.622]

83 In panel (e), the peaks of the model’s spacing functions are indicated by the triangle pointers. [sent-222, score-0.349]

84 These four model parameters determine the time constants and weighting coefficients of the mixture-of-exponentials approximation to the forgetting function (Equation 5). [sent-224, score-0.186]

85 The model has only one other free parameter, ε_r, the magnitude of update on a trial when an item is successfully recalled (see Equation 6). [sent-225, score-0.203]

86 With ε_r, MCM is fully constrained and can make strong predictions regarding the spacing function. [sent-227, score-0.351]

87 For each experiment, MCM's prediction of the peak of the spacing function is entirely consistent with the data, and for the most part, MCM's quantitative predictions are excellent. [sent-231, score-0.394]

88 (It would be extremely surprising to psychologists if the peak were in general independent of the material, as content effects pervade the memory literature. [sent-236, score-0.212]

89 Each red circle represents a single spacing experiment in which the ISI was varied for a given RI. [sent-241, score-0.329]

90 MCM predicts the spacing functions with absolutely spectacular precision, considering the predictions are fully constrained and parameter free. [sent-249, score-0.38]

91 Moreover, MCM anticipates the peaks of the spacing functions, with the curvature of the peak decreasing with the RI, and the optimal ISI increasing with the RI. [sent-250, score-0.392]

92 In addition to these results, MCM also predicts the probability of recall at test conditional on successful or unsuccessful recall during the test at the start of the second study session. [sent-251, score-0.207]

93 Finally, MCM is able to post-hoc fit classic studies from the spacing literature (for which forgetting functions are not available). [sent-254, score-0.491]

94 5 Discussion MCM’s blind prediction of 7 different spacing functions is remarkable considering that the domain’s complexity (the content, manner and amount of study) is reduced to four parameters, which are fully determined by the forgetting function. [sent-255, score-0.491]

95 The state predicts the appearance of an item in the temporal stream of experience. [sent-264, score-0.203]

96 (Two parameters determine the range of time scales; two specify internal and observation noise levels; and two perform an affine transform from internal memory strength to recall probability.) [sent-270, score-0.332]

97 In terms of sum-squared error, the model shows a reasonable fit, but the model clearly misses the peaks of the spacing functions, and in fact predicts a peak that is independent of RI. [sent-271, score-0.421]

98 Notably, the KF model is a post-hoc fit to the spacing functions, whereas MCM produces a true prediction of the spacing functions, i.e.: [sent-272, score-0.658]

99 the parameters of MCM are determined without peeking at the spacing function. [sent-274, score-0.329]

100 Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. [sent-331, score-0.533]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mcm', 0.656), ('spacing', 0.329), ('isi', 0.272), ('cepeda', 0.215), ('mts', 0.192), ('item', 0.174), ('forgetting', 0.162), ('integrators', 0.147), ('memory', 0.146), ('sam', 0.142), ('ris', 0.09), ('integrator', 0.082), ('isis', 0.082), ('session', 0.08), ('pool', 0.078), ('study', 0.076), ('days', 0.073), ('ri', 0.071), ('wixted', 0.069), ('raaijmakers', 0.068), ('kf', 0.064), ('theories', 0.061), ('pashler', 0.059), ('decay', 0.058), ('context', 0.057), ('contextual', 0.057), ('rohrer', 0.057), ('leaky', 0.056), ('multiscale', 0.055), ('retention', 0.054), ('strength', 0.051), ('recall', 0.051), ('lag', 0.046), ('retrieval', 0.045), ('sessions', 0.044), ('peak', 0.043), ('vul', 0.042), ('kording', 0.042), ('pools', 0.04), ('ei', 0.037), ('units', 0.036), ('activity', 0.035), ('activation', 0.035), ('material', 0.034), ('educationally', 0.034), ('pavlik', 0.034), ('staddon', 0.034), ('studied', 0.033), ('unrelated', 0.032), ('explain', 0.03), ('internal', 0.03), ('participants', 0.03), ('si', 0.03), ('cascaded', 0.03), ('schedules', 0.03), ('update', 0.029), ('net', 0.029), ('anderson', 0.029), ('trace', 0.029), ('psychological', 0.029), ('predicts', 0.029), ('cascade', 0.028), ('cij', 0.027), ('stored', 0.026), ('element', 0.025), ('mi', 0.024), ('time', 0.024), ('wj', 0.024), ('effects', 0.023), ('cancelled', 0.023), ('carpenter', 0.023), ('chelaru', 0.023), ('higa', 0.023), ('marr', 0.023), ('milson', 0.023), ('effect', 0.022), ('human', 0.022), ('drift', 0.022), ('synthesis', 0.022), ('wij', 0.022), ('predictions', 0.022), ('consequently', 0.021), ('months', 0.021), ('behavioral', 0.02), ('peaks', 0.02), ('cj', 0.02), ('panels', 0.02), ('declarative', 0.02), ('shiffrin', 0.02), ('habituation', 0.02), ('massed', 0.02), ('shadmehr', 0.02), ('intriguingly', 0.02), ('uctuate', 0.02), ('et', 0.02), ('solid', 0.019), ('facts', 0.019), ('boost', 0.019), ('vocabulary', 0.019), ('shorter', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer

Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1

2 0.099731222 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

3 0.087896876 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei

Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1

4 0.062605232 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation. 1

5 0.052705847 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

6 0.050895404 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

7 0.042605687 13 nips-2009-A Neural Implementation of the Kalman Filter

8 0.041848898 112 nips-2009-Human Rademacher Complexity

9 0.036975052 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

10 0.034236733 196 nips-2009-Quantification and the language of thought

11 0.031978715 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

12 0.031843845 234 nips-2009-Streaming k-means approximation

13 0.030963682 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

14 0.03023494 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

15 0.030197242 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

16 0.030151876 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

17 0.027749419 70 nips-2009-Discriminative Network Models of Schizophrenia

18 0.026728114 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

19 0.026667677 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

20 0.02661187 237 nips-2009-Subject independent EEG-based BCI decoding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.096), (1, -0.044), (2, 0.023), (3, -0.024), (4, 0.026), (5, 0.016), (6, -0.011), (7, 0.001), (8, -0.05), (9, -0.006), (10, 0.015), (11, -0.052), (12, 0.01), (13, -0.056), (14, 0.079), (15, 0.013), (16, -0.03), (17, 0.062), (18, -0.126), (19, 0.058), (20, -0.009), (21, -0.021), (22, 0.065), (23, 0.007), (24, -0.052), (25, -0.014), (26, -0.003), (27, 0.018), (28, -0.049), (29, -0.017), (30, -0.014), (31, 0.003), (32, 0.029), (33, 0.036), (34, -0.018), (35, -0.028), (36, 0.081), (37, -0.003), (38, -0.062), (39, 0.001), (40, 0.012), (41, 0.02), (42, 0.042), (43, 0.061), (44, 0.038), (45, 0.063), (46, -0.01), (47, -0.024), (48, 0.038), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92312545 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer

Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1

2 0.70290351 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation. 1

3 0.6586135 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forcedchoice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1

4 0.65253818 112 nips-2009-Human Rademacher Complexity

Author: Xiaojin Zhu, Bryan R. Gibson, Timothy T. Rogers

Abstract: We propose to use Rademacher complexity, originally developed in computational learning theory, as a measure of human learning capacity. Rademacher complexity measures a learner’s ability to fit random labels, and can be used to bound the learner’s true error based on the observed training sample error. We first review the definition of Rademacher complexity and its generalization bound. We then describe a “learning the noise” procedure to experimentally measure human Rademacher complexities. The results from empirical studies showed that: (i) human Rademacher complexity can be successfully measured, (ii) the complexity depends on the domain and training sample size in intuitive ways, (iii) human learning respects the generalization bounds, (iv) the bounds can be useful in predicting the danger of overfitting in human learning. Finally, we discuss the potential applications of human Rademacher complexity in cognitive science. 1

5 0.65041685 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

6 0.64518052 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

7 0.63325334 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

8 0.5969916 152 nips-2009-Measuring model complexity with the prior predictive

9 0.57500333 115 nips-2009-Individuation, Identification and Object Discovery

10 0.49506569 260 nips-2009-Zero-shot Learning with Semantic Output Codes

11 0.49246913 196 nips-2009-Quantification and the language of thought

12 0.48343349 154 nips-2009-Modeling the spacing effect in sequential category learning

13 0.47902459 21 nips-2009-Abstraction and Relational learning

14 0.44350019 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

15 0.41847187 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

16 0.41240051 39 nips-2009-Bayesian Belief Polarization

17 0.39909178 233 nips-2009-Streaming Pointwise Mutual Information

18 0.38790148 69 nips-2009-Discrete MDL Predicts in Total Variation

19 0.38646615 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

20 0.37075695 13 nips-2009-A Neural Implementation of the Kalman Filter


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.034), (25, 0.064), (35, 0.036), (36, 0.055), (39, 0.065), (58, 0.05), (61, 0.016), (71, 0.096), (81, 0.011), (86, 0.075), (91, 0.019), (93, 0.356)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78917062 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer

Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1

2 0.61779243 226 nips-2009-Spatial Normalized Gamma Processes

Author: Vinayak Rao, Yee W. Teh

Abstract: Dependent Dirichlet processes (DPs) are dependent sets of random measures, each being marginally DP distributed. They are used in Bayesian nonparametric models when the usual exchangeability assumption does not hold. We propose a simple and general framework to construct dependent DPs by marginalizing and normalizing a single gamma process over an extended space. The result is a set of DPs, each associated with a point in a space such that neighbouring DPs are more dependent. We describe Markov chain Monte Carlo inference involving Gibbs sampling and three different Metropolis-Hastings proposals to speed up convergence. We report an empirical study of convergence on a synthetic dataset and demonstrate an application of the model to topic modeling through time. 1

3 0.42515507 204 nips-2009-Replicated Softmax: an Undirected Topic Model

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

4 0.4237344 56 nips-2009-Conditional Neural Fields

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

5 0.42253357 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

6 0.4203406 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

7 0.41916379 96 nips-2009-Filtering Abstract Senses From Image Search Results

8 0.41740322 205 nips-2009-Rethinking LDA: Why Priors Matter

9 0.41639265 112 nips-2009-Human Rademacher Complexity

10 0.4146539 260 nips-2009-Zero-shot Learning with Semantic Output Codes

11 0.41464069 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

12 0.41342238 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

13 0.4133442 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

14 0.41143626 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

15 0.41055763 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

16 0.40957093 113 nips-2009-Improving Existing Fault Recovery Policies

17 0.40934145 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

18 0.40722033 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

19 0.4067826 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

20 0.40676695 111 nips-2009-Hierarchical Modeling of Local Image Features through $L p$-Nested Symmetric Distributions