nips nips2009 nips2009-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We develop a probabilistic model of human memory performance in free recall experiments. [sent-11, score-0.541]
2 In these experiments, a subject first studies a list of words and then tries to recall them. [sent-12, score-0.428]
3 We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). [sent-14, score-0.929]
4 During recall, this context is reinstated and used as a cue for retrieving studied words. [sent-15, score-0.312]
5 By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. [sent-16, score-0.3]
6 We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. [sent-17, score-0.669]
7 1 Introduction Modern computational models of verbal memory assume that the recall of items is shaped by their semantic representations. [sent-19, score-0.767]
8 TCM was developed to explain the temporal structure of human behavior in free recall experiments, where subjects are presented with lists of words (presented one at a time) and then asked to recall them in any order. [sent-26, score-0.949]
9 TCM posits a slowly changing mental context vector whose evolution is driven by lexical input. [sent-27, score-0.394]
10 At study, words are bound to context states through learning; during recall, context information is used as a cue to probe for stored words. [sent-28, score-0.548]
11 TCM can account for numerous regularities in free recall data, most prominently the finding that subjects tend to consecutively recall items that were studied close in time to one another. [sent-29, score-0.865]
12 TCM explains this effect by positing that recalling an item also triggers recall of the context state that was present when the item was studied; subjects can use this retrieved context state to access items that were studied close in time to the just-recalled item. [sent-31, score-1.029]
13 Importantly, temporal structure is not the only organizing principle in free recall data: Semantic relatedness between items also influences the probability of recalling them consecutively [11]. [sent-33, score-0.632]
14 Moreover, subjects often recall semantically-related items that were not presented at study. [sent-34, score-0.387]
15 To capture this semantic structure, we will draw on probabilistic topic models of text documents, specifically latent Dirichlet allocation (LDA) [3]. [sent-36, score-0.582]
16 When fit to a corpus, the most probable words of these distributions tend to represent the semantic themes (like “sports” or “chemistry”) that permeate the collection. [sent-38, score-0.533]
17 LDA has been used successfully as a psychological model of semantic representation [7]. [sent-39, score-0.463]
18 We model free recall data by combining the underlying assumptions of TCM with the latent semantic space provided by LDA. [sent-40, score-0.773]
19 Specifically, we reinterpret TCM as a dynamic latent variable model where the mental context vector specifies a distribution over topics. [sent-41, score-0.48]
20 In other words, the human memory component of our model represents the drifting mental context as a sequence of mixtures of topics, in the same way that LDA represents documents. [sent-42, score-0.614]
21 With this representation, the dynamics of the mental context are determined by two factors: the posterior probability over topics given a studied or recalled word (semantic inference) and the retrieval of previous contexts (episodic retrieval). [sent-43, score-1.441]
22 These dynamics let us capture both the episodic and semantic structure of human verbal memory. [sent-44, score-0.818]
23 The work described here goes beyond prior TCM modeling work in two ways: First, our approach allows us to infer the trajectory of the context vector over time, which (in turn) allows us to predict the item-by-item sequence of word recalls; by contrast, previous work (e. [sent-45, score-0.515]
24 We present simulation results showing how this model reproduces fundamental behavioral effects in free recall experiments. [sent-53, score-0.492]
25 We also present inference results for a dataset collected by Sederberg and Norman in which subjects performed free recall of words. [sent-55, score-0.483]
26 In addition to capturing the semantic content of documents, recent psychological work has shown that several aspects of LDA make it attractive as a model of human semantic representation [7]. [sent-73, score-0.825]
27 In our model of memory, the topic proportions χ play the role of a “mental context” that guides memory retrieval by parameterizing a distribution over words to recall. [sent-74, score-0.475]
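To make the role of the topic proportions concrete, here is a minimal sketch (not the authors' code; the names chi and beta are illustrative) of how a context vector on the topic simplex parameterizes a distribution over words via the topic-word matrix:

```python
import numpy as np

def word_distribution(chi, beta):
    """Distribution over the vocabulary induced by topic proportions.

    chi  : (K,) topic proportions on the simplex (the "mental context").
    beta : (K, V) topic-word matrix; row k is the word distribution of topic k.
    """
    p_w = chi @ beta            # mixture of topics: p(w) = sum_k chi_k * beta[k, w]
    return p_w / p_w.sum()      # renormalize to guard against numerical drift

# toy example: K = 2 topics over a V = 3 word vocabulary
beta = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])
chi = np.array([0.25, 0.75])
print(word_distribution(chi, beta))
```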
28 3 Temporal context and memory We now turn to a model of human memory that uses the latent representation of LDA to capture the semantic aspects of recall experiments. [sent-75, score-1.169]
29 Our data consist of two types of observations: a corpus of documents from which we have obtained the word distribution matrix, and behavioral data from free recall experiments, namely the studied and recalled words from multiple subjects over multiple runs of the experiment. [sent-76, score-1.444]
30 Our goal is to model the psychological process of recall in terms of a drifting mental context. [sent-77, score-0.563]
31 There are two core principles of TCM: (1) Memory retrieval involves reinstating a representation of context that was active at the time of study; and (2) context change is driven by features of the studied stimuli [10, 16, 14]. [sent-79, score-0.57]
32 We capture these principles by representing the mental context drift of each subject with a trajectory of latent variables χn . [sent-80, score-0.627]
33 Our use of the same variable name (χ) and dimensionality for the context vector and for topics reflects our key assertion: Context and topics reside in the same meaning space. [sent-81, score-0.531]
34 The relationship between context and topics is specified in the generative process of the free recall data. [sent-82, score-0.717]
35 The generative process encompasses both the study phase and the recall phase of the memory experiment. [sent-83, score-0.459]
36 During study, the model specifies the distribution of the trajectory of internal mental contexts of the subject. [sent-84, score-0.315]
37 First, the initial mental context is drawn from a Gaussian: χs,0 ∼ N(0, φI) (1), where s denotes the study phase and I is the K × K identity matrix. [sent-86, score-0.36]
38 Then, for each studied word, the mental context drifts according to χs,n ∼ N(hs,n, φI) (2), where hs,n = λ1 χs,n−1 + (1 − λ1) log(ps,n) (3). For simplicity, we fix the word distribution matrix to one fit using the method of [3]. [sent-87, score-1.122]
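A minimal numerical sketch of the study-phase dynamics in Equations (1)–(3) as reconstructed above; the parameter names lam1 and phi, their default values, and the small log-smoothing constant are assumptions for illustration rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def study_phase_contexts(studied_words, beta, lam1=0.7, phi=0.1):
    """Sample a trajectory of study-phase mental contexts (in log-topic space).

    studied_words : list of word indices presented at study.
    beta          : (K, V) topic-word matrix.
    lam1, phi     : drift weight and context noise variance (illustrative values).
    """
    K = beta.shape[0]
    chi = rng.normal(0.0, np.sqrt(phi), size=K)           # Eq. (1): chi_{s,0} ~ N(0, phi I)
    trajectory = [chi]
    for w in studied_words:
        p = beta[:, w] / beta[:, w].sum()                  # posterior over topics given word w
        h = lam1 * chi + (1.0 - lam1) * np.log(p + 1e-12)  # Eq. (3): pull toward the word's topics
        chi = rng.normal(h, np.sqrt(phi))                  # Eq. (2): chi_{s,n} ~ N(h_{s,n}, phi I)
        trajectory.append(chi)
    return np.array(trajectory)
```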
39 In future work, we will explore how the data from the free recall experiment could be used to constrain estimates of the word distribution matrix. [sent-88, score-0.588]
40 More precisely, context vectors are log-transformed topic vectors (see [1, 2]). [sent-89, score-0.319]
41 When generating words from the topics, we renormalize the context vector. [sent-90, score-0.346]
42 This equation identifies the two pulls on mental context drift when the subject is studying words: the previous context vector θn−1 and ps,n ∝ β·,ws,n , the posterior probabilities of each topic given the current word and the topic distribution matrix. [sent-91, score-1.228]
43 This second term captures the idea that mental context is updated with the meaning of the current word (see also [2] for a related treatment of topic dynamics in the context of text modeling). [sent-92, score-0.994]
44 For example, if the studied word is “stocks” then the mental context might drift toward topics that also have words like “business”, “financial”, and “market” with high probability. [sent-93, score-1.068]
45 (Note that this is where the topic model and memory model are coupled.) [sent-94, score-0.316]
46 During recall, the model specifies a distribution over drifting contexts and recalled words. [sent-96, score-0.515]
47 For each time t, the recalled word is assumed to be generated from a mixture of two components. [sent-97, score-0.621]
48 Effectively, there are two “paths” to recalling a word: a semantic path and an episodic path. [sent-98, score-0.794]
49 Formally, the probability of recalling a word via the semantic path is expressed as the marginal probability of that word induced by the current context: Ps (w) = π(θr,t ) · β·,w , (4) where π is a function that maps real-valued vectors onto the simplex (i. [sent-100, score-0.959]
50 The episodic path recalls words by drawing them exclusively from the set of studied words. [sent-103, score-0.77]
51 This path puts a high probability on words that were studied in a context that resembles the current context (this is akin to remembering words that you studied when you were thinking about things similar to what you are currently thinking about). [sent-104, score-1.074]
52 Because people tend not to repeatedly recall words, we remove the corresponding delta function after a word is recalled. [sent-108, score-0.504]
53 Intuitively, λ in Equation 6 controls the balance between semantic influences and episodic influences. [sent-111, score-0.653]
54 When λ approaches 1, we obtain a “pure semantic” model wherein words are recalled essentially by free association (this is similar to the model used by [7] to model semantically-related intrusions in free recall). [sent-112, score-1.006]
55 When λ approaches 0, we obtain a “pure episodic” model wherein words are recalled exclusively from the study list. [sent-113, score-0.673]
56 An intermediate value of λ is essential to simultaneously explaining temporal contiguity and semantic effects in memory. [sent-114, score-0.55]
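The λ-mixture of the two recall paths can be sketched as follows. The semantic path follows Equation 4 with π taken to be a softmax; the episodic score used here (exponentiated dot product between the current context and each stored study context, restricted to studied, not-yet-recalled words) is an illustrative stand-in, since the paper's exact episodic expression is not reproduced in this extract:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def recall_distribution(theta_rt, study_contexts, studied_words, recalled, beta, lam=0.3):
    """Mixture of the semantic and episodic recall paths over the vocabulary.

    theta_rt       : (K,) current (log-space) recall context.
    study_contexts : (N, K) inferred contexts at study, one row per studied word.
    studied_words  : (N,) word indices studied, aligned with study_contexts.
    recalled       : set of word indices already recalled (their deltas are removed).
    lam            : semantic/episodic balance (the lambda of Eq. 6).
    """
    V = beta.shape[1]

    # Semantic path, Eq. (4): marginal word probability under the current context.
    p_sem = softmax(theta_rt) @ beta
    p_sem = p_sem / p_sem.sum()

    # Episodic path (illustrative form): weight each studied, not-yet-recalled word
    # by the similarity of its study context to the current context.
    p_epi = np.zeros(V)
    for ctx, w in zip(study_contexts, studied_words):
        if w not in recalled:
            p_epi[w] += np.exp(theta_rt @ ctx)
    if p_epi.sum() == 0:            # every studied word already recalled: fall back to semantic path
        return p_sem
    p_epi = p_epi / p_epi.sum()

    return lam * p_sem + (1.0 - lam) * p_epi    # Eq. (6)-style mixture
```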
57 This is similar to how context drifts in the study phase, except that the context is additionally pushed by the context that was present when the recalled word was studied. [sent-116, score-1.316]
58 This is obtained mathematically by defining n(wr,t ) to be a mapping from a recalled word to the index of the same word at study. [sent-117, score-0.872]
59 For example, if the recalled word is “cat” and “cat” was the sixth studied word, then n(wr,t) = 6. [sent-122, score-0.982]
60 If the subject recalls a word that was not studied, then θs,n(wr,t) is set to the zero vector. [sent-125, score-0.363]
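The index mapping n(·) and the zero-vector convention for extra-list intrusions translate directly into code. Because the paper's recall-phase drift equation is not reproduced in this extract, the weighting of the three pulls below (previous context, topic posterior of the recalled word, retrieved study context) is an assumed illustrative form only:

```python
import numpy as np

def retrieved_study_context(recalled_word, studied_words, study_contexts):
    """theta_{s, n(w)}: the study context bound to the recalled word.

    Returns the zero vector when the recalled word was never studied
    (an extra-list intrusion), as described in the text.
    """
    K = study_contexts.shape[1]
    for n, w in enumerate(studied_words):
        if w == recalled_word:                 # n(w_{r,t}) = n
            return study_contexts[n]
    return np.zeros(K)                         # not studied: zero vector

def recall_context_update(theta_prev, recalled_word, studied_words, study_contexts,
                          beta, w_prev=0.6, w_sem=0.2, w_epi=0.2):
    """Illustrative recall-phase drift: previous context, plus a pull toward the
    recalled word's topics, plus a pull toward its retrieved study context."""
    p = beta[:, recalled_word] / beta[:, recalled_word].sum()
    theta_study = retrieved_study_context(recalled_word, studied_words, study_contexts)
    return w_prev * theta_prev + w_sem * np.log(p + 1e-12) + w_epi * theta_study
```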
61 Here, we use the model to answer the following questions about behavior in free recall experiments: (1) Do both semantic and temporal factors influence recall, and if so what are their relative contributions; (2) What are the relevant dimensions of variation across subjects? [sent-134, score-0.819]
62 In our model, semantic and temporal factors exert their influence via the context vector, while variation across subjects is expressed in the parameters drawn from the group prior. [sent-135, score-0.741]
63 Thus, our goal in inference is to compute the posterior distribution over the context trajectory and subject-specific parameters, given a sequence of studied and recalled words. [sent-136, score-0.853]
64 We can also use this posterior to make predictions about what words will be recalled by a subject at each point during the recall phase. [sent-137, score-0.811]
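Posterior inference over the context trajectory is described as a particle filter. The skeleton below is a generic bootstrap filter for this kind of latent drift state, not the authors' algorithm; the transition and likelihood arguments are placeholders to be supplied by the model sketches above:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_particle_filter(observed_words, init_particles, transition, likelihood):
    """Generic bootstrap particle filter over a latent context trajectory.

    observed_words : sequence of recalled word indices (the observations).
    init_particles : (P, K) initial context particles.
    transition     : function(particles) -> propagated particles (model drift + noise).
    likelihood     : function(particles, word) -> (P,) p(word | particle context).
    Returns the list of resampled particle sets, one per time step.
    """
    particles = init_particles.copy()
    P = particles.shape[0]
    history = []
    for word in observed_words:
        particles = transition(particles)                 # propagate through context drift
        weights = likelihood(particles, word)             # reweight by p(recalled word | context)
        weights = weights / weights.sum()
        idx = rng.choice(P, size=P, p=weights)            # multinomial resampling
        particles = particles[idx]
        history.append(particles.copy())
    return history
```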
65 By comparing the predictive performance of different model variants, we can examine what types of model assumptions (like the balance between semantic and temporal factors) best capture human behavior. [sent-138, score-0.651]
66 Figure 3: Factors contributing to context change during recall on a single list. [sent-142, score-0.41]
67 (Left) Illustration of how three successively recalled words influence context. [sent-143, score-0.514]
68 Each column corresponds to a specific recalled word (shown in the top row). [sent-144, score-0.621]
69 The bars in each cell correspond to individual topics (specifically, these are the top ten inferred topics at recall; the center legend shows the top five words associated with each topic). [sent-145, score-0.512]
70 The context vector at recall (Middle Row) is updated by the posterior over topics given the recalled word (Top Row) and also by retrieved study contexts (Bottom Row). [sent-147, score-1.369]
71 (Right) Plot of the inferred context trajectory at study and recall for a different list, in a 2-dimensional projection of the context space obtained by principal components analysis. [sent-148, score-0.76]
72 First, we generate data from the generative model and record a number of common psychological measurements to assess to what extent the model reproduces qualitative patterns of recall behavior. [sent-157, score-0.457]
73 Figure 2 (left) shows the probability of first recall (PFR) curve, which plots the probability of each list position being the first recalled word. [sent-171, score-0.626]
74 This curve illustrates how words in later positions are more likely to be recalled first, a consequence (in our model) of initializing the recall context with the last study context. [sent-172, score-0.966]
75 Figure 2 (right) shows the lag conditional response probability (lag-CRP) curve, which plots the conditional probability of recalling a word given the last recalled word as a function of the lag (measured in terms of serial position) between the two. [sent-173, score-1.023]
76 This curve demonstrates the temporal contiguity effect observed in human recall behavior: the increased probability of recalling words that were studied nearby in time to the last-recalled word. [sent-174, score-0.651]
77 Figure 4: (Left) Box-plot of average predictive log-probability of recalled words under different models. [sent-179, score-0.713]
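The two behavioral summaries, the PFR curve and the lag-CRP curve, can be computed directly from study and recall sequences. A minimal sketch using their standard definitions (not tied to the paper's analysis code):

```python
import numpy as np

def pfr_curve(recall_sequences, list_length):
    """Probability of first recall by serial position (positions are 1-indexed)."""
    counts = np.zeros(list_length)
    for seq in recall_sequences:                    # seq: serial positions recalled, in order
        if seq:
            counts[seq[0] - 1] += 1
    return counts / len(recall_sequences)

def lag_crp(recall_sequences, list_length):
    """Lag-CRP: P(transition at lag) / P(lag was available), lags -(L-1)..(L-1)."""
    max_lag = list_length - 1
    actual = np.zeros(2 * max_lag + 1)
    possible = np.zeros(2 * max_lag + 1)
    for seq in recall_sequences:                    # seq: correct recalls only (no repeats/intrusions)
        recalled = set()
        for prev, nxt in zip(seq, seq[1:]):
            recalled.add(prev)
            for cand in range(1, list_length + 1):  # every not-yet-recalled position is an available lag
                if cand not in recalled:
                    possible[cand - prev + max_lag] += 1
            actual[nxt - prev + max_lag] += 1
    crp = np.divide(actual, possible, out=np.full_like(actual, np.nan), where=possible > 0)
    crp[max_lag] = np.nan                           # lag 0 is undefined
    return crp
```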
78 As in TCM, this effect is present in our model because items studied close in time to one another have similar context vectors; as such, cuing with contextual information from time t will facilitate recall of other items studied in temporal proximity to time t. [sent-180, score-0.953]
79 Pure semantic: defined by drawing words exclusively from the semantic path, with λ = 1. [sent-188, score-0.509]
80 This type of model has been used by [7] to examine semantic similarity effects in free recall. [sent-189, score-0.578]
81 Pure episodic: defined by drawing words exclusively from the episodic path, with λ = 0. [sent-191, score-0.53]
82 This corresponds to a model in which words are drawn from a mixture of the episodic and semantic paths. [sent-194, score-0.838]
83 As a metric of model comparison, we calculate the model's predictive probability for the word recalled at time t given words 1 to t − 1, for all t, and report the total − Σt log p(wr,t | wr,1:t−1, ws,1:N). [sent-196, score-0.855]
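A minimal sketch of this predictive score, given any function that returns a next-recall distribution over the vocabulary (for example, the mixture sketch above); the function and argument names are illustrative:

```python
import numpy as np

def negative_predictive_log_prob(recalled_words, studied_words, predictive_dist):
    """Sum over t of -log p(w_{r,t} | w_{r,1:t-1}, w_{s,1:N}).

    predictive_dist(studied_words, prior_recalls) should return a (V,)
    probability vector over the vocabulary for the next recalled word.
    """
    total = 0.0
    for t, w in enumerate(recalled_words):
        p = predictive_dist(studied_words, recalled_words[:t])
        total += -np.log(p[w] + 1e-12)   # small constant guards against log(0)
    return total
```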
84 Before we present the quantitative results, it is useful to examine some examples of inferred context change and how it interacts with word recall. [sent-201, score-0.524]
85 Figure 3 shows the different factors at work in generating context change during recall on a single trial, illustrating how semantic inference and retrieved episodic memories combine to drive context change. [sent-202, score-1.428]
86 The legend showing the top words in each topic illustrates how these topics appear to capture some of the semantic structure of the recalled words. [sent-203, score-1.166]
87 On the right of Figure 3, we show another representation of context change (from a different trial), where the context trajectory is projected onto the first two principal components of the context vector. [sent-204, score-0.668]
88 We can see from this figure how recall involves reinstatement of studied contexts: Recalling a word pulls the inferred context vector in the direction of the (inferred) contextual state associated with that word at study. [sent-205, score-1.135]
89 Figure 4 (left) shows the average predictive log-probability of recalled words for the models described above. [sent-206, score-0.563]
90 Overall, the semantic-episodic model outperforms the pure episodic and pure semantic models in predictive accuracy (superiority over the closest competitor, the pure episodic model, was confirmed by a paired-sample t-test, with p < 0. [sent-207, score-1.419]
91 The pure episodic model completely fails to predict extra-list intrusions, because it restricts recall to the study list (i. [sent-211, score-0.789]
92 Conversely, the pure semantic model does a poor job of predicting recall of studied list items, because it does not scope recall to the study list. [sent-214, score-1.086]
93 The semantic-episodic model, by occupying an intermediate position between these two extremes, is able to capture both the semantic and temporal structure in free recall. [sent-216, score-0.576]
94 Another pattern to notice is that the values of the episodic-semantic trade-off parameter λ tend to cluster close to 0 (the episodic extreme of the spectrum), consistent with the fact that the pure episodic and semantic-episodic models are fairly comparable in predictive accuracy. [sent-220, score-0.836]
95 6 Discussion We have presented here LDA-TCM, a probabilistic model of memory that integrates semantic and episodic influences on recall behavior. [sent-222, score-1.019]
96 There are a number of advantages to adopting a Bayesian approach to modeling free recall behavior. [sent-229, score-0.337]
97 First, it is easy to integrate more sophisticated semantic models such as hierarchical Dirichlet processes [18]. [sent-230, score-0.316]
98 Existing studies have used fMRI data to decode semantic states in the brain [12] and predict recall behavior at the level of semantic categories [13]. [sent-234, score-0.872]
99 A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. [sent-267, score-0.395]
100 A context maintenance and retrieval model of organizational processes in free recall. [sent-346, score-0.428]
wordName wordTfidf (topN-words)
[('recalled', 0.37), ('episodic', 0.337), ('semantic', 0.316), ('word', 0.251), ('tcm', 0.22), ('recall', 0.208), ('context', 0.202), ('mental', 0.158), ('topics', 0.148), ('words', 0.144), ('free', 0.129), ('sederberg', 0.126), ('memory', 0.117), ('topic', 0.117), ('lda', 0.115), ('pure', 0.113), ('studied', 0.11), ('contiguity', 0.11), ('psychological', 0.106), ('subjects', 0.098), ('recalling', 0.095), ('temporal', 0.088), ('intrusions', 0.084), ('recalls', 0.084), ('items', 0.081), ('latent', 0.079), ('princeton', 0.074), ('howard', 0.068), ('norman', 0.067), ('fmri', 0.066), ('trajectory', 0.062), ('posterior', 0.061), ('retrieval', 0.056), ('documents', 0.055), ('drift', 0.055), ('contexts', 0.054), ('particle', 0.053), ('drifting', 0.05), ('dirichlet', 0.05), ('predictive', 0.049), ('exclusively', 0.049), ('inference', 0.048), ('list', 0.048), ('drifts', 0.047), ('behavioral', 0.047), ('path', 0.046), ('human', 0.046), ('psychology', 0.045), ('delta', 0.045), ('memories', 0.045), ('mult', 0.045), ('verbal', 0.045), ('inferred', 0.044), ('capture', 0.043), ('study', 0.042), ('permeate', 0.042), ('model', 0.041), ('thinking', 0.041), ('variability', 0.04), ('associations', 0.038), ('blei', 0.038), ('ps', 0.037), ('accumulative', 0.037), ('acknowledges', 0.037), ('polyn', 0.037), ('pulls', 0.037), ('factors', 0.037), ('effects', 0.036), ('lter', 0.035), ('hierarchically', 0.034), ('idiosyncratic', 0.034), ('posits', 0.034), ('remembering', 0.034), ('sources', 0.034), ('meaning', 0.033), ('retrieved', 0.033), ('brain', 0.032), ('uences', 0.032), ('corpus', 0.032), ('contextual', 0.032), ('consecutively', 0.031), ('reproduces', 0.031), ('themes', 0.031), ('dynamics', 0.031), ('phase', 0.031), ('nj', 0.031), ('hs', 0.03), ('generative', 0.03), ('similarity', 0.029), ('lag', 0.028), ('legend', 0.028), ('sports', 0.028), ('wn', 0.028), ('lists', 0.028), ('subject', 0.028), ('document', 0.028), ('wherein', 0.027), ('examine', 0.027), ('allocation', 0.027), ('uence', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1
2 0.32777968 260 nips-2009-Zero-shot Learning with Semantic Output Codes
Author: Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell
Abstract: We consider the problem of zero-shot learning, where the goal is to learn a classifier f : X → Y that must predict novel values of Y that were omitted from the training set. To achieve this, we define the notion of a semantic output code classifier (SOC) which utilizes a knowledge base of semantic properties of Y to extrapolate to novel classes. We provide a formalism for this type of classifier and study its theoretical properties in a PAC framework, showing conditions under which the classifier can accurately predict novel classes. As a case study, we build a SOC classifier for a neural decoding task and show that it can often predict words that people are thinking about from functional magnetic resonance images (fMRI) of their neural activity, even without training examples for those words. 1
3 0.19628935 205 nips-2009-Rethinking LDA: Why Priors Matter
Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach
Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling. 1
4 0.19300495 96 nips-2009-Filtering Abstract Senses From Image Search Results
Author: Kate Saenko, Trevor Darrell
Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1
5 0.14813545 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Author: Chong Wang, David M. Blei
Abstract: We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the “topics”). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models. 1
6 0.13596439 204 nips-2009-Replicated Softmax: an Undirected Topic Model
7 0.12831225 190 nips-2009-Polynomial Semantic Indexing
8 0.093977302 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
9 0.09241382 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model
10 0.090476371 112 nips-2009-Human Rademacher Complexity
11 0.087896876 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
12 0.087547891 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units
13 0.085826457 97 nips-2009-Free energy score space
14 0.083099738 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models
15 0.081065968 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
16 0.074502915 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models
17 0.073651023 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process
18 0.073460467 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity
19 0.072988257 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
20 0.070765659 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
topicId topicWeight
[(0, -0.202), (1, -0.173), (2, -0.11), (3, -0.193), (4, 0.1), (5, -0.088), (6, -0.071), (7, -0.069), (8, -0.106), (9, 0.247), (10, -0.001), (11, -0.052), (12, 0.062), (13, 0.05), (14, 0.141), (15, -0.027), (16, -0.138), (17, 0.034), (18, -0.149), (19, 0.045), (20, 0.089), (21, 0.012), (22, 0.062), (23, -0.031), (24, -0.048), (25, 0.056), (26, 0.043), (27, 0.029), (28, -0.015), (29, -0.057), (30, -0.096), (31, 0.06), (32, 0.045), (33, 0.057), (34, 0.009), (35, -0.076), (36, 0.115), (37, -0.029), (38, -0.087), (39, -0.079), (40, -0.033), (41, 0.117), (42, 0.067), (43, -0.011), (44, 0.189), (45, 0.121), (46, -0.075), (47, -0.005), (48, 0.004), (49, 0.057)]
simIndex simValue paperId paperTitle
same-paper 1 0.97598535 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1
2 0.75793946 260 nips-2009-Zero-shot Learning with Semantic Output Codes
Author: Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell
Abstract: We consider the problem of zero-shot learning, where the goal is to learn a classifier f : X → Y that must predict novel values of Y that were omitted from the training set. To achieve this, we define the notion of a semantic output code classifier (SOC) which utilizes a knowledge base of semantic properties of Y to extrapolate to novel classes. We provide a formalism for this type of classifier and study its theoretical properties in a PAC framework, showing conditions under which the classifier can accurately predict novel classes. As a case study, we build a SOC classifier for a neural decoding task and show that it can often predict words that people are thinking about from functional magnetic resonance images (fMRI) of their neural activity, even without training examples for those words. 1
3 0.64669365 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer
Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1
4 0.61628735 204 nips-2009-Replicated Softmax: an Undirected Topic Model
Author: Geoffrey E. Hinton, Ruslan Salakhutdinov
Abstract: We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.
5 0.60638493 96 nips-2009-Filtering Abstract Senses From Image Search Results
Author: Kate Saenko, Trevor Darrell
Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1
6 0.59755796 205 nips-2009-Rethinking LDA: Why Priors Matter
7 0.58191597 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
8 0.55150169 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
9 0.54664165 190 nips-2009-Polynomial Semantic Indexing
10 0.54532152 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model
11 0.49879521 112 nips-2009-Human Rademacher Complexity
12 0.48940977 25 nips-2009-Adaptive Design Optimization in Experiments with People
13 0.46538141 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
14 0.42869467 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
15 0.42465174 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models
16 0.39440987 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
17 0.36918306 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution
18 0.36628088 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models
19 0.36348617 152 nips-2009-Measuring model complexity with the prior predictive
20 0.35577095 97 nips-2009-Free energy score space
topicId topicWeight
[(24, 0.018), (25, 0.059), (35, 0.033), (36, 0.046), (39, 0.052), (58, 0.054), (61, 0.012), (71, 0.552), (81, 0.013), (86, 0.054), (91, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.96870655 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1
2 0.96194512 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models
Author: Laura Dietz, Valentin Dallmeier, Andreas Zeller, Tobias Scheffer
Abstract: We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data of the software projects AspectJ and Rhino. 1
3 0.94875246 53 nips-2009-Complexity of Decentralized Control: Special Cases
Author: Martin Allen, Shlomo Zilberstein
Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1
4 0.91045135 11 nips-2009-A General Projection Property for Distribution Families
Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1
5 0.82411385 56 nips-2009-Conditional Neural Fields
Author: Jian Peng, Liefeng Bo, Jinbo Xu
Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.
6 0.81651652 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization
7 0.75155586 205 nips-2009-Rethinking LDA: Why Priors Matter
8 0.73869652 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
9 0.64064461 204 nips-2009-Replicated Softmax: an Undirected Topic Model
10 0.63057381 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains
11 0.62471241 206 nips-2009-Riffled Independence for Ranked Data
12 0.61438972 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution
13 0.61320728 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
14 0.6042459 96 nips-2009-Filtering Abstract Senses From Image Search Results
15 0.59643012 226 nips-2009-Spatial Normalized Gamma Processes
16 0.59347713 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
17 0.5926497 154 nips-2009-Modeling the spacing effect in sequential category learning
18 0.58765358 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
19 0.57441241 260 nips-2009-Zero-shot Learning with Semantic Output Codes
20 0.5702678 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization