nips nips2009 nips2009-154 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille
Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.
Reference: text
sentIndex sentText sentNum sentScore
1 Modeling the spacing effect in sequential category learning. Hongjing Lu, Department of Psychology & Statistics, Hongjing@ucla.edu [sent-1, score-0.738]
2 Abstract: We develop a Bayesian sequential model for category learning. [sent-5, score-0.574]
3 The sequential model updates two category parameters, the mean and the variance, over time. [sent-6, score-0.574]
4 We define conjugate temporal priors to enable closed form solutions to be obtained. [sent-7, score-0.181]
5 This model can be easily extended to supervised and unsupervised learning involving multiple categories. [sent-8, score-0.151]
6 To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. [sent-9, score-0.416]
7 1 Introduction Inductive learning - the process by which a new concept or category is acquired through observation of exemplars - poses a fundamental theoretical problem for cognitive science. [sent-11, score-0.586]
8 One pervasive phenomenon is the spacing effect, manifested in the finding that given a fixed amount of total study time with a given item, learning is facilitated when presentations of the item are spread across a longer time interval rather than massed into a continuous study period. [sent-13, score-0.624]
9 In category learning, for example, exemplars of two categories can be spaced by presenting them in an interleaved manner (e.g., A1 B1 A2 B2 A3 B3), or massed by presenting them in consecutive blocks (e.g., A1 A2 A3 B1 B2 B3). [sent-16, score-0.484]
11 Kornell & Bjork [1] show that when tested later on classification of novel category members, spaced presentation yields superior performance relative to massed presentation. [sent-19, score-1.257]
12 Similar spacing effects have been obtained in studies of item learning [2] and motor learning [3]. [sent-20, score-0.213]
13 Moreover, spacing effects are found not only in human learning, but also in various types of learning in other species, including rats and Aplysia [4][5]. [sent-21, score-0.304]
14 In the present paper we will focus on spacing effects in the context of sequential category learning. [sent-22, score-0.734]
15 Standard statistical methods based on summary information are unable to deal with order effects, including the performance difference between spaced and massed conditions. [sent-23, score-0.718]
16 From a computational perspective, a sequential learning model is needed to construct category representations from training examples and dynamically update parameters of these representations from trial to trial. [sent-24, score-0.796]
17 Bayesian sequential models have been successfully applied to model causal learning and animal conditioning [6] [7]. [sent-25, score-0.155]
18 However, given that both the mean and the variance of a category are random variables, standard Kalman filtering [9] is not directly applicable in this case since it assumes a known variance, which is not warranted in the current application. [sent-27, score-0.507]
19 In this paper, we extend traditional Kalman filtering in order to update two category parameters, the mean and the variance, over time in the context of category learning. [sent-28, score-0.996]
20 We define conjugate temporal priors to enable closed form solutions to be obtained in this learning model with two unknown parameters. [sent-29, score-0.23]
21 We will illustrate how the learning model can be easily extended to learning situations involving multiple categories, either with supervision (i.e., learners are informed of category membership for each training observation) or without supervision (i.e., category membership of each training observation is not provided to learners). [sent-34, score-0.65]
24 To model the spacing effect, we introduce a generic prior in the temporal updating stage. [sent-37, score-0.335]
25 In Section 2 we introduce the Bayesian sequential learning framework in the context of category learning, and discuss the conjugacy property of the model. [sent-40, score-0.57]
26 Sections 3 and 4 demonstrate how to develop supervised and unsupervised learning models, which can be compared with human performance. [sent-41, score-0.163]
27 2 Bayesian sequential model We adopt the framework of Bayesian sequential learning [11], termed Bayes-Kalman, a probabilistic model in which learning is assumed to be a Markov process with unobserved states. [sent-43, score-0.198]
28 The exemplars in training are directly observable, but the representations of categories are hidden and unobservable. [sent-44, score-0.263]
29 In this paper, we assume that categories can be represented as Gaussian distributions with two unknown parameters, means and variances. [sent-45, score-0.176]
30 We now state the general framework and give the update rule for the simplest situation where the training data is generated by a single category specified by a mean m and precision r – the precision is the inverse of the variance and is used to simplify the algebra. [sent-49, score-0.653]
31 Our model assumes that the mean can change over time and is denoted by mt , where t is the time step. [sent-50, score-0.365]
32 The model is specified by the prior distribution P (m0 , r), the likelihood function P (x|mt , r) for generating the observations, and the temporal prior P (mt+1 |mt ) specifying how mt can vary over time. [sent-51, score-0.444]
33 The update equations are divided into two stages, prediction and correction: $P(m_{t+1}, r \mid X_t) = \int_{-\infty}^{\infty} dm_t \, P(m_{t+1} \mid m_t)\, P(m_t, r \mid X_t)$ (1), and $P(m_{t+1}, r \mid X_{t+1}) = P(m_{t+1}, r \mid x_{t+1}, X_t) = P(x_{t+1} \mid m_{t+1}, r)\, P(m_{t+1}, r \mid X_t) / P(x_{t+1} \mid X_t)$ (2). [sent-57, score-0.16]
34 Intuitively, the Bayes-Kalman first predicts the distribution $P(m_{t+1}, r \mid X_t)$ and then uses this as a prior to correct for the new observation $x_{t+1}$ and determine the new posterior $P(m_{t+1}, r \mid X_{t+1})$. [sent-58, score-0.147]
35 As shown in the following section, this reduces the Bayes-Kalman equations to closed-form update rules for the parameters of the distributions. [sent-62, score-0.177]
36 The likelihood function and temporal prior are both Gaussians: $P(x_t \mid m_t, r) = G(x_t : m_t, \zeta r)$ and $P(m_{t+1} \mid m_t) = G(m_{t+1} : m_t, \gamma r)$ (6), where $\zeta, \gamma$ are constants. [sent-69, score-0.708]
37 The conjugacy of the distributions ensures that the posterior distribution P (mt , r|Xt ) will also be a Gamma-Gaussian distribution with parameters µt , τt , αt , βt , where the update rules for these parameters are specified in the next section. [sent-70, score-0.197]
38 2 Update rules for the model parameters The update rules for the model parameters follow from substituting the distributions into the Bayes-Kalman equations (1) and (2). [sent-72, score-0.276]
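The closed-form updates themselves are not reproduced in this extraction, but the structure of the filter can be sketched as follows, assuming the standard Normal-Gamma (Gamma-Gaussian) conjugate update for the correction stage and the tau-shrinkage implied by the temporal prior for the prediction stage; the class and parameter names are illustrative, not the paper's.

```python
class NormalGammaFilter:
    """Minimal sketch of the single-category Bayes-Kalman filter with unknown
    mean and precision. The posterior is summarised by Normal-Gamma parameters
    (mu, tau, alpha, beta), mirroring the paper's (mu_t, tau_t, alpha_t, beta_t);
    zeta and gamma are the likelihood and temporal-prior constants."""

    def __init__(self, mu0=0.0, tau0=1.0, alpha0=1.0, beta0=1.0, zeta=1.0):
        self.mu, self.tau, self.alpha, self.beta = mu0, tau0, alpha0, beta0
        self.zeta = zeta

    def predict(self, gamma):
        # Prediction stage: the temporal prior m_{t+1} ~ N(m_t, 1/(gamma*r))
        # adds uncertainty about the mean, which only shrinks the pseudo-count tau.
        self.tau = self.tau * gamma / (self.tau + gamma)

    def correct(self, x):
        # Correction stage: standard Normal-Gamma conjugate update for an
        # observation x ~ N(m, 1/(zeta*r)).
        mu, tau, zeta = self.mu, self.tau, self.zeta
        self.mu = (tau * mu + zeta * x) / (tau + zeta)
        self.beta += tau * zeta * (x - mu) ** 2 / (2.0 * (tau + zeta))
        self.tau = tau + zeta
        self.alpha += 0.5
```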
39 3 Supervised category learning Although the learning model is presented for one category, it can easily be extended to learning multiple categories with known category membership for training data (i.e., supervised learning). [sent-86, score-1.227]
40 In this section, we will first describe an experiment with two categories to show how the category representations change over time; then we will simulate learning with six categories and compare predictions with human data in psychological experiments. [sent-89, score-0.877]
41 1 Two-category learning with supervision We first conduct a synthetic experiment with two categories under supervision. [sent-91, score-0.195]
42 We generate six training observations from one of two one-dimensional Gaussian distributions (representing categories A and B, respectively) with means [−0. [sent-92, score-0.285]
43 Two training conditions are included, a massed condition with the data presentation order of AAABBB and a spaced condition with the order of ABABAB. [sent-96, score-0.882]
44 To model the acquisition of category representations during training, we employ the Bayesian learning model as described in the previous section. [sent-97, score-0.562]
45 In the correction stage of each trial, the model updates the parameters corresponding to the category that produced the observation, based on the supervision. [sent-98, score-0.698]
46 In the prediction stage, however, different values of a fixed model parameter γ are introduced to incorporate a generic prior that controls how much the learner is willing to update category representations from one trial to the next. [sent-101, score-0.755]
47 The basic hypothesis is that learners will have greater confidence in knowledge of a category presented on trial t than of a category absent on trial t. [sent-102, score-1.123]
48 As a consequence, the learner will be willing to accept more change in a category representation if the observation on the previous trial was drawn from a different category. [sent-103, score-0.629]
49 More specifically, if the observation on trial t is from the first category, in the prediction phase we update the $\tau_t$ parameters for the two categories, $\tau_t^1$ and $\tau_t^2$, as $\tau_t^1 \rightarrow \frac{\tau_t^1 \gamma_s}{\tau_t^1 + \gamma_s}$ and $\tau_t^2 \rightarrow \frac{\tau_t^2 \gamma_d}{\tau_t^2 + \gamma_d}$ (11), in which $\gamma_s > \gamma_d$. [sent-108, score-0.2]
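Under one reading of equation (11), the generic prior can be sketched as a supervised training loop that applies gamma_s to the just-observed category and gamma_d to the others in the prediction stage, building on the NormalGammaFilter sketch above; the function name and the gamma values are illustrative.

```python
def run_supervised(sequence, gamma_s=50.0, gamma_d=5.0, zeta=1.0):
    # `sequence` is a list of (label, x) pairs, e.g. the massed order AAABBB
    # or the spaced order ABABAB. gamma_s > gamma_d encodes the generic prior:
    # less change for the category just repeated, more change for the others.
    labels = {label for label, _ in sequence}
    cats = {label: NormalGammaFilter(zeta=zeta) for label in labels}
    for label, x in sequence:
        # Prediction stage: tau shrinks more (smaller gamma) for the categories
        # not observed on this trial, so their representations stay more malleable.
        for name, cat in cats.items():
            cat.predict(gamma_s if name == label else gamma_d)
        # Correction stage: only the labelled category absorbs the observation.
        cats[label].correct(x)
    return cats
```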
50 Blue lines indicate category parameters in the first category; and red lines indicate parameters in the second category. [sent-111, score-0.475]
51 The top panel shows the results for the massed condition (i.e., AAABBB), and the bottom panel shows the results for the spaced condition (i.e., ABABAB). [sent-114, score-0.31]
53 Figure (1) shows the change of posterior distributions of the two unknown category parameters, means P (mt |Xt ) and precisions P (rt |Xt ), over training trials. [sent-121, score-0.677]
54 Figure (2) shows the category representation in the form of the posterior distribution of P (xt |Xt ). [sent-122, score-0.523]
55 The increase of category variance reflects the forgetting that occurs if no new observations are provided for a particular category after a long interval. [sent-126, score-1.06]
56 This type of forgetting does not occur in the spaced condition, as the interleaved presentation order ABABAB ensures that each category recurs after a short interval. [sent-127, score-0.885]
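As an illustrative check of this forgetting behaviour (the parameter values below are arbitrary), repeated prediction steps with no intervening observation keep shrinking tau, i.e. the model grows less certain about that category's mean:

```python
cat = NormalGammaFilter(mu0=0.5, tau0=10.0, alpha0=5.0, beta0=5.0)
for step in range(3):
    cat.predict(gamma=20.0)
    print(step, round(cat.tau, 3))  # tau falls 6.667 -> 5.0 -> 4.0: growing uncertainty
```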
57 Based upon the learned category representations, we can compute accuracy (the ability to discriminate between the two learnt distributions) using the posterior distributions of the two categories. [sent-128, score-0.581]
58 After 100 simulations, the average accuracy in the massed condition is 0. [sent-129, score-0.485]
59 Thus our model is able to predict the spacing effect found in two-category supervised learning. [sent-132, score-0.279]
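One way to implement this accuracy computation is sketched below, assuming the standard result that the posterior predictive of a Normal-Gamma model is a Student-t distribution (the paper's equation (10) is not reproduced here) and classifying each test item by the category with the larger predictive density; the helper names are illustrative.

```python
import numpy as np
from scipy.stats import t as student_t

def predictive_logpdf(cat, x):
    # Posterior predictive of the Normal-Gamma filter: Student-t with 2*alpha
    # degrees of freedom (a standard conjugacy result, assumed rather than
    # copied from the paper).
    df = 2.0 * cat.alpha
    scale = np.sqrt(cat.beta * (cat.tau + cat.zeta) / (cat.alpha * cat.tau * cat.zeta))
    return student_t.logpdf(x, df, loc=cat.mu, scale=scale)

def accuracy(cats, test_items):
    # test_items: list of (true_label, x); classify by the larger predictive density.
    hits = sum(
        1 for label, x in test_items
        if max(cats, key=lambda name: predictive_logpdf(cats[name], x)) == label
    )
    return hits / len(test_items)
```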
60 2 Modeling the spacing effect in six-category learning Kornell and Bjork [1] asked human subjects to study six paintings by each of six different artists, with a given artist's paintings presented consecutively (massed) or interleaved with other artists' paintings (spaced). [sent-138, score-0.786]
61 In the training phase, subjects were informed which artist created each training painting. [sent-139, score-0.208]
62 The same 36 paintings were studied in the training phase, but with different presentation orders in the massed and spaced conditions. [sent-140, score-0.933]
63 In the subsequent test phase, six new paintings (one from each artist) were presented in each test block, and subjects had to identify which artist had painted each. [sent-141, score-0.262]
64 Human subjects showed significantly better test performance after spaced than massed training. [sent-146, score-0.772]
65 Given that feedback was provided and one painting from each artist appeared in each test block, it is not surprising that test performance increased across test blocks and that the spacing effect decreased with more test blocks. [sent-147, score-0.364]
66 To simulate the data, we generated training and test data from six one-dimensional Gaussian distributions with means [−2, −1. [sent-148, score-0.147]
67 Figure (3) shows the learned category representations in terms of posterior distributions. [sent-154, score-0.556]
68 To compare with human performance reported by Kornell and Bjork, the model estimates accuracy in terms of discrimination among the learned category distributions. [sent-156, score-0.204]
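A hypothetical driver for this six-category simulation, reusing run_supervised and accuracy from the sketches above; the category means, noise level, and block structure are illustrative placeholders rather than the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.linspace(-2.0, 2.0, 6)   # one Gaussian per artist (illustrative values)

def draw(label):
    return rng.normal(means[label], 1.0)

massed_order = [lab for lab in range(6) for _ in range(6)]   # six in a row per artist
spaced_order = [lab for _ in range(6) for lab in range(6)]   # artists interleaved

for name, order in [("massed", massed_order), ("spaced", spaced_order)]:
    cats = run_supervised([(lab, draw(lab)) for lab in order])
    test_items = [(lab, draw(lab)) for lab in range(6)]       # one new item per artist
    print(name, accuracy(cats, test_items))
```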
69 4 Unsupervised category learning Both humans and animals can learn without supervision. [sent-159, score-0.475]
70 For example, in the animal conditioning literature, various studies have shown that exposing two stimuli in blocks (equivalent to a massed condition) is less effective in producing generalization [12]. [sent-160, score-0.551]
71 Balleine et al. [4] conclude that in the massed preexposure the rats are unable to distinguish two separate categories for A and B, and therefore treat them as members of a single category. [sent-181, score-0.702]
72 By contrast, they conclude that rats can distinguish the categories A and B in the spaced preexposure. [sent-182, score-0.477]
73 In this section, we generalize the sequential category model to unsupervised learning, when the category membership of each training example is not provided to observers. [sent-183, score-1.245]
74 Then we determine whether massed and spaced stimuli (as in Balleine et al.'s experiment [4]) are most likely to have been generated by a single category or by two categories. [sent-187, score-0.475]
76 We also assess the importance of supervision in training by comparing performance after unsupervised learning with that after supervised learning. [sent-188, score-0.246]
77 Each category can be represented as a Gaussian distribution with its own mean and precision, $(m_1, r_1)$ and $(m_2, r_2)$. [sent-190, score-0.503]
78 The likelihood function assumes that the data is generated by either category with equal probability, since the category membership is not provided: $P(x \mid m_1, r_1, m_2, r_2) = \frac{1}{2} P(x \mid m_1, r_1) + \frac{1}{2} P(x \mid m_2, r_2)$, with $P(x \mid m_1, r_1) = G(x : m_1, \zeta r_1)$ and $P(x \mid m_2, r_2) = G(x : m_2, \zeta r_2)$. [sent-191, score-1.039]
79 The joint posterior distribution $P(m_1, r_1, m_2, r_2 \mid X_t)$ after observations $X_t$ can be formally obtained by applying the Bayes-Kalman update rules to the joint distribution. [sent-193, score-0.18]
80 But this update is more complicated because we do not know whether the new observation $x_t$ should be assigned to category 1 or category 2. [sent-196, score-1.419]
81 Instead we have to sum over all the possible assignments of the observations to the categories, which gives $2^t$ possible assignments at time $t$. [sent-197, score-0.21]
82 For example, $(1, 1, \ldots, 1)$ is the assignment where all the observations are assigned to category 1, $(2, 1, \ldots, 1)$ assigns the first observation to category 2 and the remainder to category 1, and so on. [sent-208, score-0.992]
84 Here the parameter values $(\alpha, \beta, \mu, \tau)$ for category $i$ ($i \in \{1, 2\}$) are indexed by the observation assignment sequence $(a_1, \ldots, a_t)$. [sent-225, score-0.517]
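A brute-force sketch of this sum over assignments (feasible only for short sequences, since the number of assignments grows as 2^t): enumerate every assignment, run the filter under it, and accumulate the sequential predictive likelihood. For simplicity it uses a single drift constant gamma for both categories rather than the paper's gamma_s/gamma_d distinction, and the equal mixing proportions from the likelihood above; the function name is illustrative.

```python
from itertools import product

import numpy as np
from scipy.special import logsumexp

def two_category_log_evidence(observations, zeta=1.0, gamma=20.0):
    # Sum over all 2^t assignments of the observations to categories A and B.
    per_assignment = []
    for assignment in product("AB", repeat=len(observations)):
        cats = {"A": NormalGammaFilter(zeta=zeta), "B": NormalGammaFilter(zeta=zeta)}
        log_like = -len(observations) * np.log(2.0)    # P(assignment), equal mixing
        for label, x in zip(assignment, observations):
            for cat in cats.values():
                cat.predict(gamma)                      # prediction stage each trial
            log_like += predictive_logpdf(cats[label], x)
            cats[label].correct(x)                      # correction for the assigned category
        per_assignment.append(log_like)
    return logsumexp(per_assignment)
```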
85 This returns us to Balleine et al.'s preexposure experiments [4] – why do rats identify a single category for the massed stimuli but two categories for the spaced stimuli? [sent-353, score-1.49]
86 We compare the evidence for the sequential model with one category, see equations (9,10), versus the evidence for the model with two categories, see equations (9,22), for the two cases AAABBB (massed) and ABABAB (spaced). [sent-355, score-0.288]
87 We use the same training data as in Section 3.1, but without providing category membership for any of the training data. [sent-357, score-0.608]
88 As shown in figure (5), the model decides that all training observations are from one category in the massed condition, but from two different categories in the spaced condition (using zero as the decision threshold). [sent-360, score-1.452]
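A matching sketch of the model comparison: the one-category evidence is the sequential product of posterior predictive densities, and the sign of the log ratio (two categories minus one) plays the role of the zero decision threshold mentioned above; names and default values are illustrative.

```python
def one_category_log_evidence(observations, zeta=1.0, gamma=20.0):
    # Sequential (prequential) marginal likelihood of the single-category model.
    cat = NormalGammaFilter(zeta=zeta)
    log_like = 0.0
    for x in observations:
        cat.predict(gamma)
        log_like += predictive_logpdf(cat, x)
        cat.correct(x)
    return log_like

def prefers_two_categories(observations):
    # Positive log ratio: the data are better explained by two categories.
    return (two_category_log_evidence(observations)
            - one_category_log_evidence(observations)) > 0.0
```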
89 Left, model selection results as a function of presentation training conditions (massed and spaced). [sent-372, score-0.157]
90 To make the comparison, we assume that learners are provided with the same training data and are informed that the data are from two different categories, either with known category membership (supervised) or unknown category membership (unsupervised) for each training observation. [sent-382, score-1.31]
91 The model predicts higher accuracy given supervised than unsupervised learning. [sent-384, score-0.204]
92 Furthermore, the model predicts a spacing effect for both types of learning, although the effect is reduced with unsupervised learning. [sent-385, score-0.342]
93 5 Conclusions In this paper, we develop a Bayesian sequential model for category learning by updating category representations over time based on two category parameters, the mean and the variance. [sent-386, score-1.593]
94 Analytic updating rules are obtained by defining conjugate temporal priors to enable closed form solutions. [sent-387, score-0.26]
95 A generic prior in the temporal updating stage is introduced to model the spacing effect. [sent-388, score-0.372]
96 In addition to explaining the spacing effect, our model predicts that subjects will become less certain about their knowledge of learned categories as time passes (see the increase in category variance in Figure 2). [sent-391, score-0.877]
97 Instead, as shown in Equation 10, our model predicts the pattern of power-law forgetting that is fairly universal in human memory [14]. For a small number of observations, our model is extremely efficient because we can derive analytic solutions. [sent-393, score-0.21]
98 Learning concepts and categories: Is spacing the "enemy of induction"? [sent-403, score-0.162]
99 Maintenance of foreign language vocabulary and the spacing effect. [sent-414, score-0.162]
100 Induction of category distributions: A framework for classification learning. [sent-467, score-0.475]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 154 nips-2009-Modeling the spacing effect in sequential category learning
2 0.1886515 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
Abstract: Despite a large literature on upper bounds on complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining a understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for the understanding the inherent complexity of large-scale learning and estimation problems. 1
3 0.18805781 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
4 0.18735084 133 nips-2009-Learning models of object structure
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1
5 0.1775393 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
6 0.16398905 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
7 0.12064764 27 nips-2009-Adaptive Regularization of Weight Vectors
8 0.099731222 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
9 0.099710561 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
10 0.097283207 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
11 0.086925723 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes
12 0.086614639 21 nips-2009-Abstraction and Relational learning
13 0.084568307 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
14 0.083920009 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
15 0.077110581 246 nips-2009-Time-Varying Dynamic Bayesian Networks
16 0.070457205 177 nips-2009-On Learning Rotations
17 0.069822878 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model
18 0.069594897 220 nips-2009-Slow Learners are Fast
19 0.065969966 11 nips-2009-A General Projection Property for Distribution Families
20 0.064423934 13 nips-2009-A Neural Implementation of the Kalman Filter
simIndex simValue paperId paperTitle
same-paper 1 0.97716564 154 nips-2009-Modeling the spacing effect in sequential category learning
2 0.71415043 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
3 0.6441105 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
4 0.62551576 27 nips-2009-Adaptive Regularization of Weight Vectors
Author: Koby Crammer, Alex Kulesza, Mark Dredze
Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1
5 0.58943552 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
Author: Adam Sanborn, Nick Chater, Katherine A. Heller
Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1
7 0.48290831 115 nips-2009-Individuation, Identification and Object Discovery
8 0.48026496 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
9 0.46915722 11 nips-2009-A General Projection Property for Distribution Families
10 0.45691362 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
11 0.43662837 21 nips-2009-Abstraction and Relational learning
12 0.43645152 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
13 0.42948005 25 nips-2009-Adaptive Design Optimization in Experiments with People
14 0.42916796 177 nips-2009-On Learning Rotations
15 0.42873713 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
16 0.42426777 133 nips-2009-Learning models of object structure
17 0.40837583 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
18 0.40130675 152 nips-2009-Measuring model complexity with the prior predictive
19 0.39645809 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes
20 0.38826376 112 nips-2009-Human Rademacher Complexity
simIndex simValue paperId paperTitle
same-paper 1 0.79261309 154 nips-2009-Modeling the spacing effect in sequential category learning
2 0.75495845 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence
Author: Wei Bian, Dacheng Tao
Abstract: In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.
3 0.7351048 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models
Author: Baback Moghaddam, Emtiyaz Khan, Kevin P. Murphy, Benjamin M. Marlin
Abstract: We make several contributions in accelerating approximate Bayesian structural inference for non-decomposable GGMs. Our first contribution is to show how to efficiently compute a BIC or Laplace approximation to the marginal likelihood of non-decomposable graphs using convex methods for precision matrix estimation. This optimization technique can be used as a fast scoring function inside standard Stochastic Local Search (SLS) for generating posterior samples. Our second contribution is a novel framework for efficiently generating large sets of high-quality graph topologies without performing local search. This graph proposal method, which we call “Neighborhood Fusion” (NF), samples candidate Markov blankets at each node using sparse regression techniques. Our third contribution is a hybrid method combining the complementary strengths of NF and SLS. Experimental results in structural recovery and prediction tasks demonstrate that NF and hybrid NF/SLS out-perform state-of-the-art local search methods, on both synthetic and real-world datasets, when realistic computational limits are imposed.
4 0.73311222 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies
Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer
Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1
5 0.62973893 94 nips-2009-Fast Learning from Non-i.i.d. Observations
Author: Ingo Steinwart, Andreas Christmann
Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case. 1
6 0.61825955 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions
7 0.6059283 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
8 0.60369229 54 nips-2009-Compositionality of optimal control laws
9 0.60271049 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
10 0.59843892 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
11 0.59300184 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
12 0.59067374 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
13 0.58971584 226 nips-2009-Spatial Normalized Gamma Processes
14 0.58692557 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
15 0.58311224 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
16 0.58025867 115 nips-2009-Individuation, Identification and Object Discovery
17 0.58024365 112 nips-2009-Human Rademacher Complexity
18 0.57813984 133 nips-2009-Learning models of object structure
19 0.57520199 204 nips-2009-Replicated Softmax: an Undirected Topic Model
20 0.57497132 21 nips-2009-Abstraction and Relational learning