nips nips2009 nips2009-154 knowledge-graph by maker-knowledge-mining

154 nips-2009-Modeling the spacing effect in sequential category learning


Source: pdf

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Modeling the spacing effect in sequential category learning Hongjing Lu, Department of Psychology & Statistics, Hongjing@ucla.edu [sent-1, score-0.738]

2 Abstract We develop a Bayesian sequential model for category learning. [sent-5, score-0.574]

3 The sequential model updates two category parameters, the mean and the variance, over time. [sent-6, score-0.574]

4 We define conjugate temporal priors to enable closed form solutions to be obtained. [sent-7, score-0.181]

5 This model can be easily extended to supervised and unsupervised learning involving multiple categories. [sent-8, score-0.151]

6 To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. [sent-9, score-0.416]

7 1 Introduction Inductive learning - the process by which a new concept or category is acquired through observation of exemplars - poses a fundamental theoretical problem for cognitive science. [sent-11, score-0.586]

8 One pervasive phenomenon is the spacing effect, manifested in the finding that given a fixed amount of total study time with a given item, learning is facilitated when presentations of the item are spread across a longer time interval rather than massed into a continuous study period. [sent-13, score-0.624]

9 In category learning, for example, exemplars of two categories can be spaced by presenting them in an interleaved manner (e.g., A1 B1 A2 B2 A3 B3), or massed by presenting them in consecutive blocks (e.g., A1 A2 A3 B1 B2 B3). [sent-14, score-0.992] [sent-16, score-0.484]

11 Kornell & Bjork [1] show that when tested later on classification of novel category members, spaced presentation yields superior performance relative to massed presentation. [sent-19, score-1.257]

12 Similar spacing effects have been obtained in studies of item learning [2] and motor learning [3]. [sent-20, score-0.213]

13 Moreover, spacing effects are found not only in human learning, but also in various types of learning in other species, including rats and Aplysia [4][5]. [sent-21, score-0.304]

14 In the present paper we will focus on spacing effects in the context of sequential category learning. [sent-22, score-0.734]

15 Standard statistical methods based on summary information are unable to deal with order effects, including the performance difference between spaced and massed conditions. [sent-23, score-0.718]

16 From a computational perspective, a sequential learning model is needed to construct category representations from training examples and dynamically update parameters of these representations from trial to trial. [sent-24, score-0.796]

17 Bayesian sequential models have been successfully applied to model causal learning and animal conditioning [6] [7]. [sent-25, score-0.155]

18 However, given that both the mean and the variance of a category are random variables, standard Kalman filtering [9] is not directly applicable in this case since it assumes a known variance, which is not warranted in the current application. [sent-27, score-0.507]

19 In this paper, we extend traditional Kalman filtering in order to update two category parameters, the mean and the variance, over time in the context of category learning. [sent-28, score-0.996]

20 We define conjugate temporal priors to enable closed form solutions to be obtained in this learning model with two unknown parameters. [sent-29, score-0.23]

21 We will illustrate how the learning model can be easily extended to learning situations involving multiple categories, either with supervision (i.e., learners are informed of category membership for each training observation) or without supervision (i.e., category membership of each training observation is not provided to learners). [sent-30, score-0.222] [sent-32, score-0.758] [sent-34, score-0.65]

24 To model the spacing effect, we introduce a generic prior in the temporal updating stage. [sent-37, score-0.335]

25 In Section 2 we introduce the Bayesian sequential learning framework in the context of category learning, and discuss the conjugacy property of the model. [sent-40, score-0.57]

26 Sections 3 and 4 demonstrate how to develop supervised and unsupervised learning models, which can be compared with human performance. [sent-41, score-0.163]

27 2 Bayesian sequential model We adopt the framework of Bayesian sequential learning [11], termed Bayes-Kalman, a probabilistic model in which learning is assumed to be a Markov process with unobserved states. [sent-43, score-0.198]

28 The exemplars in training are directly observable, but the representations of categories are hidden and unobservable. [sent-44, score-0.263]

29 In this paper, we assume that categories can be represented as Gaussian distributions with two unknown parameters, means and variances. [sent-45, score-0.176]

30 We now state the general framework and give the update rule for the simplest situation where the training data is generated by a single category specified by a mean m and precision r – the precision is the inverse of the variance and is used to simplify the algebra. [sent-49, score-0.653]

31 Our model assumes that the mean can change over time and is denoted by mt , where t is the time step. [sent-50, score-0.365]

32 The model is specified by the prior distribution P (m0 , r), the likelihood function P (x|mt , r) for generating the observations, and the temporal prior P (mt+1 |mt ) specifying how mt can vary over time. [sent-51, score-0.444]

33 The update equations are divided into two stages, prediction and correction: P(mt+1, r | Xt) = ∫ P(mt+1 | mt) P(mt, r | Xt) dmt (1), and P(mt+1, r | Xt+1) = P(mt+1, r | xt+1, Xt) = P(xt+1 | mt+1, r) P(mt+1, r | Xt) / P(xt+1 | Xt) (2). [sent-57, score-0.16]

34 Intuitively, the Bayes-Kalman first predicts the distribution P(mt+1, r | Xt) and then uses this as a prior to correct for the new observation xt+1 and determine the new posterior P(mt+1, r | Xt+1). [sent-58, score-0.147]

35 As shown in the following section, this reduces the Bayes-Kalman equations to closed form update rules for the parameters of the distributions. [sent-62, score-0.177]

36 The likelihood function and temporal prior are both Gaussians: P(xt | mt, r) = G(xt : mt, ζr), P(mt+1 | mt) = G(mt+1 : mt, γr) (6), where ζ, γ are constants. [sent-69, score-0.708]

37 The conjugacy of the distributions ensures that the posterior distribution P (mt , r|Xt ) will also be a Gamma-Gaussian distribution with parameters µt , τt , αt , βt , where the update rules for these parameters are specified in the next section. [sent-70, score-0.197]

38 2 Update rules for the model parameters The update rules for the model parameters follow from substituting the distributions into the Bayes-Kalman equations (1) and (2). [sent-72, score-0.276]
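
To make these conjugate updates concrete, here is a minimal sketch (Python) of the two stages under standard Normal-Gamma (Gamma-Gaussian) conjugacy with the likelihood and temporal prior of equation (6). The parameter roles (mu, tau, alpha, beta) follow the text, but the exact constants are assumptions of this sketch, since the paper's equations (7)-(10) are not reproduced here.

def predict(mu, tau, alpha, beta, gamma):
    # Prediction stage: diffusing the mean by the temporal prior
    # P(mt+1 | mt) = G(mt+1 : mt, gamma*r) leaves mu, alpha, beta unchanged
    # and shrinks tau (assumed form; it matches the tau update in equation (11)).
    return mu, tau * gamma / (tau + gamma), alpha, beta

def correct(mu, tau, alpha, beta, x, zeta):
    # Correction stage: absorbing an observation x with likelihood
    # P(x | m, r) = G(x : m, zeta*r); under Normal-Gamma conjugacy zeta acts
    # as an effective observation count (an assumption of this sketch).
    tau_new = tau + zeta
    mu_new = (tau * mu + zeta * x) / tau_new
    alpha_new = alpha + 0.5
    beta_new = beta + 0.5 * tau * zeta * (x - mu) ** 2 / tau_new
    return mu_new, tau_new, alpha_new, beta_new

Repeated prediction without correction keeps shrinking tau, which is how the model comes to represent growing uncertainty about a category that has not been observed for a while.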

39 3 Supervised category learning Although the learning model is presented for one category, it can easily be extended to learning multiple categories with known category membership for training data (i.e., supervised learning). [sent-86, score-1.227]

40 In this section, we will first describe an experiment with two categories to show how the category representations change over time; then we will simulate learning with six categories and compare predictions with human data in psychological experiments. [sent-89, score-0.877]

41 1 Two-category learning with supervision We first conduct a synthetic experiment with two categories under supervision. [sent-91, score-0.195]

42 We generate six training observations from one of two one-dimensional Gaussian distributions (representing categories A and B, respectively) with means [−0. [sent-92, score-0.285]

43 Two training conditions are included, a massed condition with the data presentation order of AAABBB and a spaced condition with the order of ABABAB. [sent-96, score-0.882]

44 To model the acquisition of category representations during training, we employ the Bayesian learning model as described in the previous section. [sent-97, score-0.562]

45 In the correction stage of each trial, the model updates the parameters corresponding to the category that produced the observation, based on the supervision (i.e., the category label provided on that trial). [sent-98, score-0.698]

46 In the prediction stage, however, different values of a fixed model parameter γ are introduced to incorporate a generic prior that controls how much the learner is willing to update category representations from one trial to the next. [sent-101, score-0.755]

47 The basic hypothesis is that learners will have greater confidence in knowledge of a category presented on trial t than of a category absent on trial t. [sent-102, score-1.123]

48 As a consequence, the learner will be willing to accept more change in a category representation if the observation on the previous trial was drawn from a different category. [sent-103, score-0.629]

49 More specifically, if the observation on trial t is from the first category, in the prediction phase we will update the τt parameters for the two categories, τt^1 and τt^2, as: τt^1 → τt^1 γs / (τt^1 + γs), τt^2 → τt^2 γd / (τt^2 + γd) (11), in which γs > γd. [sent-108, score-0.2]
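
As an illustration, a hypothetical simulation of the two presentation conditions can be built on the predict/correct sketch above; gamma_s > gamma_d encodes the generic prior (the just-observed category is held more firmly than the absent one). The category means, standard deviation and all numeric values below are illustrative assumptions, not the paper's settings.

import numpy as np

rng = np.random.default_rng(0)
gamma_s, gamma_d, zeta = 20.0, 2.0, 1.0   # illustrative values only

def run_condition(labels, means=(-0.5, 0.5), sd=0.2):
    # one Normal-Gamma parameter tuple (mu, tau, alpha, beta) per category
    params = [(0.0, 1.0, 1.0, 1.0), (0.0, 1.0, 1.0, 1.0)]
    for lab in labels:
        x = rng.normal(means[lab], sd)
        # prediction: the observed category diffuses with gamma_s, the other with gamma_d
        params = [predict(*params[c], gamma_s if c == lab else gamma_d) for c in (0, 1)]
        # correction: only the supervised (observed) category absorbs x
        params[lab] = correct(*params[lab], x, zeta)
    return params

massed = run_condition([0, 0, 0, 1, 1, 1])   # AAABBB
spaced = run_condition([0, 1, 0, 1, 0, 1])   # ABABAB

In the massed condition the category that is absent during the second block keeps diffusing, which is what produces the growing variance and the forgetting discussed below.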

50 Blue lines indicate category parameters in the first category; and red lines indicate parameters in the second category. [sent-111, score-0.475]

51 The top panel shows the results for the massed condition (i.e., AAABBB), and the bottom panel shows the results for the spaced condition (i.e., ABABAB). [sent-112, score-0.464] [sent-114, score-0.31]

53 Figure (1) shows the change of posterior distributions of the two unknown category parameters, means P (mt |Xt ) and precisions P (rt |Xt ), over training trials. [sent-121, score-0.677]

54 Figure (2) shows the category representation in the form of the posterior distribution of P (xt |Xt ). [sent-122, score-0.523]

55 The increase of category variance reflects the forgetting that occurs if no new observations are provided for a particular category after a long interval. [sent-126, score-1.06]

56 This type of forgetting does not occur in the spaced condition, as the interleaved presentation order ABABAB ensures that each category recurs after a short interval. [sent-127, score-0.885]

57 Based upon the learned category representations, we can compute accuracy (the ability to discriminate between the two learnt distributions) using the posterior distributions of the two categories. [sent-128, score-0.581]
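
One plausible way to compute such a discrimination accuracy, continuing the sketch above (the exact procedure and all numeric values are assumptions, not the paper's), is to score fresh samples from each generating distribution under the one-step posterior predictive of each learned category, which for a Normal-Gamma state is a Student-t density, and classify by the larger value.

from scipy.stats import t as student_t

def log_predictive(x, mu, tau, alpha, beta, zeta=1.0):
    # posterior predictive of a Normal-Gamma state under the G(x : m, zeta*r)
    # likelihood: a Student-t with 2*alpha degrees of freedom
    lam = tau * zeta / (tau + zeta)
    return student_t.logpdf(x, df=2 * alpha, loc=mu, scale=np.sqrt(beta / (alpha * lam)))

def accuracy(params, means=(-0.5, 0.5), sd=0.2, n=10_000):
    hits = 0
    for lab in (0, 1):
        x = rng.normal(means[lab], sd, size=n)
        scores = [log_predictive(x, *p) for p in params]
        hits += np.sum(np.argmax(scores, axis=0) == lab)
    return hits / (2 * n)

print(accuracy(massed), accuracy(spaced))   # spaced is expected to come out higher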

58 After 100 simulations, the average accuracy in the massed condition is 0. [sent-129, score-0.485]

59 Thus our model is able to predict the spacing effect found in two-category supervised learning. [sent-132, score-0.279]

60 2 Modeling the spacing effect in six-category learning Kornell and Bjork [1] asked human subjects to study six paintings by six different artists, with a given artist's paintings presented consecutively (massed) or interleaved with other artists' paintings (spaced). [sent-138, score-0.786]

61 In the training phase, subjects were informed which artist created each training painting. [sent-139, score-0.208]

62 The same 36 paintings were studied in the training phase, but with different presentation orders in the massed and spaced conditions. [sent-140, score-0.933]

63 In the subsequent test phase, six new paintings (one from each artist) were presented and subjects had to identify which artist painted each of a series of new paintings. [sent-141, score-0.262]

64 Human subjects showed significantly better test performance after spaced than massed training. [sent-146, score-0.772]

65 Given that feedback was provided and one painting from each artist appeared in one test block, it is not surprising that test performance increased across test blocks and the spacing effect decreased with more test blocks. [sent-147, score-0.364]

66 To simulate the data, we generated training and test data from six one-dimensional Gaussian distributions with means [−2, −1. [sent-148, score-0.147]

67 Figure (3) shows the learned category representations in terms of posterior distributions. [sent-154, score-0.556]

68 To compare with human performance reported by Kornell and Bjork, the model estimates accuracy in terms of discrimination between the two categories based upon learned distributions. [sent-156, score-0.204]

69 4 Unsupervised category learning Both humans and animals can learn without supervision. [sent-159, score-0.475]

70 For example, in the animal conditioning literature, various studies have shown that exposing two stimuli in blocks (equivalent to a massed condition) is less effective in producing generalization [12]. [sent-160, score-0.551]

71 They conclude that in the massed preexposure the rats are unable to distinguish two separate categories for A and B, and therefore treat them as members of a single category. [sent-181, score-0.702]

72 By contrast, they conclude that rats can distinguish the categories A and B in the spaced preexposure. [sent-182, score-0.477]

73 In this section, we generalize the sequential category model to unsupervised learning, when the category membership of each training example is not provided to observers. [sent-183, score-1.245]

74 Then we determine whether massed and spaced stimuli (as in Balleine et al.'s experiment [4]) are most likely to have been generated by a single category or by two categories. [sent-185, score-0.749] [sent-187, score-0.475]

76 We also assess the importance of supervision in training by comparing performance after unsupervised learning with that after supervised learning. [sent-188, score-0.246]

77 Each category can be represented as a Gaussian distribution with a mean and precision, (m1, r1) and (m2, r2) respectively. [sent-190, score-0.503]

78 The likelihood function assumes that the data is generated by either category with equal probability, since the category membership is not provided: P(x | m1, r1, m2, r2) = (1/2) P(x | m1, r1) + (1/2) P(x | m2, r2), with P(x | m1, r1) = G(x : m1, ζr1) and P(x | m2, r2) = G(x : m2, ζr2). [sent-191, score-1.039]
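
As a small illustration of this mixture likelihood (a sketch reusing numpy from above; the parameter values would come from the learner's current state), the equal-weight mixture can be written directly:

from scipy.stats import norm

def mixture_likelihood(x, m1, r1, m2, r2, zeta=1.0):
    # equal-weight mixture of the two category likelihoods G(x : mi, zeta*ri)
    return 0.5 * norm.pdf(x, loc=m1, scale=1.0 / np.sqrt(zeta * r1)) \
         + 0.5 * norm.pdf(x, loc=m2, scale=1.0 / np.sqrt(zeta * r2))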

79 The joint posterior distribution P(m1, r1, m2, r2 | Xt) after observations Xt can be formally obtained by applying the Bayes-Kalman update rules to the joint distribution. [sent-193, score-0.18]

80 But this update is more complicated because we do not know whether the new observation xt should be assigned to category 1 or category 2. [sent-196, score-1.419]

81 Instead we have to sum over all the possible assignments of the observations to the categories, which gives 2^t possible assignments at time t. [sent-197, score-0.21]

82 For example, (1, 1, ..., 1) is the assignment where all the observations are assigned to category 1, (2, 1, ..., 1) assigns the first observation to category 2 and the remainder to category 1, and so on. [sent-205, score-0.518] [sent-208, score-0.992]

84 Here α_i^(a1, ..., at) denotes the values of the parameters α = (α, β, µ, τ) for category i (i ∈ {1, 2}) under the observation assignment sequence (a1, ..., at). [sent-225, score-0.517]

85 Consider Balleine et al.'s preexposure experiments [4] – why do rats identify a single category for the massed stimuli but two categories for the spaced stimuli? [sent-353, score-1.49]

86 We compare the evidence for the sequential model with one category, see equations (9,10), versus the evidence for the model with two categories, see equations (9,22), for the two cases AAABBB (massed) and ABABAB (spaced). [sent-355, score-0.288]

87 We use the same training data as in the supervised two-category experiment, but without providing category membership for any of the training data. [sent-357, score-0.608]

88 As shown in figure (5), the model decides that all training observations are from one category in the massed condition, but from two different categories in the spaced condition (using zero as the decision threshold). [sent-360, score-1.452]
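
Sketched below is one way this model comparison could be carried out, reusing predict, correct and log_predictive from the sketches above. The two-category evidence sums over all 2^t assignment sequences, each with prior weight (1/2)^t, while the one-category evidence is a single predictive chain; the priors, the shared gamma and the zero-threshold decision rule are assumptions standing in for the paper's equations (9), (10) and (22).

from itertools import product

def log_evidence(xs, n_categories, gamma=10.0, zeta=1.0):
    init = (0.0, 1.0, 1.0, 1.0)                # prior (mu, tau, alpha, beta)
    if n_categories == 1:
        p, total = init, 0.0
        for x in xs:
            p = predict(*p, gamma)
            total += log_predictive(x, *p, zeta=zeta)
            p = correct(*p, x, zeta)
        return total
    # two categories: sum the joint marginal likelihood over every assignment sequence
    totals = []
    for assign in product((0, 1), repeat=len(xs)):
        params = [init, init]
        ll = len(xs) * np.log(0.5)             # uniform prior over assignments
        for x, a in zip(xs, assign):
            params = [predict(*p, gamma) for p in params]
            ll += log_predictive(x, *params[a], zeta=zeta)
            params[a] = correct(*params[a], x, zeta)
        totals.append(ll)
    return np.logaddexp.reduce(totals)

xs_massed = rng.normal([-0.5, -0.5, -0.5, 0.5, 0.5, 0.5], 0.2)   # AAABBB
xs_spaced = rng.normal([-0.5, 0.5, -0.5, 0.5, -0.5, 0.5], 0.2)   # ABABAB
for name, xs in [("massed", xs_massed), ("spaced", xs_spaced)]:
    log_ratio = log_evidence(xs, 2) - log_evidence(xs, 1)
    print(name, "-> two categories" if log_ratio > 0 else "-> one category")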

89 Left, model selection results as a function of presentation training conditions (massed and spaced). [sent-372, score-0.157]

90 To make the comparison, we assume that learners are provided with the same training data and are informed that the data are from two different categories, either with known category membership (supervised) or unknown category membership (unsupervised) for each training observation. [sent-382, score-1.31]

91 The model predicts higher accuracy given supervised than unsupervised learning. [sent-384, score-0.204]

92 Furthermore, the model predicts a spacing effect for both types of learning, although the effect is reduced with unsupervised learning. [sent-385, score-0.342]

93 5 Conclusions In this paper, we develop a Bayesian sequential model for category learning by updating category representations over time based on two category parameters, the mean and the variance. [sent-386, score-1.593]

94 Analytic updating rules are obtained by defining conjugate temporal priors to enable closed form solutions. [sent-387, score-0.26]

95 A generic prior in the temporal updating stage is introduced to model the spacing effect. [sent-388, score-0.372]

96 In addition to explaining the spacing effect, our model predicts that subjects will become less certain about their knowledge of learned categories as time passes (see the increase in category variance in Figure 2). [sent-391, score-0.877]

97 Instead, as shown in Equation 10, our model predicts the pattern of power-law forgetting that is fairly universal in human memory [14]. For a small number of observations, our model is extremely efficient because we can derive analytic solutions. [sent-393, score-0.21]

98 Learning concepts and categories: Is spacing the "enemy of induction"? [sent-403, score-0.162]

99 Maintenance of foreign language vocabulary and the spacing effect. [sent-414, score-0.162]

100 Induction of category distributions: A framework for classification learning. [sent-467, score-0.475]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('category', 0.475), ('massed', 0.436), ('xt', 0.381), ('mt', 0.316), ('spaced', 0.282), ('spacing', 0.162), ('categories', 0.117), ('paintings', 0.107), ('balleine', 0.089), ('membership', 0.089), ('tp', 0.084), ('supervision', 0.078), ('rats', 0.078), ('sequential', 0.072), ('aaabbb', 0.071), ('bahrick', 0.071), ('bjork', 0.071), ('kornell', 0.071), ('preexposure', 0.071), ('kalman', 0.069), ('exemplars', 0.069), ('trial', 0.066), ('presentation', 0.064), ('unsupervised', 0.063), ('supervised', 0.061), ('artist', 0.057), ('ababab', 0.053), ('equations', 0.053), ('psychology', 0.051), ('temporal', 0.051), ('posterior', 0.048), ('update', 0.046), ('six', 0.044), ('training', 0.044), ('artists', 0.043), ('rules', 0.043), ('observations', 0.043), ('observation', 0.042), ('learners', 0.041), ('conjugate', 0.04), ('correction', 0.039), ('human', 0.039), ('stage', 0.037), ('distributions', 0.037), ('updating', 0.036), ('conventions', 0.036), ('dmt', 0.036), ('hongjing', 0.036), ('closed', 0.035), ('forgetting', 0.035), ('trials', 0.035), ('generic', 0.034), ('ltering', 0.034), ('representations', 0.033), ('variance', 0.032), ('subjects', 0.032), ('predicts', 0.032), ('informed', 0.031), ('stimuli', 0.031), ('blocking', 0.031), ('angeles', 0.031), ('psychological', 0.03), ('analytic', 0.03), ('animal', 0.03), ('effect', 0.029), ('interleaved', 0.029), ('precisions', 0.029), ('priors', 0.028), ('condition', 0.028), ('evidence', 0.028), ('precision', 0.028), ('blocks', 0.028), ('enable', 0.027), ('fusion', 0.027), ('kording', 0.027), ('model', 0.027), ('item', 0.026), ('conditioning', 0.026), ('gure', 0.026), ('effects', 0.025), ('prior', 0.025), ('assignments', 0.025), ('los', 0.025), ('yuille', 0.025), ('prediction', 0.025), ('willing', 0.024), ('rat', 0.023), ('conjugacy', 0.023), ('block', 0.023), ('change', 0.022), ('unknown', 0.022), ('selection', 0.022), ('test', 0.022), ('accuracy', 0.021), ('phase', 0.021), ('dayan', 0.021), ('bayesian', 0.021), ('presenting', 0.02), ('memory', 0.02), ('proportion', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

2 0.1886515 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar

Abstract: Despite a large literature on upper bounds on complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining a understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for the understanding the inherent complexity of large-scale learning and estimation problems. 1

3 0.18805781 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.

4 0.18735084 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

5 0.1775393 178 nips-2009-On Stochastic and Worst-case Models for Investing

Author: Elad Hazan, Satyen Kale

Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1

6 0.16398905 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

7 0.12064764 27 nips-2009-Adaptive Regularization of Weight Vectors

8 0.099731222 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

9 0.099710561 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

10 0.097283207 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

11 0.086925723 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

12 0.086614639 21 nips-2009-Abstraction and Relational learning

13 0.084568307 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

14 0.083920009 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

15 0.077110581 246 nips-2009-Time-Varying Dynamic Bayesian Networks

16 0.070457205 177 nips-2009-On Learning Rotations

17 0.069822878 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

18 0.069594897 220 nips-2009-Slow Learners are Fast

19 0.065969966 11 nips-2009-A General Projection Property for Distribution Families

20 0.064423934 13 nips-2009-A Neural Implementation of the Kalman Filter


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.172), (1, 0.015), (2, 0.068), (3, -0.091), (4, 0.246), (5, 0.178), (6, 0.084), (7, 0.082), (8, -0.032), (9, -0.033), (10, 0.137), (11, 0.001), (12, 0.019), (13, -0.16), (14, 0.117), (15, -0.059), (16, -0.123), (17, 0.136), (18, -0.061), (19, 0.051), (20, 0.038), (21, -0.044), (22, 0.056), (23, 0.199), (24, -0.062), (25, -0.102), (26, 0.016), (27, -0.007), (28, -0.08), (29, -0.06), (30, 0.003), (31, -0.006), (32, 0.058), (33, 0.015), (34, -0.05), (35, 0.093), (36, -0.076), (37, 0.076), (38, 0.003), (39, -0.062), (40, 0.046), (41, 0.003), (42, 0.005), (43, 0.061), (44, 0.012), (45, 0.0), (46, 0.043), (47, 0.102), (48, 0.045), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97716564 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

2 0.71415043 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.

3 0.6441105 178 nips-2009-On Stochastic and Worst-case Models for Investing

Author: Elad Hazan, Satyen Kale

Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1

4 0.62551576 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1

5 0.58943552 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

6 0.49156594 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

7 0.48290831 115 nips-2009-Individuation, Identification and Object Discovery

8 0.48026496 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

9 0.46915722 11 nips-2009-A General Projection Property for Distribution Families

10 0.45691362 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

11 0.43662837 21 nips-2009-Abstraction and Relational learning

12 0.43645152 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

13 0.42948005 25 nips-2009-Adaptive Design Optimization in Experiments with People

14 0.42916796 177 nips-2009-On Learning Rotations

15 0.42873713 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

16 0.42426777 133 nips-2009-Learning models of object structure

17 0.40837583 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

18 0.40130675 152 nips-2009-Measuring model complexity with the prior predictive

19 0.39645809 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

20 0.38826376 112 nips-2009-Human Rademacher Complexity


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.023), (25, 0.065), (35, 0.043), (36, 0.074), (39, 0.134), (58, 0.063), (61, 0.019), (62, 0.011), (71, 0.101), (81, 0.019), (86, 0.052), (91, 0.034), (96, 0.246)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79261309 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

2 0.75495845 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

Author: Wei Bian, Dacheng Tao

Abstract: In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.

3 0.7351048 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models

Author: Baback Moghaddam, Emtiyaz Khan, Kevin P. Murphy, Benjamin M. Marlin

Abstract: We make several contributions in accelerating approximate Bayesian structural inference for non-decomposable GGMs. Our first contribution is to show how to efficiently compute a BIC or Laplace approximation to the marginal likelihood of non-decomposable graphs using convex methods for precision matrix estimation. This optimization technique can be used as a fast scoring function inside standard Stochastic Local Search (SLS) for generating posterior samples. Our second contribution is a novel framework for efficiently generating large sets of high-quality graph topologies without performing local search. This graph proposal method, which we call “Neighborhood Fusion” (NF), samples candidate Markov blankets at each node using sparse regression techniques. Our third contribution is a hybrid method combining the complementary strengths of NF and SLS. Experimental results in structural recovery and prediction tasks demonstrate that NF and hybrid NF/SLS out-perform state-of-the-art local search methods, on both synthetic and real-world datasets, when realistic computational limits are imposed.

4 0.73311222 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1

5 0.62973893 94 nips-2009-Fast Learning from Non-i.i.d. Observations

Author: Ingo Steinwart, Andreas Christmann

Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case. 1

6 0.61825955 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

7 0.6059283 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

8 0.60369229 54 nips-2009-Compositionality of optimal control laws

9 0.60271049 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

10 0.59843892 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

11 0.59300184 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

12 0.59067374 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

13 0.58971584 226 nips-2009-Spatial Normalized Gamma Processes

14 0.58692557 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

15 0.58311224 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

16 0.58025867 115 nips-2009-Individuation, Identification and Object Discovery

17 0.58024365 112 nips-2009-Human Rademacher Complexity

18 0.57813984 133 nips-2009-Learning models of object structure

19 0.57520199 204 nips-2009-Replicated Softmax: an Undirected Topic Model

20 0.57497132 21 nips-2009-Abstraction and Relational learning