nips nips2006 nips2006-71 knowledge-graph by maker-knowledge-mining

71 nips-2006-Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning


Source: pdf

Author: Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner

Abstract: Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators. In reinforcement learning (RL) models, meta-parameters such as learning rate, future reward discount factor, and exploitation-exploration factor, control learning dynamics and performance. They are hypothesized to be related to neuromodulatory levels in the brain. We found that many aspects of animal learning and performance can be described by simple RL models using dynamic control of the meta-parameters. To study the effects of stress and genotype, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 mouse strains. The animals were exposed to different kinds of stress to evaluate its effects on immediate performance as well as on long-term memory. Then, we used RL models to simulate their behavior. For each experimental session, we estimated a set of model meta-parameters that produced the best fit between the model and the animal performance. The dynamics of several estimated meta-parameters were qualitatively similar for the two simulated experiments, and with statistically significant differences between different genetic strains and stress conditions. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 ch 1 Laboratory of Computational Neuroscience 2 Laboratory of Behavioral Genetics Ecole Polytechnique F´ d´ rale de Lausanne e e CH-1015, Switzerland Abstract Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators. [sent-11, score-0.706]

2 In reinforcement learning (RL) models, meta-parameters such as learning rate, future reward discount factor, and exploitation-exploration factor, control learning dynamics and performance. [sent-12, score-0.43]

3 We found that many aspects of animal learning and performance can be described by simple RL models using dynamic control of the meta-parameters. [sent-14, score-0.193]

4 To study the effects of stress and genotype, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 mouse strains. [sent-15, score-0.669]

5 The animals were exposed to different kinds of stress to evaluate its effects on immediate performance as well as on long-term memory. [sent-16, score-0.675]

6 For each experimental session, we estimated a set of model meta-parameters that produced the best fit between the model and the animal performance. [sent-18, score-0.275]

7 The dynamics of several estimated meta-parameters were qualitatively similar for the two simulated experiments, and with statistically significant differences between different genetic strains and stress conditions. [sent-19, score-0.659]

8 1 Introduction Animals choose their actions based on reward expectation and motivational drives. [sent-20, score-0.431]

9 Different aspects of learning are known to be influenced by acute stress [1, 2, 3] and genetic background [4, 5]. [sent-21, score-0.469]

10 Stress effects on learning depend on the stress type (eg task-specific or unspecific) and intensity, as well as on the learning paradigm (eg spatial/episodic vs. [sent-22, score-0.419]

11 It is known that stress can affect short- and long-term memory by modulating plasticity through stress hormones and neuromodulators [1, 2, 3, 6]. [sent-24, score-0.904]

12 Although stress factors can be described in quantitative measures, their effects on learning, memory, and performance are strongly influenced by how an animal perceives it. [sent-26, score-0.688]

13 The subjective experience can be influenced by emotional memories as well as by behavioral genetic traits such as anxiety, impulsivity, and novelty reactivity [4, 5, 7]. [sent-27, score-0.264]

14 In the present study, behavioral experiments conducted on two different genetic strains of mice and under different stress conditions were combined with a modeling approach. [sent-28, score-0.957]

15 The expected values of cumulative future reward (Q-values) are learned by observing immediate rewards delivered under different state-action combinations. [sent-31, score-0.377]

16 Their update is controlled by certain meta-parameters such as learning rate, future reward discount factor, and memory decay/interference factor. [sent-32, score-0.448]

17 The Q-values (together with the exploitation/exploration factor) determine what actions are more likely to be chosen when the animal is at a certain state, ie they represent the goal-oriented behavioral strategy learned by the agent. [sent-33, score-0.398]

18 Besides dopamine (DA), whose levels are known to be related to the TD reward prediction error [9], serotonin (5-HT), noradrenaline (NA), and acetylcholine (ACh) were discussed in relation to TDRL meta-parameters [10]. [sent-35, score-0.267]

19 In our study, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 inbred mouse strains (referred to as C57 and DBA from now on), renown for their differences in anxiety, impulsivity, and spatial learning [4, 5, 12]. [sent-38, score-0.393]

20 We exposed subgroups of animals to different kinds of stress (such as motivational stress or task-specific uncertainty) in order to evaluate its effects on immediate performance, and also tested their long-term memory after a break of 4-7 weeks. [sent-39, score-1.329]

21 Then, we used TDRL models to describe the mouse behavior and established a number of performance measures that are relevant to task learning and memory (such as mean response times and latencies to platform) in order to compare the outcome of the model with the animal performance. [sent-40, score-0.455]

22 This approach made it possible to relate the effects of stress and genotype to differences in the meta-parameter values, allowing us to make specific inferences about learning dynamics (generalized over two different experimental paradigms) and their neurobiological correlates. [sent-42, score-0.549]

23 2 Reinforcement learning model of animal behavior In the TDRL framework [8] animal behavior is modelled as a sequence of actions. [sent-43, score-0.418]

24 After an action is performed, the animal is in a new state where it can again choose from a set of possible actions. [sent-44, score-0.285]

25 As soon as state st+1 is reached and a new action is selected, the estimate of the previous state’s value Q(st , at ) is updated based on the reward prediction error δt [8]: δt = rt+1 + γQ(st+1 , at+1 ) − Q(st , at ) , (2) Q(st , at ) ← Q(st , at ) + αδt , (3) where α is the learning rate. [sent-46, score-0.324]

26 The action selection at each state is controlled by the exploitation factor β such that actions with high Q-values are chosen more often if the β is high, whereas random actions are chosen most of the time if the β is close to zero. [sent-47, score-0.442]

27 3 5-hole-box experiment and modeling Experimental subjects were male mice (24 of the C57 strain, and 24 of the DBA strain), 2. [sent-49, score-0.287]

28 During an experimental session, each animal was placed into the 5-hole-box (5HB) (Figure 1a). [sent-51, score-0.224]

29 The animals had to learn to make a nose poke into any of the holes upon the onset of lights and not to make it in the absence of light. [sent-52, score-0.451]

30 After the response to light, the animals received a reward in form of a food pellet. [sent-53, score-0.488]

31 Once a poke was initiated (see starting a poke in Figure 1b), the mouse had to stay in the hole at least for a short time (0. [sent-54, score-0.63]

32 5 sec) in order to find the delivered reward (continuing a poke). [sent-56, score-0.301]

33 Trial ended (lights turned off) as soon as the nose poke was finished. [sent-57, score-0.243]

34 If the mouse did not find the reward, the reward remained in the box and the animal could find it during the next poke in the same box. [sent-58, score-0.777]

35 ITI ITI, staying outside Trial, staying outside ITI, starting a poke Trial, starting a poke Reward (if available) Reward ITI, continuing a poke Trial, continuing a poke Figure 1: a. [sent-71, score-1.24]

36 Open circles are the holes where the food is delivered, filled circles are the lights. [sent-73, score-0.224]

37 After 2 days of habituation, during which the mice learned that food could be delivered in the holes, they underwent 8 consecutive days of training. [sent-78, score-0.625]

38 During days 5-7 subsets of the animals were exposed to different stress conditions: motivational stress (MS, food deprivation to 85-87% of the initial weight vs. [sent-79, score-1.255]

39 88-90% in controls) and uncertainty in the reward delivery (US, in 50% of correct responses they received either none or 2 food pellets). [sent-80, score-0.392]

40 Mice of each strain were divided into 4 stress groups: controls, MS, US, and MS+US. [sent-81, score-0.425]

41 After a break of 26 days the long-term memory of the mice was tested by retraining them for another 8 days. [sent-82, score-0.585]

42 During days 5-8 of the retraining, we again evaluated the impact of stress factors by exposing half of the mice to extrinsic stress (ES, 30 min on an elevated platform right before the 5HB experiment). [sent-83, score-1.525]

43 Actions were chosen according to the soft-max method [8]: p(a|s) = exp(βQ(s, a))/ exp(βQ(s, ak )) , (4) k where k runs over all actions and β is the exploitation factor. [sent-85, score-0.21]

44 43 sec) was constant throughout the experiment and was chosen to fit the animal performance in the beginning of the experiment. [sent-89, score-0.193]

45 Finally, to account for the memory decay after each day all Q(s, a) values were updated as follows: Q(s, a) ← Q(s, a) · (1 − λ) + Q(s, a) where λ is a memory decay/interference factor, and Q(s, a) states and all actions at the end of the day. [sent-90, score-0.482]

46 As opposed to an online ”SARSA”-type update of Q-values, we work with state occupancy probabilities p(st ) and update Q-values with the following reward prediction error: δt = E[rt ] − Q(at , st ) + γ Q(at+1 , st+1 ) · p(at+1 , st+1 |at , st ) . [sent-92, score-0.442]

47 (6) ∀at+1 ,st+1 4 Morris water maze experiment and modeling The same mice as in the 5HB (4. [sent-93, score-0.428]

48 Starting from one of 4 starting positions in the circular pool filled with an opaque liquid they had to learn the location of a hidden escape platform using stable extra-maze cues (Fig. [sent-95, score-0.267]

49 Animals were initially trained for 4 days with 4 sessions a day (to avoid confusion with 5HB, we consider each WM session consisting of only one trial). [sent-97, score-0.347]

50 Half of the mice had to swim in cold water of 19◦ C (motivational stress, MS), while the rest were learning at 26◦ C (control). [sent-100, score-0.368]

51 Finally, after another 2 weeks, the mice performed the task for 5 more days: half of them did a version with uncertainty stress (US), where the platform location was randomly varying between the old position and its rotationally opposite; the other half did the same task as before. [sent-102, score-0.953]

52 Behavior was quantified using the following 4 PMs: time to reach the goal (escape latency), time spent in the target platform quadrant, the opposite platform quadrant, and in the wall region (Fig. [sent-103, score-0.412]

53 2           0 1 0 0 1 0 3 wij ' & ¦ ¦ % $ ¦ § § § § ' & § ¦ % $ ¦ ¦ § % $ ¥ % $ ¥ ¤ ¤ ¥ ¥ ¤ ¡     £ ¢ ¤ ¡   £ ¢ 1 PC   £ ¢ £ ¢ £ ¢ £ ¢ 3 2 ( ) ( 3 2 ( ) ( water pool platform 3 2 3 2 Figure 2: WM experiment and model. [sent-109, score-0.403]

54 1 – target platform quadrant, 2 – opposite platform quadrant, 3 – wall region. [sent-112, score-0.412]

55 Small filled circles mark 4 starting positions, large filled circle marks the target platform, open circle marks the opposite platform (used only in the US condition), pool ∅ = 1. [sent-113, score-0.308]

56 Activities of place cells (PC) encode position of the animal in the WM, activities of action cells encode direction of the next movement. [sent-116, score-0.457]

57 A TDRL paradigm (1)-(3) in continuous state and action spaces has been used to model the mouse behavior in the WM [14, 15]. [sent-117, score-0.201]

58 The position of the animal is represented as a population activity of Npc = 211 ’place cells’ (PC) whose preferred locations are distributed uniformly over the area of a modelled circular arena (Fig. [sent-118, score-0.265]

59 Activity of place cell j is modelled by a Gaussian centered at the preferred location pj of the cell: pc rj = exp(− p − pj 2 2 /2σpc ) , (7) where p is the current position of the modelled animal and σpc = 0. [sent-120, score-0.362]

60 state st ) causes a different activity profile on the level of the action cells depending on the value of the weight vector. [sent-126, score-0.29]

61 The activity of action cell i is considered as the value of the action (defined as a movement in direction φi 4 ): pc wij rj . [sent-127, score-0.389]

62 ac Q(st , at ) = ri = j 4 A constant step length was chosen to fit the average speed of the animals during the experiment (8) The action selection follows -greedy policy, where the optimal action a∗ is chosen with probability β = 1 − and a random action with probability 1 − β. [sent-128, score-0.525]

63 Q-value corresponding to an action with continuous angle φ is calculated as linear interpolation between activities of the two closest action cells. [sent-130, score-0.26]

64 During learning the PC→AC connection weights are updated on each time step in such a way as to decrease the reward prediction error δt (3): ac pc ∆wij = αδri rj . [sent-131, score-0.417]

65 Wall hits result in a small negative reward (rwall = −3). [sent-136, score-0.232]

66 For each session and each set of the meta-parameters, 48 different sets of random initial weights wij (corresponding to individual mice) were used to run the model, with 50 simulations started out of each set. [sent-137, score-0.217]

67 To account for the loss of memory, after each day all weights were updated as follows: new old initial wij = wij · (1 − λ) + wij ·λ old wij where λ is the memory decay factor, is the weight value at the end of the day, and the initial weight value before any learning took place. [sent-139, score-0.716]

68 PMexp were calculated either for each animal (5HB), k or for each subgroup (WM). [sent-142, score-0.274]

69 φ∗ = arctan( i ac ri sin(2πk/Nac )/ i ac ri cos(2πk/Nac )) b. [sent-167, score-0.22]

70 Self-consistency check: true (open circles) and estimated (filled circles) meta-parameter values for the 24 random sets in the 5HB each animal and each experimental day. [sent-178, score-0.275]

71 Learning dynamics in both experiments are illustrated in Figure 3a for 2 representative PMs, where average performances for all mice and the corresponding models (with estimated meta-parameters) are shown. [sent-181, score-0.386]

72 The mean absolute errors (distances between real and estimated parameter values) were quite small for exploitation factors (β) – approximately 6% of the total range, but higher for the reward discount factors (γ) and for the learning rates (α) – 10-29% of the total range (Figure 3b). [sent-198, score-0.617]

73 1 Meta-parameter dynamics During the course of learning, exploitation factors (β) (Figure 4a,b) showed progressive increase (regression p 0. [sent-201, score-0.239]

74 They were consistently higher for the C57 mice than for the DBA mice (2-way ANOVA with replications, p 0. [sent-203, score-0.574]

75 001 for both experiments), indicating that the DBA mice were exploring the environment more actively, and/or were not able to focus their attention well on the specific task. [sent-204, score-0.287]

76 Finally, C57 mouse groups, exposed to motivational stress in the WM and to extrinsic stress in the 5HB, had elevated exploitation factors (ANOVA p < 0. [sent-205, score-1.359]

77 There were no differences between the 2 genetic strains (nor among the stress conditions) with one exception: for the first several days of the training, C57 learning rates were b. [sent-208, score-0.658]

78 8 Future reward deference factor γ Fixed platform Variable platform 0. [sent-216, score-0.709]

79 8 Future reward deference factor γ C57BL/6 DBA/2 5 6 7 Memory decay/interference factor a. [sent-232, score-0.374]

80 Estimated exploitation factors β for 5HB (a, break is between days 8 & 9) and WM (b, breaks between days 4 & 5 and between 7 & 8). [sent-238, score-0.448]

81 Estimated future reward deference factors for the variable platform trials in the WM (c) and for the uncertainty trials in the 5HB (d). [sent-240, score-0.647]

82 Estimated memory decay / interference factors for the first day after the break in the 5HB. [sent-242, score-0.415]

83 01 in both experiments), indicating that C57 mice could learn a novel task more quickly. [sent-244, score-0.287]

84 Under uncertainty (in reward delivery for the 5HB, and in the target platform location for the WM) future reward discount factors (γ) were significantly elevated (ANOVA p < 0. [sent-245, score-0.993]

85 In the 5HB, memory decay factors (λ), estimated for the first day after the break, were significantly higher (p < 0. [sent-247, score-0.405]

86 This suggests that uncertainty makes animals consider rewards further into the future, and it seems to impair memory consolidation. [sent-249, score-0.305]

87 7 Discussion In this paper we showed that various behavioral outcomes (caused by genetic traits and/or stress factors) could be predicted by our TDRL models for 2 different tasks. [sent-250, score-0.609]

88 Results for the exploitation factors suggest that with learning (and decreasing reward prediction errors) the acquired knowledge is used more for choosing actions. [sent-252, score-0.423]

89 This might also be related to decreased subjective stress and higher stressor controllability. [sent-253, score-0.467]

90 Firstly, the anxious DBA mice cannot exploit their knowledge as well as C57 can. [sent-255, score-0.287]

91 Secondly, in response to motivational or extrinsic stress C57 mice are the only ones that increase their exploitation. [sent-256, score-0.86]

92 Animals with low anxiety (C57) might be on the left side of the curve, and additional stress might lead them to optimal performance, while those with high anxiety – already on the right side, leading to possibly impaired performance. [sent-258, score-0.553]

93 Our results may also suggest that the widely proclaimed deficiency of DBA mice in spatial learning (as compared to C57) [4, 12] might be primarily due to differential attentional capabilities. [sent-259, score-0.287]

94 The increased future reward discount factors under uncertainty indicate a reasonable adaptive response – animals should not concentrate their learning on immediate events when task-reward rela- tions become ambiguous. [sent-260, score-0.691]

95 Uncertainty in behaviorally relevant outcomes under stress causes a decrease in subjective stressor controllability, which is known to be related to elevated serotonin levels [18]. [sent-261, score-0.571]

96 Higher memory decay / interference factors for the animals previously exposed to uncertainty could be due to partially impaired memory consolidation and/or due to stronger competition between different strategies and perceptions of the uncertain task. [sent-262, score-0.653]

97 Although estimated meta-parameter values can be easily compared between certain experimental conditions, it is difficult to study in this way the interactions between different genetic and environmental factors or extrapolate beyond the limits of available conditions. [sent-263, score-0.247]

98 Behavioral profiles of inbred strains on novel olfactory, spatial and emotional tests for reference memory in mice. [sent-319, score-0.252]

99 Genetic influences on impulsivity, risk taking, stress responsivity and vulnerability to drug abuse and addiction. [sent-339, score-0.38]

100 What do comparative studies of inbred mice add to current investigations on the neural basis of spatial behaviors? [sent-370, score-0.339]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stress', 0.38), ('mice', 0.287), ('poke', 0.243), ('reward', 0.232), ('iti', 0.208), ('animal', 0.193), ('platform', 0.19), ('tdrl', 0.174), ('wm', 0.173), ('dba', 0.156), ('animals', 0.139), ('session', 0.127), ('day', 0.122), ('exploitation', 0.115), ('ac', 0.11), ('behavioral', 0.11), ('memory', 0.109), ('mouse', 0.109), ('st', 0.105), ('motivational', 0.104), ('pms', 0.104), ('days', 0.098), ('actions', 0.095), ('action', 0.092), ('strains', 0.091), ('quadrant', 0.091), ('wij', 0.09), ('genetic', 0.089), ('exposed', 0.081), ('water', 0.081), ('factors', 0.076), ('morris', 0.075), ('pc', 0.075), ('food', 0.073), ('anxiety', 0.069), ('elevated', 0.069), ('pmexp', 0.069), ('delivered', 0.069), ('holes', 0.069), ('discount', 0.067), ('trial', 0.063), ('break', 0.061), ('maze', 0.06), ('uncertainty', 0.057), ('staying', 0.055), ('cells', 0.053), ('deference', 0.052), ('impulsivity', 0.052), ('inbred', 0.052), ('neuromodulatory', 0.052), ('npm', 0.052), ('pmmod', 0.052), ('pokes', 0.052), ('stressor', 0.052), ('estimated', 0.051), ('anova', 0.051), ('genotype', 0.051), ('dynamics', 0.048), ('decay', 0.047), ('lled', 0.046), ('extrinsic', 0.045), ('strain', 0.045), ('factor', 0.045), ('continuing', 0.044), ('response', 0.044), ('reinforcement', 0.043), ('pool', 0.042), ('subgroup', 0.041), ('sec', 0.041), ('circles', 0.041), ('future', 0.04), ('calculated', 0.04), ('activity', 0.04), ('effects', 0.039), ('old', 0.039), ('nat', 0.038), ('rev', 0.038), ('activities', 0.036), ('immediate', 0.036), ('starting', 0.035), ('epub', 0.035), ('hormones', 0.035), ('impaired', 0.035), ('lengthpref', 0.035), ('rpl', 0.035), ('serotonin', 0.035), ('timepref', 0.035), ('subjective', 0.035), ('mar', 0.035), ('ms', 0.034), ('modelled', 0.032), ('wall', 0.032), ('experimental', 0.031), ('uenced', 0.031), ('place', 0.03), ('rt', 0.03), ('traits', 0.03), ('controllability', 0.03), ('retraining', 0.03), ('delivery', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 71 nips-2006-Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning

Author: Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner

Abstract: Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators. In reinforcement learning (RL) models, meta-parameters such as learning rate, future reward discount factor, and exploitation-exploration factor, control learning dynamics and performance. They are hypothesized to be related to neuromodulatory levels in the brain. We found that many aspects of animal learning and performance can be described by simple RL models using dynamic control of the meta-parameters. To study the effects of stress and genotype, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 mouse strains. The animals were exposed to different kinds of stress to evaluate its effects on immediate performance as well as on long-term memory. Then, we used RL models to simulate their behavior. For each experimental session, we estimated a set of model meta-parameters that produced the best fit between the model and the animal performance. The dynamics of several estimated meta-parameters were qualitatively similar for the two simulated experiments, and with statistically significant differences between different genetic strains and stress conditions. 1

2 0.11520316 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes

Author: Huan Xu, Shie Mannor

Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1

3 0.10972387 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Author: Alexander T. Ihler, Padhraic Smyth

Abstract: Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas as humancomputer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real–world data sets of count data involving both vehicles and people are used to illustrate the technique. 1

4 0.093692765 10 nips-2006-A Novel Gaussian Sum Smoother for Approximate Inference in Switching Linear Dynamical Systems

Author: David Barber, Bertrand Mesot

Abstract: We introduce a method for approximate smoothed inference in a class of switching linear dynamical systems, based on a novel form of Gaussian Sum smoother. This class includes the switching Kalman Filter and the more general case of switch transitions dependent on the continuous latent state. The method improves on the standard Kim smoothing approach by dispensing with one of the key approximations, thus making fuller use of the available future information. Whilst the only central assumption required is projection to a mixture of Gaussians, we show that an additional conditional independence assumption results in a simpler but stable and accurate alternative. Unlike the alternative unstable Expectation Propagation procedure, our method consists only of a single forward and backward pass and is reminiscent of the standard smoothing ‘correction’ recursions in the simpler linear dynamical system. The algorithm performs well on both toy experiments and in a large scale application to noise robust speech recognition. 1 Switching Linear Dynamical System The Linear Dynamical System (LDS) [1] is a key temporal model in which a latent linear process generates the observed series. For complex time-series which are not well described globally by a single LDS, we may break the time-series into segments, each modeled by a potentially different LDS. This is the basis for the Switching LDS (SLDS) [2, 3, 4, 5] where, for each time t, a switch variable st ∈ 1, . . . , S describes which of the LDSs is to be used. The observation (or ‘visible’) vt ∈ RV is linearly related to the hidden state ht ∈ RH with additive noise η by vt = B(st )ht + η v (st ) p(vt |ht , st ) = N (B(st )ht , Σv (st )) ≡ (1) where N (µ, Σ) denotes a Gaussian distribution with mean µ and covariance Σ. The transition dynamics of the continuous hidden state ht is linear, ht = A(st )ht−1 + η h (st ), ≡ p(ht |ht−1 , st ) = N A(st )ht−1 , Σh (st ) (2) The switch st may depend on both the previous st−1 and ht−1 . This is an augmented SLDS (aSLDS), and defines the model T p(vt |ht , st )p(ht |ht−1 , st )p(st |ht−1 , st−1 ) p(v1:T , h1:T , s1:T ) = t=1 The standard SLDS[4] considers only switch transitions p(st |st−1 ). At time t = 1, p(s1 |h0 , s0 ) simply denotes the prior p(s1 ), and p(h1 |h0 , s1 ) denotes p(h1 |s1 ). The aim of this article is to address how to perform inference in the aSLDS. In particular we desire the filtered estimate p(ht , st |v1:t ) and the smoothed estimate p(ht , st |v1:T ), for any 1 ≤ t ≤ T . Both filtered and smoothed inference in the SLDS is intractable, scaling exponentially with time [4]. s1 s2 s3 s4 h1 h2 h3 h4 v1 v2 v3 v4 Figure 1: The independence structure of the aSLDS. Square nodes denote discrete variables, round nodes continuous variables. In the SLDS links from h to s are not normally considered. 2 Expectation Correction Our approach to approximate p(ht , st |v1:T ) mirrors the Rauch-Tung-Striebel ‘correction’ smoother for the simpler LDS [1].The method consists of a single forward pass to recursively find the filtered posterior p(ht , st |v1:t ), followed by a single backward pass to correct this into a smoothed posterior p(ht , st |v1:T ). The forward pass we use is equivalent to standard Assumed Density Filtering (ADF) [6]. The main contribution of this paper is a novel form of backward pass, based only on collapsing the smoothed posterior to a mixture of Gaussians. Together with the ADF forward pass, we call the method Expectation Correction, since it corrects the moments found from the forward pass. A more detailed description of the method, including pseudocode, is given in [7]. 2.1 Forward Pass (Filtering) Readers familiar with ADF may wish to continue directly to Section (2.2). Our aim is to form a recursion for p(st , ht |v1:t ), based on a Gaussian mixture approximation of p(ht |st , v1:t ). Without loss of generality, we may decompose the filtered posterior as p(ht , st |v1:t ) = p(ht |st , v1:t )p(st |v1:t ) (3) The exact representation of p(ht |st , v1:t ) is a mixture with O(S t ) components. We therefore approximate this with a smaller I-component mixture I p(ht |st , v1:t ) ≈ p(ht |it , st , v1:t )p(it |st , v1:t ) it =1 where p(ht |it , st , v1:t ) is a Gaussian parameterized with mean f (it , st ) and covariance F (it , st ). To find a recursion for these parameters, consider p(ht+1 |st+1 , v1:t+1 ) = p(ht+1 |st , it , st+1 , v1:t+1 )p(st , it |st+1 , v1:t+1 ) (4) st ,it Evaluating p(ht+1 |st , it , st+1 , v1:t+1 ) We find p(ht+1 |st , it , st+1 , v1:t+1 ) by first computing the joint distribution p(ht+1 , vt+1 |st , it , st+1 , v1:t ), which is a Gaussian with covariance and mean elements, Σhh = A(st+1 )F (it , st )AT (st+1 ) + Σh (st+1 ), Σvv = B(st+1 )Σhh B T (st+1 ) + Σv (st+1 ) Σvh = B(st+1 )F (it , st ), µv = B(st+1 )A(st+1 )f (it , st ), µh = A(st+1 )f (it , st ) (5) and then conditioning on vt+1 1 . For the case S = 1, this forms the usual Kalman Filter recursions[1]. Evaluating p(st , it |st+1 , v1:t+1 ) The mixture weight in (4) can be found from the decomposition p(st , it |st+1 , v1:t+1 ) ∝ p(vt+1 |it , st , st+1 , v1:t )p(st+1 |it , st , v1:t )p(it |st , v1:t )p(st |v1:t ) (6) 1 p(x|y) is a Gaussian with mean µx + Σxy Σ−1 (y − µy ) and covariance Σxx − Σxy Σ−1 Σyx . yy yy The first factor in (6), p(vt+1 |it , st , st+1 , v1:t ) is a Gaussian with mean µv and covariance Σvv , as given in (5). The last two factors p(it |st , v1:t ) and p(st |v1:t ) are given from the previous iteration. Finally, p(st+1 |it , st , v1:t ) is found from p(st+1 |it , st , v1:t ) = p(st+1 |ht , st ) p(ht |it ,st ,v1:t ) (7) where · p denotes expectation with respect to p. In the SLDS, (7) is replaced by the Markov transition p(st+1 |st ). In the aSLDS, however, (7) will generally need to be computed numerically. Closing the recursion We are now in a position to calculate (4). For each setting of the variable st+1 , we have a mixture of I × S Gaussians which we numerically collapse back to I Gaussians to form I p(ht+1 |st+1 , v1:t+1 ) ≈ p(ht+1 |it+1 , st+1 , v1:t+1 )p(it+1 |st+1 , v1:t+1 ) it+1 =1 Any method of choice may be supplied to collapse a mixture to a smaller mixture; our code simply repeatedly merges low-weight components. In this way the new mixture coefficients p(it+1 |st+1 , v1:t+1 ), it+1 ∈ 1, . . . , I are defined, completing the description of how to form a recursion for p(ht+1 |st+1 , v1:t+1 ) in (3). A recursion for the switch variable is given by p(st+1 |v1:t+1 ) ∝ p(vt+1 |st+1 , it , st , v1:t )p(st+1 |it , st , v1:t )p(it |st , v1:t )p(st |v1:t ) st ,it where all terms have been computed during the recursion for p(ht+1 |st+1 , v1:t+1 ). The likelihood p(v1:T ) may be found by recursing p(v1:t+1 ) = p(vt+1 |v1:t )p(v1:t ), where p(vt+1 |vt ) = p(vt+1 |it , st , st+1 , v1:t )p(st+1 |it , st , v1:t )p(it |st , v1:t )p(st |v1:t ) it ,st ,st+1 2.2 Backward Pass (Smoothing) The main contribution of this paper is to find a suitable way to ‘correct’ the filtered posterior p(st , ht |v1:t ) obtained from the forward pass into a smoothed posterior p(st , ht |v1:T ). We derive this for the case of a single Gaussian representation. The extension to the mixture case is straightforward and presented in [7]. We approximate the smoothed posterior p(ht |st , v1:T ) by a Gaussian with mean g(st ) and covariance G(st ) and our aim is to find a recursion for these parameters. A useful starting point for a recursion is: p(st+1 |v1:T )p(ht |st , st+1 , v1:T )p(st |st+1 , v1:T ) p(ht , st |v1:T ) = st+1 The term p(ht |st , st+1 , v1:T ) may be computed as p(ht |st , st+1 , v1:T ) = p(ht |ht+1 , st , st+1 , v1:t )p(ht+1 |st , st+1 , v1:T ) (8) ht+1 The recursion therefore requires p(ht+1 |st , st+1 , v1:T ), which we can write as p(ht+1 |st , st+1 , v1:T ) ∝ p(ht+1 |st+1 , v1:T )p(st |st+1 , ht+1 , v1:t ) (9) The difficulty here is that the functional form of p(st |st+1 , ht+1 , v1:t ) is not squared exponential in ht+1 , so that p(ht+1 |st , st+1 , v1:T ) will not be Gaussian2 . One possibility would be to approximate the non-Gaussian p(ht+1 |st , st+1 , v1:T ) by a Gaussian (or mixture thereof) by minimizing the Kullback-Leilbler divergence between the two, or performing moment matching in the case of a single Gaussian. A simpler alternative (which forms ‘standard’ EC) is to make the assumption p(ht+1 |st , st+1 , v1:T ) ≈ p(ht+1 |st+1 , v1:T ), where p(ht+1 |st+1 , v1:T ) is already known from the previous backward recursion. Under this assumption, the recursion becomes p(ht , st |v1:T ) ≈ p(st+1 |v1:T )p(st |st+1 , v1:T ) p(ht |ht+1 , st , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) (10) st+1 2 In the exact calculation, p(ht+1 |st , st+1 , v1:T ) is a mixture of Gaussians, see [7]. However, since in (9) the two terms p(ht+1 |st+1 , v1:T ) will only be approximately computed during the recursion, our approximation to p(ht+1 |st , st+1 , v1:T ) will not be a mixture of Gaussians. Evaluating p(ht |ht+1 , st , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) p(ht |ht+1 , st , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) is a Gaussian in ht , whose statistics we will now compute. First we find p(ht |ht+1 , st , st+1 , v1:t ) which may be obtained from the joint distribution p(ht , ht+1 |st , st+1 , v1:t ) = p(ht+1 |ht , st+1 )p(ht |st , v1:t ) (11) which itself can be found from a forward dynamics from the filtered estimate p(ht |st , v1:t ). The statistics for the marginal p(ht |st , st+1 , v1:t ) are simply those of p(ht |st , v1:t ), since st+1 carries no extra information about ht . The remaining statistics are the mean of ht+1 , the covariance of ht+1 and cross-variance between ht and ht+1 , which are given by ht+1 = A(st+1 )ft (st ), Σt+1,t+1 = A(st+1 )Ft (st )AT (st+1 )+Σh (st+1 ), Σt+1,t = A(st+1 )Ft (st ) Given the statistics of (11), we may now condition on ht+1 to find p(ht |ht+1 , st , st+1 , v1:t ). Doing so effectively constitutes a reversal of the dynamics, ← − − ht = A (st , st+1 )ht+1 + ←(st , st+1 ) η ← − ← − − − where A (st , st+1 ) and ←(st , st+1 ) ∼ N (← t , st+1 ), Σ (st , st+1 )) are easily found using η m(s conditioning. Averaging the above reversed dynamics over p(ht+1 |st+1 , v1:T ), we find that p(ht |ht+1 , st , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) is a Gaussian with statistics ← − ← − ← − ← − − µt = A (st , st+1 )g(st+1 )+← t , st+1 ), Σt,t = A (st , st+1 )G(st+1 ) A T (st , st+1 )+ Σ (st , st+1 ) m(s These equations directly mirror the standard RTS backward pass[1]. Evaluating p(st |st+1 , v1:T ) The main departure of EC from previous methods is in treating the term p(st |st+1 , v1:T ) = p(st |ht+1 , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) (12) The term p(st |ht+1 , st+1 , v1:t ) is given by p(st |ht+1 , st+1 , v1:t ) = p(ht+1 |st+1 , st , v1:t )p(st , st+1 |v1:t ) ′ ′ s′ p(ht+1 |st+1 , st , v1:t )p(st , st+1 |v1:t ) (13) t Here p(st , st+1 |v1:t ) = p(st+1 |st , v1:t )p(st |v1:t ), where p(st+1 |st , v1:t ) occurs in the forward pass, (7). In (13), p(ht+1 |st+1 , st , v1:t ) is found by marginalizing (11). Computing the average of (13) with respect to p(ht+1 |st+1 , v1:T ) may be achieved by any numerical integration method desired. A simple approximation is to evaluate the integrand at the mean value of the averaging distribution p(ht+1 |st+1 , v1:T ). More sophisticated methods (see [7]) such as sampling from the Gaussian p(ht+1 |st+1 , v1:T ) have the advantage that covariance information is used3 . Closing the Recursion We have now computed both the continuous and discrete factors in (8), which we wish to use to write the smoothed estimate in the form p(ht , st |v1:T ) = p(st |v1:T )p(ht |st , v1:T ). The distribution p(ht |st , v1:T ) is readily obtained from the joint (8) by conditioning on st to form the mixture p(ht |st , v1:T ) = p(st+1 |st , v1:T )p(ht |st , st+1 , v1:T ) st+1 which may then be collapsed to a single Gaussian (the mixture case is discussed in [7]). The smoothed posterior p(st |v1:T ) is given by p(st |v1:T ) = p(st+1 |v1:T ) p(st |ht+1 , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) . (14) st+1 3 This is a form of exact sampling since drawing samples from a Gaussian is easy. This should not be confused with meaning that this use of sampling renders EC a sequential Monte-Carlo scheme. 2.3 Relation to other methods The EC Backward pass is closely related to Kim’s method [8]. In both EC and Kim’s method, the approximation p(ht+1 |st , st+1 , v1:T ) ≈ p(ht+1 |st+1 , v1:T ), is used to form a numerically simple backward pass. The other ‘approximation’ in EC is to numerically compute the average in (14). In Kim’s method, however, an update for the discrete variables is formed by replacing the required term in (14) by p(st |ht+1 , st+1 , v1:t ) p(ht+1 |st+1 ,v1:T ) ≈ p(st |st+1 , v1:t ) (15) Since p(st |st+1 , v1:t ) ∝ p(st+1 |st )p(st |v1:t )/p(st+1 |v1:t ), this can be computed simply from the filtered results alone. The fundamental difference therefore between EC and Kim’s method is that the approximation, (15), is not required by EC. The EC backward pass therefore makes fuller use of the future information, resulting in a recursion which intimately couples the continuous and discrete variables. The resulting effect on the quality of the approximation can be profound, as we will see in the experiments. The Expectation Propagation (EP) algorithm makes the central assumption of collapsing the posteriors to a Gaussian family [5]; the collapse is defined by a consistency criterion on overlapping marginals. In our experiments, we take the approach in [9] of collapsing to a single Gaussian. Ensuring consistency requires frequent translations between moment and canonical parameterizations, which is the origin of potentially severe numerical instability [10]. In contrast, EC works largely with moment parameterizations of Gaussians, for which relatively few numerical difficulties arise. Unlike EP, EC is not based on a consistency criterion and a subtle issue arises about possible inconsistencies in the Forward and Backward approximations for EC. For example, under the conditional independence assumption in the Backward Pass, p(hT |sT −1 , sT , v1:T ) ≈ p(hT |sT , v1:T ), which is in contradiction to (5) which states that the approximation to p(hT |sT −1 , sT , v1:T ) will depend on sT −1 . Such potential inconsistencies arise because of the approximations made, and should not be considered as separate approximations in themselves. Rather than using a global (consistency) objective, EC attempts to faithfully approximate the exact Forward and Backward propagation routines. For this reason, as in the exact computation, only a single Forward and Backward pass are required in EC. In [11] a related dynamics reversed is proposed. However, the singularities resulting from incorrectly treating p(vt+1:T |ht , st ) as a density are heuristically finessed. In [12] a variational method approximates the joint distribution p(h1:T , s1:T |v1:T ) rather than the marginal inference p(ht , st |v1:T ). This is a disadvantage when compared to other methods that directly approximate the marginal. Sequential Monte Carlo methods (Particle Filters)[13], are essentially mixture of delta-function approximations. Whilst potentially powerful, these typically suffer in high-dimensional hidden spaces, unless techniques such as Rao-Blackwellization are performed. ADF is generally preferential to Particle Filtering since in ADF the approximation is a mixture of non-trivial distributions, and is therefore more able to represent the posterior. 3 Demonstration Testing EC in a problem with a reasonably long temporal sequence, T , is important since numerical instabilities may not be apparent in timeseries of just a few points. To do this, we sequentially generate hidden and visible states from a given model, here with H = 3, S = 2, V = 1 – see Figure(2) for full details of the experimental setup. Then, given only the parameters of the model and the visible observations (but not any of the hidden states h1:T , s1:T ), the task is to infer p(ht |st , v1:T ) and p(st |v1:T ). Since the exact computation is exponential in T , a simple alternative is to assume that the original sample states s1:T are the ‘correct’ inferences, and compare how our most probable posterior smoothed estimates arg maxst p(st |v1:T ) compare with the assumed correct sample st . We chose conditions that, from the viewpoint of classical signal processing, are difficult, with changes in the switches occurring at a much higher rate than the typical frequencies in the signal vt . For EC we use the mean approximation for the numerical integration of (12). We included the Particle Filter merely for a point of comparison with ADF, since they are not designed to approximate PF RBPF EP ADFS KimS ECS ADFM KimM ECM 1000 800 600 400 200 0 0 10 20 0 10 20 0 10 20 0 10 20 0 10 20 0 10 20 0 10 20 0 10 20 0 10 20 Figure 2: The number of errors in estimating p(st |v1:T ) for a binary switch (S = 2) over a time series of length T = 100. Hence 50 errors corresponds to random guessing. Plotted are histograms of the errors are over 1000 experiments. The x-axes are cut off at 20 errors to improve visualization of the results. (PF) Particle Filter. (RBPF) Rao-Blackwellized PF. (EP) Expectation Propagation. (ADFS) Assumed Density Filtering using a Single Gaussian. (KimS) Kim’s smoother using the results from ADFS. (ECS) Expectation Correction using a Single Gaussian (I = J = 1). (ADFM) ADF using a multiple of I = 4 Gaussians. (KimM) Kim’s smoother using the results from ADFM. (ECM) Expectation Correction using a mixture with I = J = 4 components. S = 2, V = 1 (scalar observations), T = 100, with zero output bias. A(s) = 0.9999 ∗ orth(randn(H, H)), B(s) = randn(V, H). H = 3, Σh (s) = IH , Σv (s) = 0.1IV , p(st+1 |st ) ∝ 1S×S + IS . At time t = 1, the priors are p1 = uniform, with h1 drawn from N (10 ∗ randn(H, 1), IH ). the smoothed estimate, for which 1000 particles were used, with Kitagawa resampling. For the RaoBlackwellized Particle Filter [13], 500 particles were used, with Kitagawa resampling. We found that EP4 was numerically unstable and often struggled to converge. To encourage convergence, we used the damping method in [9], performing 20 iterations with a damping factor of 0.5. Nevertheless, the disappointing performance of EP is most likely due to conflicts resulting from numerical instabilities introduced by the frequent conversions between moment and canonical representations. The best filtered results are given using ADF, since this is better able to represent the variance in the filtered posterior than the sampling methods. Unlike Kim’s method, EC makes good use of the future information to clean up the filtered results considerably. One should bear in mind that both EC and Kim’s method use the same ADF filtered results. This demonstrates that EC may dramatically improve on Kim’s method, so that the small amount of extra work in making a numerical approximation of p(st |st+1 , v1:T ), (12), may bring significant benefits. We found similar conclusions for experiments with an aSLDS[7]. 4 Application to Noise Robust ASR Here we briefly present an application of the SLDS to robust Automatic Speech Recognition (ASR), for which the intractable inference is performed by EC, and serves to demonstrate how EC scales well to a large-scale application. Fuller details are given in [14]. The standard approach to noise robust ASR is to provide a set of noise-robust features to a standard Hidden Markov Model (HMM) classifier, which is based on modeling the acoustic feature vector. For example, the method of Unsupervised Spectral Subtraction (USS) [15] provides state-of-the-art performance in this respect. Incorporating noise models directly into such feature-based HMM systems is difficult, mainly because the explicit influence of the noise on the features is poorly understood. An alternative is to model the raw speech signal directly, such as the SAR-HMM model [16] for which, under clean conditions, isolated spoken digit recognition performs well. However, the SAR-HMM performs poorly under noisy conditions, since no explicit noise processes are taken into account by the model. The approach we take here is to extend the SAR-HMM to include an explicit noise process, so that h the observed signal vt is modeled as a noise corrupted version of a clean hidden signal vt : h vt = vt + ηt ˜ 4 with ηt ∼ N (0, σ 2 ) ˜ ˜ Generalized EP [5], which groups variables together improves on the results, but is still far inferior to the EC results presented here – Onno Zoeter personal communication. Noise Variance 0 10−7 10−6 10−5 10−4 10−3 SNR (dB) 26.5 26.3 25.1 19.7 10.6 0.7 HMM 100.0% 100.0% 90.9% 86.4% 59.1% 9.1% SAR-HMM 97.0% 79.8% 56.7% 22.2% 9.7% 9.1% AR-SLDS 96.8% 96.8% 96.4% 94.8% 84.0% 61.2% Table 1: Comparison of the recognition accuracy of three models when the test utterances are corrupted by various levels of Gaussian noise. The dynamics of the clean signal is modeled by a switching AR process R h vt = h h cr (st )vt−r + ηt (st ), h ηt (st ) ∼ N (0, σ 2 (st )) r=1 where st ∈ {1, . . . , S} denotes which of a set of AR coefficients cr (st ) are to be used at time t, h and ηt (st ) is the so-called innovation noise. When σ 2 (st ) ≡ 0, this model reproduces the SARHMM of [16], a specially constrained HMM. Hence inference and learning for the SAR-HMM are tractable and straightforward. For the case σ 2 (st ) > 0 the model can be recast as an SLDS. To do this we define ht as a vector which contains the R most recent clean hidden samples ht = h vt h . . . vt−r+1 T (16) and we set A(st ) to be an R × R matrix where the first row contains the AR coefficients −cr (st ) and the rest is a shifted down identity matrix. For example, for a third order (R = 3) AR process, A(st ) = −c1 (st ) −c2 (st ) −c3 (st ) 1 0 0 0 1 0 . (17) The hidden covariance matrix Σh (s) has all elements zero, except the top-left most which is set to the innovation variance. To extract the first component of ht we use the (switch independent) 1 × R projection matrix B = [ 1 0 . . . 0 ]. The (switch independent) visible scalar noise 2 variance is given by Σv ≡ σv . A well-known issue with raw speech signal models is that the energy of a signal may vary from one speaker to another or because of a change in recording conditions. For this reason the innovation Σh is adjusted by maximizing the likelihood of an observed sequence with respect to the innovation covariance, a process called Gain Adaptation [16]. 4.1 Training & Evaluation Following [16], we trained a separate SAR-HMM for each of the eleven digits (0–9 and ‘oh’) from the TI-DIGITS database [17]. The training set for each digit was composed of 110 single digit utterances down-sampled to 8 kHz, each one pronounced by a male speaker. Each SAR-HMM was composed of ten states with a left-right transition matrix. Each state was associated with a 10thorder AR process and the model was constrained to stay an integer multiple of K = 140 time steps (0.0175 seconds) in the same state. We refer the reader to [16] for a detailed explanation of the training procedure used with the SAR-HMM. An AR-SLDS was built for each of the eleven digits by copying the parameters of the corresponding trained SAR-HMM, i.e., the AR coefficients cr (s) are copied into the first row of the hidden transition matrix A(s) and the same discrete transition distribution p(st | st−1 ) is used. The models were then evaluated on a test set composed of 112 corrupted utterances of each of the eleven digits, each pronounced by different male speakers than those used in the training set. The recognition accuracy obtained by the models on the corrupted test sets is presented in Table 1. As expected, the performance of the SAR-HMM rapidly decreases with noise. The feature-based HMM with USS has high accuracy only for high SNR levels. In contrast, the AR-SLDS achieves a recognition accuracy of 61.2% at a SNR close to 0 dB, while the performance of the two other methods is equivalent to random guessing (9.1%). Whilst other inference methods may also perform well in this case, we found that EC performs admirably, without numerical instabilities, even for time-series with several thousand time-steps. 5 Discussion We presented a method for approximate smoothed inference in an augmented class of switching linear dynamical systems. Our approximation is based on the idea that due to the forgetting which commonly occurs in Markovian models, a finite number of mixture components may provide a reasonable approximation. Clearly, in systems with very long correlation times our method may require too many mixture components to produce a satisfactory result, although we are unaware of other techniques that would be able to cope well in that case. The main benefit of EC over Kim smoothing is that future information is more accurately dealt with. Whilst EC is not as general as EP, EC carefully exploits the properties of singly-connected distributions, such as the aSLDS, to provide a numerically stable procedure. We hope that the ideas presented here may therefore help facilitate the practical application of dynamic hybrid networks. Acknowledgements This work is supported by the EU Project FP6-0027787. This paper only reflects the authors’ views and funding agencies are not liable for any use that may be made of the information contained herein. References [1] Y. Bar-Shalom and Xiao-Rong Li. Estimation and Tracking : Principles, Techniques and Software. Artech House, Norwood, MA, 1998. [2] V. Pavlovic, J. M. Rehg, and J. MacCormick. Learning switching linear models of human motion. In Advances in Neural Information Processing systems (NIPS 13), pages 981–987, 2001. [3] A. T. Cemgil, B. Kappen, and D. Barber. A Generative Model for Music Transcription. IEEE Transactions on Audio, Speech and Language Processing, 14(2):679 – 694, 2006. [4] U. N. Lerner. Hybrid Bayesian Networks for Reasoning about Complex Systems. PhD thesis, Stanford University, 2002. [5] O. Zoeter. Monitoring non-linear and switching dynamical systems. PhD thesis, Radboud University Nijmegen, 2005. [6] T. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, MIT Media Lab, 2001. [7] D. Barber. Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems. Journal of Machine Learning Research, 7:2515–2540, 2006. [8] C-J. Kim. Dynamic linear models with Markov-switching. Journal of Econometrics, 60:1–22, 1994. [9] T. Heskes and O. Zoeter. Expectation Propagation for approximate inference in dynamic Bayesian networks. In A. Darwiche and N. Friedman, editors, Uncertainty in Art. Intelligence, pages 216–223, 2002. [10] S. Lauritzen and F. Jensen. Stable local computation with conditional Gaussian distributions. Statistics and Computing, 11:191–203, 2001. [11] G. Kitagawa. The Two-Filter Formula for Smoothing and an implementation of the Gaussian-sum smoother. Annals of the Institute of Statistical Mathematics, 46(4):605–623, 1994. [12] Z. Ghahramani and G. E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):963–996, 1998. [13] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer, 2001. [14] B. Mesot and D. Barber. Switching Linear Dynamical Systems for Noise Robust Speech Recognition. IDIAP-RR 08, 2006. [15] G. Lathoud, M. Magimai-Doss, B. Mesot, and H. Bourlard. Unsupervised spectral subtraction for noiserobust ASR. In Proceedings of ASRU 2005, pages 189–194, November 2005. [16] Y. Ephraim and W. J. J. Roberts. Revisiting autoregressive hidden Markov modeling of speech signals. IEEE Signal Processing Letters, 12(2):166–169, February 2005. [17] R.G. Leonard. A database for speaker independent digit recognition. In Proceedings of ICASSP84, volume 3, 1984.

5 0.085207805 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

Author: Peter Auer, Ronald Ortner

Abstract: We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds for the algorithm’s online performance after some finite number of steps. In the spirit of similar methods already successfully applied for the exploration-exploitation tradeoff in multi-armed bandit problems, we use upper confidence bounds to show that our UCRL algorithm achieves logarithmic online regret in the number of steps taken with respect to an optimal policy. 1 1.1

6 0.081276104 197 nips-2006-Uncertainty, phase and oscillatory hippocampal recall

7 0.069130406 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

8 0.065656483 154 nips-2006-Optimal Change-Detection and Spiking Neurons

9 0.06255953 53 nips-2006-Combining causal and similarity-based reasoning

10 0.059723165 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics

11 0.057235114 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

12 0.053776756 192 nips-2006-Theory and Dynamics of Perceptual Bistability

13 0.0535492 168 nips-2006-Reducing Calibration Time For Brain-Computer Interfaces: A Clustering Approach

14 0.051917303 143 nips-2006-Natural Actor-Critic for Road Traffic Optimisation

15 0.050405171 141 nips-2006-Multiple timescales and uncertainty in motor adaptation

16 0.049560871 193 nips-2006-Tighter PAC-Bayes Bounds

17 0.047943704 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity

18 0.046640951 89 nips-2006-Handling Advertisements of Unknown Quality in Search Advertising

19 0.045566637 12 nips-2006-A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data

20 0.044948652 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.132), (1, -0.078), (2, -0.061), (3, -0.102), (4, -0.042), (5, -0.093), (6, 0.072), (7, -0.021), (8, 0.123), (9, -0.04), (10, -0.009), (11, 0.04), (12, -0.045), (13, 0.034), (14, -0.003), (15, -0.03), (16, 0.046), (17, 0.049), (18, -0.139), (19, -0.032), (20, 0.008), (21, 0.035), (22, 0.078), (23, -0.01), (24, -0.012), (25, 0.083), (26, -0.021), (27, 0.042), (28, 0.062), (29, -0.069), (30, -0.099), (31, -0.091), (32, 0.087), (33, -0.218), (34, 0.035), (35, -0.002), (36, 0.104), (37, 0.033), (38, 0.132), (39, 0.002), (40, 0.04), (41, 0.067), (42, -0.167), (43, -0.062), (44, -0.099), (45, -0.178), (46, -0.049), (47, 0.05), (48, -0.052), (49, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96269631 71 nips-2006-Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning

Author: Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner

Abstract: Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators. In reinforcement learning (RL) models, meta-parameters such as learning rate, future reward discount factor, and exploitation-exploration factor, control learning dynamics and performance. They are hypothesized to be related to neuromodulatory levels in the brain. We found that many aspects of animal learning and performance can be described by simple RL models using dynamic control of the meta-parameters. To study the effects of stress and genotype, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 mouse strains. The animals were exposed to different kinds of stress to evaluate its effects on immediate performance as well as on long-term memory. Then, we used RL models to simulate their behavior. For each experimental session, we estimated a set of model meta-parameters that produced the best fit between the model and the animal performance. The dynamics of several estimated meta-parameters were qualitatively similar for the two simulated experiments, and with statistically significant differences between different genetic strains and stress conditions. 1

2 0.60093135 114 nips-2006-Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Author: Alexander T. Ihler, Padhraic Smyth

Abstract: Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas as humancomputer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what “factors” are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real–world data sets of count data involving both vehicles and people are used to illustrate the technique. 1

3 0.50511283 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight

Author: Pieter Abbeel, Adam Coates, Morgan Quigley, Andrew Y. Ng

Abstract: Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: First we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR). 1

4 0.49653873 58 nips-2006-Context Effects in Category Learning: An Investigation of Four Probabilistic Models

Author: Michael C. Mozer, Michael Shettel, Michael P. Holmes

Abstract: Categorization is a central activity of human cognition. When an individual is asked to categorize a sequence of items, context effects arise: categorization of one item influences category decisions for subsequent items. Specifically, when experimental subjects are shown an exemplar of some target category, the category prototype appears to be pulled toward the exemplar, and the prototypes of all nontarget categories appear to be pushed away. These push and pull effects diminish with experience, and likely reflect long-term learning of category boundaries. We propose and evaluate four principled probabilistic (Bayesian) accounts of context effects in categorization. In all four accounts, the probability of an exemplar given a category is encoded as a Gaussian density in feature space, and categorization involves computing category posteriors given an exemplar. The models differ in how the uncertainty distribution of category prototypes is represented (localist or distributed), and how it is updated following each experience (using a maximum likelihood gradient ascent, or a Kalman filter update). We find that the distributed maximum-likelihood model can explain the key experimental phenomena. Further, the model predicts other phenomena that were confirmed via reanalysis of the experimental data. Categorization is a key cognitive activity. We continually make decisions about characteristics of objects and individuals: Is the fruit ripe? Does your friend seem unhappy? Is your car tire flat? When an individual is asked to categorize a sequence of items, context effects arise: categorization of one item influences category decisions for subsequent items. Intuitive naturalistic scenarios in which context effects occur are easy to imagine. For example, if one lifts a medium-weight object after lifting a light-weight or heavy-weight object, the medium weight feels heavier following the light weight than following the heavy weight. Although the object-contrast effect might be due to fatigue of sensory-motor systems, many context effects in categorization are purely cognitive and cannot easily be attributed to neural habituation. For example, if you are reviewing a set of conference papers, and the first three in the set are dreadful, then even a mediocre paper seems like it might be above threshold for acceptance. Another example of a category boundary shift due to context is the following. Suppose you move from San Diego to Pittsburgh and notice that your neighbors repeatedly describe muggy, somewhat overcast days as ”lovely.” Eventually, your notion of what constitutes a lovely day accommodates to your new surroundings. As we describe shortly, experimental studies have shown a fundamental link between context effects in categorization and long-term learning of category boundaries. We believe that context effects can be viewed as a reflection of a trial-to-trial learning, and the cumulative effect of these trial-to-trial modulations corresponds to what we classically consider to be category learning. Consequently, any compelling model of category learning should also be capable of explaining context effects. 1 Experimental Studies of Context Effects in Categorization Consider a set of stimuli that vary along a single continuous dimension. Throughout this paper, we use as an illustration circles of varying diameters, and assume four categories of circles defined ranges of diameters; call them A, B, C, and D, in order from smallest to largest diameter. In a classification paradigm, experimental subjects are given an exemplar drawn from one category and are asked to respond with the correct category label (Zotov, Jones, & Mewhort, 2003). After making their response, subjects receive feedback as to the correct label, which we’ll refer to as the target. In a production paradigm, subjects are given a target category label and asked to produce an exemplar of that category, e.g., using a computer mouse to indicate the circle diameter (Jones & Mewhort, 2003). Once a response is made, subjects receive feedback as to the correct or true category label for the exemplar they produced. Neither classification nor production task has sequential structure, because the order of trial is random in both experiments. The production task provides direct information about the subjects’ internal representations, because subjects are producing exemplars that they consider to be prototypes of a category, whereas the categorization task requires indirect inferences to be made about internal representations from reaction time and accuracy data. Nonetheless, the findings in the production and classification tasks mirror one another nicely, providing converging evidence as to the nature of learning. The production task reveals how mental representations shift as a function of trial-to-trial sequences, and these shifts cause the sequential pattern of errors and response times typically observed in the classification task. We focus on the production task in this paper because it provides a richer source of data. However, we address the categorization task with our models as well. Figure 1 provides a schematic depiction of the key sequential effects in categorization. The horizontal line represents the stimulus dimension, e.g., circle diameter. The dimension is cut into four regions labeled with the corresponding category. The category center, which we’ll refer to as the prototype, is indicated by a vertical dashed line. The long solid vertical line marks the current exemplar—whether it is an exemplar presented to subjects in the classification task or an exemplar generated by subjects in the production task. Following an experimental trial with this exemplar, category prototypes appear to shift: the target-category prototype moves toward the exemplar, which we refer to as a pull effect, and all nontarget-category prototypes move away from the exemplar, which we refer to as a push effect. Push and pull effects are assessed in the production task by examining the exemplar produced on the following trial, and in the categorization task by examining the likelihood of an error response near category boundaries. The set of phenomena to be explained are as follows, described in terms of the production task. All numerical results referred to are from Jones and Mewhort (2003). This experiment consisted of 12 blocks of 40 trials, with each category label given as target 10 times within a block. • Within-category pull: When a target category is repeated on successive trials, the exemplar generated on the second trial moves toward the exemplar generated on the first trial, with respect to the true category prototype. Across the experiment, a correlation coefficient of 0.524 is obtained, and remains fairly constant over trials. • Between-category push: When the target category changes from one trial to the next, the exemplar generated on the second trial moves away from the exemplar generated on the first trial (or equivalently, from the prototype of the target category on the first trial). Figure 2a summarizes the sequential push effects from Jones and Mewhort. The diameter of the circle produced on trial t is plotted as a function of the target category on trial t − 1, with one line for each of the four trial t targets. The mean diameter for each target category is subtracted out, so the absolute vertical offset of each line is unimportant. The main feature of the data to note is that all four curves have a negative slope, which has the following meaning: the smaller that target t − 1 is (i.e., the further to the left on the x axis in Figure 1), the larger the response to target t is (further to the right in Figure 1), and vice versa, reflecting a push away from target t − 1. Interestingly and importantly, the magnitude of the push increases with the ordinal distance between targets t − 1 and t. Figure 2a is based on data from only eight subjects and is therefore noisy, though the effect is statistically reliable. As further evidence, Figure 2b shows data from a categorization task (Zotov et al., 2003), where the y-axis is a different dependent measure, but the negative slope has the same interpretation as in Figure 2a. example Figure 1: Schematic depiction of sequential effects in categorization A B C D stimulus dimension C 0 B −0.02 −0.04 −0.06 −0.08 −0.1 0.06 0.04 0.04 response deviation D 0.02 0.02 0 −0.02 −0.04 −0.06 −0.08 A B C −0.1 D previous category label 0.02 0 −0.02 −0.04 −0.06 −0.08 A B C −0.1 D previous category label (b) humans: classification (d) KFU−distrib 0.08 D previous category label C D 0.06 0.04 0.04 response deviation response deviation C B (f) MLGA−distrib 0.08 0.02 0 −0.02 −0.04 −0.06 −0.08 B A previous category label 0.06 A (e) MLGA−local 0.08 0.06 A 0.04 (c) KFU−local 0.08 response deviation response deviation 0.06 response bias from (a) production task of Jones and Mewhort (2003), (b) classification task of Zotov et al. (2003), and (c)-(f) the models proposed in this paper. The y axis is the deviation of the response from the mean, as a proportion of the total category width. The response to category A is solid red, B is dashed magenta, C is dash-dotted blue, and D is dotted green. (a) humans: production 0.08 Figure 2: Push effect data −0.1 0.02 0 −0.02 −0.04 −0.06 −0.08 A B C D previous category label −0.1 A B C D previous category label • Push and pull effects are not solely a consequence of errors or experimenter feedback. In quantitative estimation of push and pull effects, trial t is included in the data only if the response on trial t − 1 is correct. Thus, the effects follow trials in which no error feedback is given to the subjects, and therefore the adjustments are not due to explicit error correction. • Push and pull effects diminish over the course of the experiment. The magnitude of push effects can be measured by the slope of the regression lines fit to the data in Figure 2a. The slopes get shallower over successive trial blocks. The magnitude of pull effects can be measured by the standard deviation (SD) of the produced exemplars, which also decreases over successive trial blocks. • Accuracy increases steadily over the course of the experiment, from 78% correct responses in the first block to 91% in the final block. This improvement occurs despite the fact that error feedback is relatively infrequent and becomes even less frequent as performance improves. 2 Four Models In this paper, we explore four probabilistic (Bayesian) models to explain data described in the previous section. The key phenomenon to explain turns out to be the push effect, for which three of the four models fail to account. Modelers typically discard the models that they reject, and present only their pet model. In this work, we find it useful to report on the rejected models for three reasons. First, they help to set up and motivate the one successful model. Second, they include several obvious candidates, and we therefore have the imperative to address them. Third, in order to evaluate a model that can explain certain data, one needs to know the degree to which the the data constrain the space of models. If many models exist that are consistent with the data, one has little reason to prefer our pet candidate. Underlying all of the models is a generative probabilistic framework in which a category i is represented by a prototype value, di , on the dimension that discriminates among the categories. In the example used throughout this paper, the dimension is the diameter of a circle (hence the notation d for the prototype). An exemplar, E, of category i is drawn from a Gaussian distribution with mean di and variance vi , denoted E ∼ N (di , vi ). Category learning involves determining d ≡ {di }. In this work, we assume that the {vi } are fixed and given. Because d is unknown at the start of the experiment, it is treated as the value of a random vector, D ≡ {Di }. Figure 3a shows a simple graphical model representing the generative framework, in which E is the exemplar and C the category label. To formalize our discussion so far, we adopt the following notation: P (E|C = c, D = d) ∼ N (hc d, vc ), (1) where, for the time being, hc is a unary column vector all of whose elements are zero except for element c which has value 1. (Subscripts may indicate either an index over elements of a vector, or an index over vectors. Boldface is used for vectors and matrices.) Figure 3: (a) Graphical model depicting selection of an exemplar, E, of a category, C, based on the prototype vector, D; (b) Dynamic version of model indexed by trials, t (a) D (b) C Dt-1 Ct-1 E Dt Ct Et-1 Et We assume that the prototype representation, D, is multivariate Gaussian, D ∼ N (Ψ, Σ), where Ψ and Σ encode knowledge—and uncertainty in the knowledge—of the category prototype structure. Given this formulation, the uncertainty in D can be integrated out: P (E|C) ∼ N (hc Ψ, hc ΣhT + vc ). c (2) For the categorization task, a category label can be assigned by evaluating the category posterior, P (C|E), via Bayes rule, Equation 1, and the category priors, P (C). In this framework, learning takes place via trial-to-trial adaptation of the category prototype distribution, D. In Figure 3b, we add the subscript t to each random variable to denote the trial, yielding a dynamic graphical model for the sequential updating of the prototype vector, Dt . (The reader should be attentive to the fact that we use subscripted indices to denote both trials and category labels. We generally use the index t to denote trial, and c or i to denote a category label.) The goal of our modeling work is to show that the sequential updating process leads to context effects, such as the push and pull effects discussed earlier. We propose four alternative models to explore within this framework. The four models are obtained via the Cartesian product of two binary choices: the learning rule and the prototype representation. 2.1 Learning rule The first learning rule, maximum likelihood gradient ascent (MLGA), attempts to adjust the prototype representation so as to maximize the log posterior of the category given the exemplar. (The category, C = c, is the true label associated with the exemplar, i.e., either the target label the subject was asked to produce, or—if an error was made—the actual category label the subject did produce.) Gradient ascent is performed in all parameters of Ψ and Σ: ∆ψi = ψ ∂ log(P (c|e)) and ∂ψi ∆σij = σ ∂ log(P (c|e)), ∂σij (3) where ψ and σ are step sizes. To ensure that Σ remains a covariance matrix, constrained gradient 2 steps are applied. The constraints are: (1) diagonal terms are nonnegative, i.e., σi ≥ 0; (2) offdiagonal terms are symmetric, i.e., σij = σji ; and (3) the matrix remains positive definite, ensured σ by −1 ≤ σiijj ≤ 1. σ The second learning rule, a Kalman filter update (KFU), reestimates the uncertainty distribution of the prototypes given evidence provided by the current exemplar and category label. To draw the correspondence between our framework and a Kalman filter: the exemplar is a scalar measurement that pops out of the filter, the category prototypes are the hidden state of the filter, the measurement noise is vc , and the linear mapping from state to measurement is achieved by hc . Technically, the model is a measurement-switched Kalman filter, where the switching is determined by the category label c, i.e., the measurement function, hc , and noise, vc , are conditioned on c. The Kalman filter also allows temporal dynamics via the update equation, dt = Adt−1 , as well as internal process noise, whose covariance matrix is often denoted Q in standard Kalman filter notation. We investigated the choice of A and R, but because they did not impact the qualitative outcome of the simulations, we used A = I and R = 0. Given the correspondence we’ve established, the KFU equations—which specify Ψt+1 and Σt+1 as a function of ct , et , Ψt , and Σt —can be found in an introductory text (e.g., Maybeck, 1979). Change to a category prototype for each category following a trial of a given category. Solid (open) bars indicate trials in which the exemplar is larger (smaller) than the prototype. 2.2 prototype mvt. Figure 4: 0.2 trial t −1: A 0 −0.2 0.2 trial t −1: B 0 A B C trial t D −0.2 0.2 trial t −1: C 0 A B C trial t D −0.2 0.2 trial t −1: D 0 A B C trial t D −0.2 A B C trial t D Representation of the prototype The prototype representation that we described is localist: there is a one-to-one correspondence between the prototype for each category i and the random variable Di . To select the appropriate prototype given a current category c, we defined the unary vector hc and applied hc as a linear transform on D. The identical operations can be performed in conjunction with a distributed representation of the prototype. But we step back momentarily to motivate the distributed representation. The localist representation suffers from a key weakness: it does not exploit interrelatedness constraints on category structure. The task given to experimental subjects specifies that there are four categories, and they have an ordering; the circle diameters associated with category A are smaller than the diameters associated with B, etc. Consequently, dA < dB < dC < dD . One might make a further assumption that the category prototypes are equally spaced. Exploiting these two sources of domain knowledge leads to the distributed representation of category structure. A simple sort of distributed representation involves defining the prototype for category i not as di but as a linear function of an underlying two-dimensional state-space representation of structure. In this state space, d1 indicates the distance between categories and d2 an offset for all categories. This representation of state can be achieved by applying Equation 1 and defining hc = (nc , 1), where nc is the ordinal position of the category (nA = 1, nB = 2, etc.). We augment this representation with a bit of redundancy by incorporating not only the ordinal positions but also the reverse ordinal positions; this addition yields a symmetry in the representation between the two ends of the ordinal category scale. As a result of this augmentation, d becomes a three-dimensional state space, and hc = (nc , N + 1 − nc , 1), where N is the number of categories. To summarize, both the localist and distributed representations posit the existence of a hidden-state space—unknown at the start of learning—that specifies category prototypes. The localist model assumes one dimension in the state space per prototype, whereas the distributed model assumes fewer dimensions in the state space—three, in our proposal—than there are prototypes, and computes the prototype location as a function of the state. Both localist and distributed representations assume a fixed, known {hc } that specify the interpretation of the state space, or, in the case of the distributed model, the subject’s domain knowledge about category structure. 3 Simulation Methodology We defined a one-dimensional feature space in which categories A-D corresponded to the ranges [1, 2), [2, 3), [3, 4), and [4, 5), respectively. In the human experiment, responses were considered incorrect if they were smaller than A or larger than D; we call these two cases out-of-bounds-low (OOBL) and out-of-bounds-high (OOBH). OOBL and OOBH were treated as two additional categories, resulting in 6 categories altogether for the simulation. Subjects and the model were never asked to produce exemplars of OOBL or OOBH, but feedback was given if a response fell into these categories. As in the human experiment, our simulation involved 480 trials. We performed 100 replications of each simulation with identical initial conditions but different trial sequences, and averaged results over replications. All prototypes were initialized to have the same mean, 3.0, at the start of the simulation. Because subjects had some initial practice on the task before the start of the experimental trials, we provided the models with 12 initial trials of a categorization (not production) task, two for each of the 6 categories. (For the MLGA models, it was necessary to use a large step size on these trials to move the prototypes to roughly the correct neighborhood.) To perform the production task, the models must generate an exemplar given a category. It seems natural to draw an exemplar from the distribution in Equation 2 for P (E|C). However, this distribu- tion reflects the full range of exemplars that lie within the category boundaries, and presumably in the production task, subjects attempt to produce a prototypical exemplar. Consequently, we exclude the intrinsic category variance, vc , from Equation 2 in generating exemplars, leaving variance only via uncertainty about the prototype. Each model involved selection of various parameters and initial conditions. We searched the parameter space by hand, attempting to find parameters that satisfied basic properties of the data: the accuracy and response variance in the first and second halves of the experiment. We report only parameters for the one model that was successful, the MLGA-Distrib: ψ = 0.0075, σ = 1.5 × 10−6 for off-diagonal terms and 1.5 × 10−7 for diagonal terms (the gradient for the diagonal terms was relatively steep), Σ0 = 0.01I, and for all categories c, vc = 0.42 . 4 4.1 Results Push effect The phenomenon that most clearly distinguishes the models is the push effect. The push effect is manifested in sequential-dependency functions, which plot the (relative) response on trial t as a function of trial t − 1. As we explained using Figures 2a,b, the signature of the push effect is a negatively sloped line for each of the different trial t target categories. The sequential-dependency functions for the four models are presented in Figures 2c-f. KFU-Local (Figure 2c) produces a flat line, indicating no push whatsoever. The explanation for this result is straightforward: the Kalman filter update alters only the variable that is responsible for the measurement (exemplar) obtained on that trial. That variable is the prototype of the target class c, Dc . We thought the lack of an interaction among the category prototypes might be overcome with KFU-Distrib, because with a distributed prototype representation, all of the state variables jointly determine the target category prototype. However, our intuition turned out to be incorrect. We experimented with many different representations and parameter settings, but KFU-Distrib consistently obtained flat or shallow positive sloping lines (Figure 2d). MLGA-Local (Figure 2e) obtains a push effect for neighboring classes, but not distant classes. For example, examining the dashed magenta line, note that B is pushed away by A and C, but is not affected by D. MLGA-Local maximizes the likelihood of the target category both by pulling the classconditional density of the target category toward the exemplar and by pushing the class-conditional densities of the other categories away from the exemplar. However, if a category has little probability mass at the location of the exemplar, the increase in likelihood that results from pushing it further away is negligible, and consequently, so is the push effect. MLGA-Distrib obtains a lovely result (Figure 2f)—a negatively-sloped line, diagnostic of the push effect. The effect magnitude matches that in the human data (Figure 2a), and captures the key property that the push effect increases with the ordinal distance of the categories. We did not build a mechanism into MLGA-Distrib to produce the push effect; it is somewhat of an emergent property of the model. The state representation of MLGA-Distrib has three components: d1 , the weight of the ordinal position of a category prototype, d2 , the weight of the reverse ordinal position, and d3 , an offset. The last term, d3 , cannot be responsible for a push effect, because it shifts all prototypes equally, and therefore can only produce a flat sequential dependency function. Figure 4 helps provide an intuition how d1 and d2 work together to produce the push effect. Each graph shows the average movement of the category prototype (units on the y-axis are arbitrary) observed on trial t, for each of the four categories, following presentation of a given category on trial t − 1. Positve values on the y axis indicate increases in the prototype (movement to the right in Figure 1), and negative values decreases. Each solid vertical bar represents the movement of a given category prototype following a trial in which the exemplar is larger than its current prototype; each open vertical bar represents movement when the exemplar is to the left of its prototype. Notice that all category prototypes get larger or smaller on a given trial. But over the course of the experiment, the exemplar should be larger than the prototype as often as it is smaller, and the two shifts should sum together and partially cancel out. The result is the value indicated by the small horizontal bar along each line. The balance between the shifts in the two directions exactly corresponds to the push effect. Thus, the model produce a push-effect graph, but it is not truly producing a push effect as was originally conceived by the experimentalists. We are currently considering empirical consequences of this simulation result. Figure 5 shows a trial-by-trial trace from MLGA-Distrib. (a) class prototype 2 50 100 150 200 250 300 350 400 6 4 2 0 50 100 150 200 250 300 350 400 450 −4 50 100 150 200 250 50 100 150 200 300 350 400 450 250 300 350 400 450 300 350 400 450 300 350 400 450 0.2 0 −0.2 50 100 150 200 250 (f) 1 −2 −6 0.6 (e) (c) 0 0.8 0.4 450 (b) posterior log(class variance) P(correct) 4 0 (d) 1 shift (+=toward −=away) example 6 0.8 0.6 0.4 50 100 150 200 250 Figure 5: Trial-by-trial trace of MLGA-Distrib. (a) exemplars generated on one run of the simulation; (b) the mean and (c) variance of the class prototype distribution for the 6 classes on one run; (d) mean proportion correct over 100 replications of the simulation; (e) push and pull effects, as measured by changes to the prototype means: the upper (green) curve is the pull of the target prototype mean toward the exemplar, and the lower (red) curve is the push of the nontarget prototype means away from the exemplar, over 100 replications; (f) category posterior of the generated exemplar over 100 replications, reflecting gradient ascent in the posterior. 4.2 Other phenomena accounted for MLGA-Distrib captures the other phenomena we listed at the outset of this paper. Like all of the other models, MLGA-Distrib readily produces a pull effect, which is shown in the movement of category prototypes in Figure 5e. More observably, a pull effect is manifested when two successive trials of the same category are positively correlated: when trial t − 1 is to the left of the true category prototype, trial t is likely to be to the left as well. In the human data, the correlation coefficient over the experiment is 0.524; in the model, the coefficient is 0.496. The explanation for the pull effect is apparent: moving the category prototype to the exemplar increases the category likelihood. Although many learning effects in humans are based on error feedback, the experimental studies showed that push and pull effects occur even in the absence of errors, as they do in MLGA-Distrib. The model simply assumes that the target category it used to generate an exemplar is the correct category when no feedback to the contrary is provided. As long as the likelihood gradient is nonzero, category prototypes will be shifted. Pull and push effects shrink over the course of the experiment in human studies, as they do in the simulation. Figure 5e shows a reduction in both pull and push, as measured by the shift of the prototype means toward or away from the exemplar. We measured the slope of MLGA-Distrib’s push function (Figure 2f) for trials in the first and second half of the simulation. The slope dropped from −0.042 to −0.025, as one would expect from Figure 5e. (These slopes are obtained by combining responses from 100 replications of the simulation. Consequently, each point on the push function was an average over 6000 trials, and therefore the regression slopes are highly reliable.) A quantitative, observable measure of pull is the standard deviation (SD) of responses. As push and pull effects diminish, SDs should decrease. In human subjects, the response SDs in the first and second half of the experiment are 0.43 and 0.33, respectively. In the simulation, the response SDs are 0.51 and 0.38. Shrink reflects the fact that the model is approaching a local optimum in log likelihood, causing gradients—and learning steps—to become smaller. Not all model parameter settings lead to shrink; as in any gradient-based algorithm, step sizes that are too large do not lead to converge. However, such parameter settings make little sense in the context of the learning objective. 4.3 Model predictions MLGA-Distrib produces greater pull of the target category toward the exemplar than push of the neighboring categories away from the exemplar. In the simulation, the magnitude of the target pull— measured by the movement of the prototype mean—is 0.105, contrasted with the neighbor push, which is 0.017. After observing this robust result in the simulation, we found pertinent experimental data. Using the categorization paradigm, Zotov et al. (2003) found that if the exemplar on trial t is near a category border, subjects are more likely to produce an error if the category on trial t − 1 is repeated (i.e., a pull effect just took place) than if the previous trial is of the neighboring category (i.e., a push effect), even when the distance between exemplars on t − 1 and t is matched. The greater probability of error translates to a greater magnitude of pull than push. The experimental studies noted a phenomenon termed snap back. If the same target category is presented on successive trials, and an error is made on the first trial, subjects perform very accurately on the second trial, i.e., they generate an exemplar near the true category prototype. It appears as if subjects, realizing they have been slacking, reawaken and snap the category prototype back to where it belongs. We tested the model, but observed a sort of anti snap back. If the model made an error on the first trial, the mean deviation was larger—not smaller—on the second trial: 0.40 versus 0.32. Thus, MLGA-Distrib fails to explain this phenomenon. However, the phenomenon is not inconsistent with the model. One might suppose that on an error trial, subjects become more attentive, and increased attention might correspond to a larger learning rate on an error trial, which should yield a more accurate response on the following trial. McLaren et al. (1995) studied a phenomenon in humans known as peak shift, in which subjects are trained to categorize unidimensional stimuli into one of two categories. Subjects are faster and more accurate when presented with exemplars far from the category boundary than those near the boundary. In fact, they respond more efficiently to far exemplars than they do to the category prototype. The results are characterized in terms of the prototype of one category being pushed away from the prototype of the other category. It seems straightforward to explain these data in MLGA-Distrib as a type of long-term push effect. 5 Related Work and Conclusions Stewart, Brown, and Chater (2002) proposed an account of categorization context effects in which responses are based solely on the relative difference between the previous and present exemplars. No representation of the category prototype is maintained. However, classification based solely on relative difference cannot account for a diminished bias effects as a function of experience. A long-term stable prototype representation, of the sort incorporated into our models, seems necessary. We considered four models in our investigation, and the fact that only one accounts for the experimental data suggests that the data are nontrivial. All four models have principled theoretical underpinnings, and they space they define may suggest other elegant frameworks for understanding mechanisms of category learning. The successful model, MLDA-Distrib, offers a deep insight into understanding multiple-category domains: category structure must be considered. MLGA-Distrib exploits knowledge available to subjects performing the task concerning the ordinal relationships among categories. A model without this knowledge, MLGA-Local, fails to explain data. Thus, the interrelatedness of categories appears to provide a source of constraint that individuals use in learning about the structure of the world. Acknowledgments This research was supported by NSF BCS 0339103 and NSF CSE-SMA 0509521. Support for the second author comes from an NSERC fellowship. References Jones, M. N., & Mewhort, D. J. K. (2003). Sequential contrast and assimilation effects in categorization of perceptual stimuli. Poster presented at the 44th Meeting of the Psychonomic Society. Vancouver, B.C. Maybeck, P.S. (1979). Stochastic models, estimation, and control, Volume I. Academic Press. McLaren, I. P. L., et al. (1995). Prototype effects and peak shift in categorization. JEP:LMC, 21, 662–673. Stewart, N. Brown, G. D. A., & Chater, N. (2002). Sequence effects in categorization of simple perceptual stimuli. JEP:LMC, 28, 3–11. Zotov, V., Jones, M. N., & Mewhort, D. J. K. (2003). Trial-to-trial representation shifts in categorization. Poster presented at the 13th Meeting of the Canadian Society for Brain, Behaviour, and Cognitive Science: Hamilton, Ontario.

5 0.47160885 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes

Author: Huan Xu, Shie Mannor

Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1

6 0.4647162 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

7 0.45339158 89 nips-2006-Handling Advertisements of Unknown Quality in Search Advertising

8 0.4232682 141 nips-2006-Multiple timescales and uncertainty in motor adaptation

9 0.38157484 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space

10 0.36231199 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

11 0.36144379 10 nips-2006-A Novel Gaussian Sum Smoother for Approximate Inference in Switching Linear Dynamical Systems

12 0.35718322 192 nips-2006-Theory and Dynamics of Perceptual Bistability

13 0.34570459 202 nips-2006-iLSTD: Eligibility Traces and Convergence Analysis

14 0.34253651 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity

15 0.32069862 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data

16 0.28631505 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes

17 0.28541979 149 nips-2006-Nonnegative Sparse PCA

18 0.28303859 197 nips-2006-Uncertainty, phase and oscillatory hippocampal recall

19 0.28197545 129 nips-2006-Map-Reduce for Machine Learning on Multicore

20 0.27530697 146 nips-2006-No-regret Algorithms for Online Convex Programs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.076), (3, 0.019), (6, 0.027), (7, 0.041), (9, 0.045), (22, 0.039), (25, 0.012), (44, 0.047), (57, 0.049), (65, 0.033), (69, 0.044), (71, 0.069), (82, 0.391), (90, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82904422 71 nips-2006-Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning

Author: Gediminas Lukšys, Jérémie Knüsel, Denis Sheynikhovich, Carmen Sandi, Wulfram Gerstner

Abstract: Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators. In reinforcement learning (RL) models, meta-parameters such as learning rate, future reward discount factor, and exploitation-exploration factor, control learning dynamics and performance. They are hypothesized to be related to neuromodulatory levels in the brain. We found that many aspects of animal learning and performance can be described by simple RL models using dynamic control of the meta-parameters. To study the effects of stress and genotype, we carried out 5-hole-box light conditioning and Morris water maze experiments with C57BL/6 and DBA/2 mouse strains. The animals were exposed to different kinds of stress to evaluate its effects on immediate performance as well as on long-term memory. Then, we used RL models to simulate their behavior. For each experimental session, we estimated a set of model meta-parameters that produced the best fit between the model and the animal performance. The dynamics of several estimated meta-parameters were qualitatively similar for the two simulated experiments, and with statistically significant differences between different genetic strains and stress conditions. 1

2 0.61364836 56 nips-2006-Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data

Author: Ping Li, Kenneth W. Church, Trevor J. Hastie

Abstract: We1 develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimizations.

3 0.56666553 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons

Author: Thomas Voegtlin

Abstract: In biological neurons, the timing of a spike depends on the timing of synaptic currents, in a way that is classically described by the Phase Response Curve. This has implications for temporal coding: an action potential that arrives on a synapse has an implicit meaning, that depends on the position of the postsynaptic neuron on the firing cycle. Here we show that this implicit code can be used to perform computations. Using theta neurons, we derive a spike-timing dependent learning rule from an error criterion. We demonstrate how to train an auto-encoder neural network using this rule. 1

4 0.34855258 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes

Author: Huan Xu, Shie Mannor

Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1

5 0.34390646 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes

Author: Neil D. Lawrence, Guido Sanguinetti, Magnus Rattray

Abstract: Modelling the dynamics of transcriptional processes in the cell requires the knowledge of a number of key biological quantities. While some of them are relatively easy to measure, such as mRNA decay rates and mRNA abundance levels, it is still very hard to measure the active concentration levels of the transcription factor proteins that drive the process and the sensitivity of target genes to these concentrations. In this paper we show how these quantities for a given transcription factor can be inferred from gene expression levels of a set of known target genes. We treat the protein concentration as a latent function with a Gaussian process prior, and include the sensitivities, mRNA decay rates and baseline expression levels as hyperparameters. We apply this procedure to a human leukemia dataset, focusing on the tumour repressor p53 and obtaining results in good accordance with recent biological studies.

6 0.33683532 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons

7 0.33639693 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

8 0.33305135 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron

9 0.33243293 154 nips-2006-Optimal Change-Detection and Spiking Neurons

10 0.32899371 167 nips-2006-Recursive ICA

11 0.31813812 65 nips-2006-Denoising and Dimension Reduction in Feature Space

12 0.31749091 5 nips-2006-A Kernel Method for the Two-Sample-Problem

13 0.31620142 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

14 0.31537569 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis

15 0.31466007 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

16 0.31458923 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

17 0.31451899 36 nips-2006-Attentional Processing on a Spike-Based VLSI Neural Network

18 0.31353188 175 nips-2006-Simplifying Mixture Models through Function Approximation

19 0.31220031 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency

20 0.31196648 158 nips-2006-PG-means: learning the number of clusters in data