nips2005-141: Norepinephrine and Neural Interrupts

Authors: Peter Dayan, Angela J. Yu

Abstract: Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in well-learned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts.

1 Introduction
There are various general notions about their roles, such as regulating sleeping and waking [13] and changing the signal-to-noise ratios of cortical neurons [11]. In this paper, we focus on the short-term activity of norepinephrine (NE) neurons in the locus coeruleus [18, 1, 2, 3, 16, 4]. Given the widespread distribution and effects of NE in key cognitive tasks, it is very important to understand what it is in a task that drives the activity of NE neurons, and thus what computational effects it may be exerting.

Figures 1A-C show more tonic responses operating around a time-scale of minutes. Briefly, Figures 1A,B show that when the rules of a task are reversed, NE influences the speed of adaptation to the changed contingency (Figure 1A) and the activity of noradrenergic cells is tonically elevated (Figure 1B). Based on these data, we suggested [24, 25] that medium-term NE reports unexpected uncertainty arising from unpredicted changes in an environment or task. It operates in collaboration with a putatively cholinergic signal that reports on expected uncertainty arising, for instance, from known variability or noise.
Figure 1: (B) In a vigilance task, monkeys respond to rare targets and ignore common distractor stimuli. The trace shows the activity of a single NE neuron in the locus coeruleus (LC) around the time of a target-distractor reversal (vertical line). (C) Correlation between the gross fluctuations in the tonic activity of a single NE neuron (upper) and performance in the task (lower, measured by false alarm rate). (D) Single NE cells are activated on a phasic time-scale, stimulus-locked (vertical line) to the target (upper plot) and not the distractor (lower plot). (E) The average responses of a large number of norepinephrine cells (over a total of 41,454 trials), stimulus-locked (vertical line) to targets or distractors, sorted by the nature and rectitude of the response. (F) In a GO/NO-GO olfactory discrimination task for rats, single units are activated by the target odor (and not by the distractor odor), but are temporally much more tightly locked to the response (right) than the stimulus (left).
However, Figures 1D-F, along with other substantial neurophysiological data on the activity of NE neurons [18, 4], show that NE neurons have phasic response properties that lie outside this model. The data in Figures 1D,E come from a vigilance task [1], in which subjects can gain reward by reacting to a rare target (a rectangle oriented one way), while ignoring distractors (a rectangle oriented in the orthogonal direction). Under these circumstances, NE is consistently activated by the target and not the distractor (Figure 1D). There are also clear correlations between the magnitude of the NE activity and the nature of a trial: hit, miss, false alarm, correct reject (Figure 1E). It is known that the activity is weaker if the targets are more common [17] (though the lack of response to rare distractors shows that NE is not driven by mere rarity), and disappears if no action need be taken in response to the target [18]. In fact, the signal is more tightly related in time to the subsequent action than to the preceding stimulus (Figure 1F). Since it arises on every trial in an extremely well-learned task with stable stimulus contingencies, this NE signal clearly cannot be indicating unpredicted task changes.

Brown et al [5] have recently made the seminal suggestion that it reports changes in the statistical structure of the input (stimulus-present versus stimulus-absent) to decision-making circuits that are involved in initiating differential responding to distinct target stimuli. A statistically necessary consequence of the change in the input structure is that afferent information should be integrated differently: sensory responses should be ignored if no target is present, but taken seriously otherwise.

In this paper, we argue for a related, but distinct, notion of phasic NE, suggesting that it reports on unexpected state changes within a task. This is a significant, though natural, extension of its role in reporting unexpected task changes [25]. In agreement with the various accounts of the effects of phasic NE, we consider its role as a form of internal interrupt signal [6]. We argue that phasic NE is the medium for a somewhat similar neural interrupt, allowing the correct handling of statistically atypical events.
Out of the start state, the transition probabilities are

P(s_t = target | s_{t-1} = start) = .2 q_t and P(s_t = distractor | s_{t-1} = start) = .8 q_t, where q_t = 1/(11-t) for 6 <= t <= 10 and q_t = 0 otherwise. (1)

The target and distractor states are absorbing (self-transition probability = 1). This transition function ensures that the stimulus onset has a uniform distribution between 6 and 10 timesteps (and probability 0 otherwise). Given that a transition out of start (into either target or distractor) takes place, the probability is .2 that it is into target and .8 that it is into distractor. In addition, it is assumed that the node start does not emit observations, while target emits x_t = t with probability eta > 0.5 and d with probability 1 - eta, and distractor emits x_t = d with probability eta and t with probability 1 - eta. The transition out of start is evident as soon as the first d or t is observed, while the magnitude of eta controls the "confusability" of the target and distractor states.
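As a concrete illustration, the generative process just described can be sampled in a few lines. This is a sketch, not the authors' code; the choices eta = 0.75, a 30-timestep horizon, and the random seed are illustrative values not fixed by the text.

```python
import random

def sample_trial(eta=0.75, p_target=0.2, T=30, seed=0):
    """Sample one trial from the HMM sketched above.

    States: 'start' (emits nothing), which jumps to 'target' (prob .2)
    or 'distractor' (prob .8) at a timestep drawn uniformly from 6-10;
    both are absorbing. 'target' emits 't' with prob eta, 'd' otherwise;
    'distractor' is the mirror image.
    """
    rng = random.Random(seed)
    state, states, outputs = 'start', [], []
    for t in range(1, T + 1):
        if state == 'start':
            # hazard rate q_t = 1/(11-t) makes the onset uniform over 6-10
            q = 1.0 / (11 - t) if 6 <= t <= 10 else 0.0
            if rng.random() < q:
                state = 'target' if rng.random() < p_target else 'distractor'
        states.append(state)
        if state == 'start':
            outputs.append(None)  # start emits no observation
        else:
            likely = 't' if state == 'target' else 'd'
            other = 'd' if likely == 't' else 't'
            outputs.append(likely if rng.random() < eta else other)
    return states, outputs
```

Note that at t = 10 the hazard q_t reaches 1, so the transition out of start is guaranteed by then, matching the uniform-onset construction.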
The transition into target happens on step 10 (top), and the outputs generated are a mixture of t and d (middle), with an overall prevalence of t (bottom). Inference proceeds by the standard recursive update of the posterior state probabilities:

P(s_t | x_{1:t}) is proportional to P(x_t | s_t) sum over s_{t-1} of P(s_t | s_{t-1}) P(s_{t-1} | x_{1:t-1}). (2)

Because start does not produce outputs, as soon as the first t is observed, the probability of start plummets to 0. There then ensues an inferential battle between target and distractor, with the latter having the initial advantage, since its prior probability is 80%.
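For this three-state chain, the update in equation (2) specializes to a short exact forward pass. The sketch below is illustrative rather than the authors' implementation; eta = 0.75 is an assumed value, and the observation None encodes "no emission" (which, since only start is silent, is itself evidence for start).

```python
def filter_posteriors(outputs, eta=0.75, p_target=0.2):
    """Exact forward (filtering) pass for the three-state HMM.

    outputs[i] is 't', 'd', or None (no observation) at timestep i+1.
    Returns the per-timestep posterior over start/target/distractor.
    """
    b = {'start': 1.0, 'target': 0.0, 'distractor': 0.0}  # initial belief
    history = []
    for i, x in enumerate(outputs):
        t = i + 1
        q = 1.0 / (11 - t) if 6 <= t <= 10 else 0.0
        # transition step: start may jump; target/distractor are absorbing
        pred = {
            'start': b['start'] * (1 - q),
            'target': b['target'] + b['start'] * q * p_target,
            'distractor': b['distractor'] + b['start'] * q * (1 - p_target),
        }
        # observation step: start emits nothing, so any 't'/'d' rules it out
        if x is None:
            lik = {'start': 1.0, 'target': 0.0, 'distractor': 0.0}
        else:
            lik = {'start': 0.0,
                   'target': eta if x == 't' else 1 - eta,
                   'distractor': eta if x == 'd' else 1 - eta}
        unnorm = {s: pred[s] * lik[s] for s in pred}
        z = sum(unnorm.values())
        b = {s: unnorm[s] / z for s in unnorm}
        history.append(dict(b))
    return history
```

On a target trial with onset at timestep 10, the first observed t drives P(start) to 0 and leaves distractor temporarily ahead (prior 80%), until the accumulating t observations let target overwhelm the prior.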
Figure 2: The model. (A) The task is modeled as a hidden Markov model (HMM), with transitions from start to either distractor (probability .8) or target (probability .2). The transitions happen between timesteps 6 and 10 with uniform probability; distractor and target are absorbing states. (B) Sample run with a transition from start to target at timestep 10 (upper). The outputs favor target once the state has changed (middle), shown more clearly in the cumulative plot (bottom). (C) The posterior probabilities P(start) and P(distract) over the trial. (D) Model NE signal for four trials: a hit (top; same trial as in B,C), a false alarm (fa), a miss (miss), and a correct rejection (cr).
Because of the preponderance of transitions to distractor over target, the distractor state can be thought of as the reference or default state. Evidence against that default state is a form of unexpected uncertainty within a task, and we propose that phasic NE reports this uncertainty. Here, .2 is the prior probability of observing a target trial. This implies the following intuitive relationship: the smaller the probability of the non-default state target, the greater the NE-mediated "surprise" signal has to be in order to convince the inferential system that an anomalous stimulus has been observed. We also assume that a response is initiated if the posterior probability of target reaches a fixed threshold. The NE activity during the start state is also rather arbitrary. When the stimulus comes on, the divisive normalization makes the activity go above baseline because, although the transition was expected, its occurrence was not predicted with perfect precision.
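Putting the pieces together, a minimal sketch of the model NE trace: the posterior probability of the non-default state target, divisively normalized by its prior. The response threshold of 0.95 and eta = 0.75 are illustrative assumptions (the text does not fix the threshold value), and the filtering recursion is the same exact forward pass as equation (2).

```python
def ne_signal(outputs, eta=0.75, p_target=0.2, threshold=0.95):
    """Model NE trace for one trial: P(target | data) / prior(target).

    Returns (ne_trace, response_timestep or None); a response is emitted
    when the posterior probability of target crosses `threshold`.
    """
    b = {'start': 1.0, 'target': 0.0, 'distractor': 0.0}
    trace = []
    for i, x in enumerate(outputs):
        t = i + 1
        q = 1.0 / (11 - t) if 6 <= t <= 10 else 0.0
        pred = {'start': b['start'] * (1 - q),
                'target': b['target'] + b['start'] * q * p_target,
                'distractor': b['distractor'] + b['start'] * q * (1 - p_target)}
        if x is None:  # start is the only silent state
            lik = {'start': 1.0, 'target': 0.0, 'distractor': 0.0}
        else:
            lik = {'start': 0.0,
                   'target': eta if x == 't' else 1 - eta,
                   'distractor': eta if x == 'd' else 1 - eta}
        unnorm = {s: pred[s] * lik[s] for s in pred}
        z = sum(unnorm.values())
        b = {s: unnorm[s] / z for s in unnorm}
        trace.append(b['target'] / p_target)  # divisive normalization by prior
        if b['target'] >= threshold:
            return trace, t  # respond: posterior has crossed threshold
    return trace, None
```

On a target trial the trace rises well above 1 (surprise relative to the .2 prior) and triggers a response; on a distractor trial P(target) collapses and no response is emitted, i.e. a correct rejection.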
Figure 3: NE activity. (A) NE activity locked to the stimulus onset (i.e., the transition out of start). (B) NE activity response-locked to the decision to act, for hit and fa trials only.
When the first t is observed on timestep 10, the probability of start drops to 0, and the probability of distractor, which has an initial advantage over target due to its higher prior, eventually loses out to target as the evidence overwhelms the prior. Figure 3A shows the average NE signal for the four classes of responses (hit, false alarm, miss, and correct rejection), time-locked to the start of the stimulus. Figure 3B shows the average signal locked to the time of reaction (for hit and false alarm trials) rather than stimulus onset. As in the data (Figure 1F), response-locked activities are much more tightly clustered, although this flatters the model somewhat, since we do not allow for any variability in the response time as a function of when the probability of state target reaches the threshold. Since the decay of the signal following a response is unconstrained, the trace terminates when the response is determined: usually when the probability of target reaches threshold, but also sometimes when there is an accidental erroneous response. Figure 4A compares the effect of making the discrimination between target and distractor more or less difficult in the model (upper) and in the data (lower; [16]). Note that since the NE signal is calculated relative to the prior probability, making target more likely would reduce the NE signal exactly proportionally.
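The difficulty effect admits a back-of-the-envelope probe: once the first observation rules out start, each additional t multiplies the target-versus-distractor posterior odds by eta/(1 - eta), so a harder discrimination (eta nearer 0.5) needs more observations to reach a response threshold, making the response slower. A sketch under the same illustrative assumptions as above (0.95 threshold, .2 target prior):

```python
def steps_to_threshold(eta, p_target=0.2, threshold=0.95, max_t=60):
    """Number of consecutive 't' observations after stimulus onset needed
    for P(target | data) to reach `threshold`. eta near 0.5 models a hard
    discrimination; eta near 1 an easy one."""
    # posterior odds target:distractor start at the prior odds .2 : .8;
    # start is eliminated by the first observation, so it can be ignored
    odds = p_target / (1 - p_target)
    k = 0
    while odds / (1 + odds) < threshold and k < max_t:
        odds *= eta / (1 - eta)  # likelihood ratio of one 't' observation
        k += 1
    return k
```

Consistent with the slower, broader responses in the difficult condition, lowering eta increases the number of observations (and hence timesteps) before the threshold is crossed.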
4 Discussion

The present model of the phasic activity of NE cells is a direct and major extension of our previous model of tonic aspects of this neuromodulator.
Figure 4: NE activities and task difficulty. (A) Stimulus-locked LC responses are slower and broader for a more difficult discrimination, where difficulty is controlled by the similarity of target and distractor stimuli. (D) The same traces aligned to the response indicate that NE activity in the difficult condition is attenuated in the model.

The key difference is that unexpected uncertainty is now about the state within a current characterization of the task rather than about the characterization as a whole.
In the model, NE activity is explicitly normalized by prior probabilities arising from the default state transitions in tasks. This is necessary to measure specifically unexpected uncertainty, and explains the decrement in the NE phasic response as a function of the target probability [17]. We would also expect this transition to effect a different NE signature if stimuli were expected during start that could also be confused with those expected during target and distractor.

One useful way to think about the signal is in terms of an interrupt signal in computers. Computers have a highly centralized processing architecture, and therefore the interrupt signal only needs a very limited spatial extent to exert a widespread effect on the course of computation. By contrast, processing in the brain is highly distributed, and therefore it is necessary for the interrupt signal to have a widespread distribution, so that the full ramifications of the failure of the current state can be felt. The interrupt signal should engage mechanisms for establishing the new state, which then allows a new set of conditions to be established as to which interrupts will be allowed to occur, and also any appropriate action to be taken (as in the task we modeled). The interrupt signal can be expected to be beneficial, for instance, when there is competition between tasks for the use of neural resources such as receptive fields [8].
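The interrupt logic just described can be caricatured in a few lines: a state-dependent gate determines which surprises are allowed to interrupt, and crossing the gate both switches the state and installs the new state's own gate. The states, gate values, and names below are purely illustrative, not part of the model.

```python
class NeuralInterrupt:
    """Toy sketch of the interrupt metaphor: a broadcast signal that,
    when surprise about the current state crosses a state-dependent gate,
    switches the system to a new state and installs that state's gate."""

    def __init__(self):
        self.state = 'default'
        # per-state rule: (surprise gate, state to adopt on interrupt)
        self.rules = {'default': (3.0, 'alert'), 'alert': (5.0, 'default')}

    def step(self, surprise):
        gate, next_state = self.rules[self.state]
        if surprise >= gate:         # broadcast interrupt
            self.state = next_state  # establish new state; new gate applies
            return True
        return False
```

The point of the sketch is only that the set of permissible interrupts is itself re-established after each interrupt, as the text requires.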
The interrupt-based account is a close relative of existing notions of phasic NE. These propose that NE controls the gain in competitive decision-making networks that implement sequential decision-making [22], essentially by reporting on the changes in the statistical structure of the inputs induced by stimulus onset. The idea that NE can signal the change in the input statistics occasioned by the (temporally unpredictable) occurrence of the target is highly appealing. However, the statistics of the input change when either the target or the distractor appears, and so the preference for responding to the target at the expense of the distractor is strange. The effect of forcing the decision-making network to become unstable, and therefore enforcing a speeded decision, is much closer to an interrupt; but then it is not clear why this signal should decrease as the target becomes more common. However, this alternative theory does make an important link to sequential statistical analysis [22], raising issues about matters such as thresholds for deciding between target and distractor that should be important foci of future work here too.

The overall performance (Figure 1C) fluctuates dramatically (shown by the changing false alarm rate), in a manner that is tightly correlated with fluctuations in tonic NE activity. Periods of high tonic activity are correlated with low phasic activation to the targets (data not shown). The high tonic phase is associated with exploration, with subjects failing to concentrate on the contingencies that lead to their current rewards in order to search for stimuli or actions that might be associated with better rewards. Given the relationship between phasic and tonic firing, further investigation of these periodic fluctuations and their implications would be desirable.
Finally, in our previous model [24, 25], tonic NE was closely coupled with tonic acetylcholine (ACh), with the latter reporting expected rather than unexpected uncertainty. The account of ACh should transfer somewhat directly into the short-term contingencies within a task; we might expect it to be involved in reporting on aspects of the known variability associated with each state, including each distinct stimulus state as well as the no-stimulus state. As such, this ACh signal might be expected to be relatively more tonic than NE (an effect that is also apparent in our previous work on more tonic interactions between ACh and NE; e.g., Figure 2 of [24]). One attractive target for an account along these lines is the sustained attention task studied by Sarter and colleagues, which involves temporal uncertainty. Performance in this task is exquisitely sensitive to cholinergic manipulation [19], but unaffected by gross noradrenergic manipulation [15].
References (partial):
- Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task.
- Conditioned responses of monkey locus coeruleus neurons anticipate acquisition of discriminative behavior in a vigilance task.
- Effects of putative neurotransmitters on neuronal activity in monkey auditory cortex.
- [12] Freedman, R, Foote, SL & Bloom, FE (1975) Histochemical characterization of a neocortical projection of the nucleus locus coeruleus in the squirrel monkey.
- Activation of monkey locus coeruleus neurons varies with difficulty and performance in a target detection task.
- Phasic activation of monkey locus coeruleus (LC) neurons with recognition of motivationally relevant stimuli.
- Plasticity of sensory responses of locus coeruleus neurons in the behaving rat: implications for cognition.
- The role of locus coeruleus in the regulation of cognitive performance.
- Alteration of brain noradrenergic activity in rhesus monkeys affects the alerting component of covert orienting.
wordName wordTfidf (topN-words)
[('distractor', 0.362), ('target', 0.235), ('phasic', 0.215), ('interrupt', 0.208), ('tonic', 0.184), ('coeruleus', 0.181), ('norepinephrine', 0.166), ('activity', 0.163), ('locus', 0.159), ('hit', 0.138), ('stimulus', 0.131), ('unexpected', 0.13), ('signal', 0.13), ('start', 0.128), ('timestep', 0.127), ('vigilance', 0.127), ('miss', 0.11), ('noradrenergic', 0.108), ('rajkowski', 0.108), ('ach', 0.108), ('locked', 0.108), ('interrupts', 0.104), ('contingencies', 0.099), ('ne', 0.094), ('alarm', 0.092), ('st', 0.09), ('cohen', 0.087), ('transition', 0.087), ('neurons', 0.087), ('response', 0.085), ('sara', 0.083), ('fa', 0.083), ('cr', 0.079), ('monkey', 0.075), ('rats', 0.072), ('task', 0.07), ('default', 0.069), ('reports', 0.066), ('false', 0.066), ('lc', 0.066), ('discrimination', 0.065), ('uncertainty', 0.064), ('neuromodulatory', 0.062), ('sarter', 0.062), ('widespread', 0.061), ('activated', 0.061), ('state', 0.06), ('reporting', 0.058), ('tightly', 0.058), ('dayan', 0.055), ('adapted', 0.054), ('absorbing', 0.054), ('inferential', 0.054), ('jd', 0.054), ('gross', 0.054), ('activation', 0.05), ('responses', 0.05), ('sec', 0.049), ('uctuations', 0.049), ('behavioral', 0.049), ('brain', 0.048), ('somewhat', 0.048), ('xt', 0.048), ('broader', 0.046), ('dif', 0.044), ('notions', 0.043), ('timesteps', 0.043), ('yu', 0.043), ('effects', 0.043), ('alerting', 0.042), ('bouret', 0.042), ('cholinergic', 0.042), ('distract', 0.042), ('foote', 0.042), ('idazoxan', 0.042), ('majczynski', 0.042), ('odor', 0.042), ('resp', 0.042), ('unpredicted', 0.042), ('hmm', 0.041), ('targets', 0.041), ('eg', 0.04), ('distractors', 0.04), ('correct', 0.039), ('cult', 0.039), ('cues', 0.039), ('outputs', 0.039), ('gain', 0.038), ('changes', 0.038), ('rare', 0.038), ('kubiak', 0.036), ('suggestion', 0.036), ('freedman', 0.036), ('elevated', 0.036), ('stim', 0.036), ('onset', 0.035), ('qt', 0.035), ('activities', 0.034), ('neuroscience', 0.034), ('probability', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 141 nips-2005-Norepinephrine and Neural Interrupts
Author: Peter Dayan, Angela J. Yu
Abstract: Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in welllearned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts. 1
2 0.27154493 26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity
Author: Samuel M. McClure, Mark S. Gilzenrat, Jonathan D. Cohen
Abstract: We propose a model by which dopamine (DA) and norepinepherine (NE) combine to alternate behavior between relatively exploratory and exploitative modes. The model is developed for a target detection task for which there is extant single neuron recording data available from locus coeruleus (LC) NE neurons. An exploration-exploitation trade-off is elicited by regularly switching which of the two stimuli are rewarded. DA functions within the model to change synaptic weights according to a reinforcement learning algorithm. Exploration is mediated by the state of LC firing, with higher tonic and lower phasic activity producing greater response variability. The opposite state of LC function, with lower baseline firing rate and greater phasic responses, favors exploitative behavior. Changes in LC firing mode result from combined measures of response conflict and reward rate, where response conflict is monitored using models of anterior cingulate cortex (ACC). Increased long-term response conflict and decreased reward rate, which occurs following reward contingency switch, favors the higher tonic state of LC function and NE release. This increases exploration, and facilitates discovery of the new target. 1 In t rod u ct i on A central problem in reinforcement learning is determining how to adaptively move between exploitative and exploratory behaviors in changing environments. We propose a set of neurophysiologic mechanisms whose interaction may mediate this behavioral shift. Empirical work on the midbrain dopamine (DA) system has suggested that this system is particularly well suited for guiding exploitative behaviors. This hypothesis has been reified by a number of studies showing that a temporal difference (TD) learning algorithm accounts for activity in these neurons in a wide variety of behavioral tasks [1,2]. DA release is believed to encode a reward prediction error signal that acts to change synaptic weights relevant for producing behaviors [3]. 
Through learning, this allows neural pathways to predict future expected reward through the relative strength of their synaptic connections [1]. Decision-making procedures based on these value estimates are necessarily greedy. Including reward bonuses for exploratory choices supports non-greedy actions [4] and accounts for additional data derived from DA neurons [5]. We show that combining a DA learning algorithm with models of response conflict detection [6] and NE function [7] produces an effective annealing procedure for alternating between exploration and exploitation. NE neurons within the LC alternate between two firing modes [8]. In the first mode, known as the phasic mode, NE neurons fire at a low baseline rate but have relatively robust phasic responses to behaviorally salient stimuli. The second mode, called the tonic mode, is associated with a higher baseline firing and absent or attenuated phasic responses. The effects of NE on efferent areas are modulatory in nature, and are well captured as a change in the gain of efferent inputs so that neuronal responses are potentiated in the presence of NE [9]. Thus, in phasic mode, the LC provides transient facilitation in processing, time-locked to the presence of behaviorally salient information in motor or decision areas. Conversely, in tonic mode, higher overall LC discharge rate increases gain generally and hence increases the probability of arbitrary responding. Consistent with this account, for periods when NE neurons are in the phasic mode, monkey performance is nearly perfect. However, when NE neurons are in the tonic mode, performance is more erratic, with increased response times and error rate [8]. These findings have led to a recent characterization of the LC as a dynamic temporal filter, adjusting the system's relative responsivity to salient and irrelevant information [8]. In this way, the LC is ideally positioned to mediate the shift between exploitative and exploratory behavior. 
The parameters that underlie changes in LC firing mode remain largely unexplored. Based on data from a target detection task by Aston-Jones and colleagues [10], we propose that LC firing mode is determined in part by measures of response conflict and reward rate as calculated by the ACC and OFC, respectively [8]. Together, the ACC and OFC are the principle sources of cortical input to the LC [8]. Activity in the ACC is known, largely through human neuroimaging experiments, to change in accord with response conflict [6]. In brief, relatively equal activity in competing behavioral responses (reflecting uncertainty) produces high conflict. Low conflict results when one behavioral response predominates. We propose that increased long-term response conflict biases the LC towards a tonic firing mode. Increased conflict necessarily follows changes in reward contingency. As the previously rewarded target no longer produces reward, there will be a relative increase in response ambiguity and hence conflict. This relationship between conflict and LC firing is analogous to other modeling work [11], which proposes that increased tonic firing reflects increased environmental uncertainty. As a final component to our model, we hypothesize that the OFC maintains an ongoing estimate in reward rate, and that this estimate of reward rate also influences LC firing mode. As reward rate increases, we assume that the OFC tends to bias the LC in favor of phasic firing to target stimuli. We have aimed to fix model parameters based on previous work using simpler networks. We use parameters derived primarily from a previous model of the LC by Gilzenrat and colleagues [7]. Integration of response conflict by the ACC and its influence on LC firing was borrowed from unpublished work by Gilzenrat and colleagues in which they fit human behavioral data in a diminishing utilities task. 
Given this approach, we interpret our observed improvement in model performance with combined NE and DA function as validation of a mechanism for automatically switching between exploitative and exploratory action selection. 2 G o- No- G o Task and Core Mod el We have modeled an experiment in which monkeys performed a target detection task [10]. In the task, monkeys were shown either a vertical bar or a horizontal bar and were required to make or omit a motor response appropriately. Initially, the vertical bar was the target stimulus and correctly responding was rewarded with a squirt of fruit juice (r=1 in the model). Responding to the non-target horizontal stimulus resulted in time out punishment (r=-.1; Figure 1A). No responses to either the target or non-target gave zero reward. After the monkeys had fully acquired the task, the experimenters periodically switched the reward contingency such that the previously rewarded stimulus (target) became the distractor, and vice versa. Following such reversals, LC neurons were observed to change from emitting phasic bursts of firing to the target, to tonic firing following the switch, and slowly back to phasic firing for the new target as the new response criteria was obtained [10]. Figure 1: Task and model design. (A) Responses were required for targets in order to obtain reward. Responses to distractors resulted in a minor punishment. No responses gave zero reward. (B) In the model, vertical and horizontal bar inputs (I1 and I 2 ) fed to integrator neurons (X1 and X2 ) which then drove response units (Y1 and Y2 ). Responses were made if Y 1 or Y2 crossed a threshold while input units were active. We have previously modeled this task [7,12] with a three-layer connectionist network in which two input units, I1 and I 2 , corresponding to the vertical and horizontal bars, drive two mutually inhibitory integrator units, X1 and X2 . The integrator units subsequently feed two response units, Y1 and Y2 (Figure 1B). 
Responses are made whenever output from Y1 or Y2 crosses a threshold level of activity, θ. Relatively weak cross connections from each input unit to the opposite integrator unit (I1 to X2 and I 2 to X1 ) are intended to model stimulus similarity. Both the integrator and response units were modeled as noisy, leaky accumulators: ˙ X i =
3 0.22919014 149 nips-2005-Optimal cue selection strategy
Author: Vidhya Navalpakkam, Laurent Itti
Abstract: Survival in the natural world demands the selection of relevant visual cues to rapidly and reliably guide attention towards prey and predators in cluttered environments. We investigate whether our visual system selects cues that guide search in an optimal manner. We formally obtain the optimal cue selection strategy by maximizing the signal to noise ratio (SN R) between a search target and surrounding distractors. This optimal strategy successfully accounts for several phenomena in visual search behavior, including the effect of target-distractor discriminability, uncertainty in target’s features, distractor heterogeneity, and linear separability. Furthermore, the theory generates a new prediction, which we verify through psychophysical experiments with human subjects. Our results provide direct experimental evidence that humans select visual cues so as to maximize SN R between the targets and surrounding clutter.
4 0.17938362 194 nips-2005-Top-Down Control of Visual Attention: A Rational Account
Author: Michael Shettel, Shaun Vecera, Michael C. Mozer
Abstract: Theories of visual attention commonly posit that early parallel processes extract conspicuous features such as color contrast and motion from the visual field. These features are then combined into a saliency map, and attention is directed to the most salient regions first. Top-down attentional control is achieved by modulating the contribution of different feature types to the saliency map. A key source of data concerning attentional control comes from behavioral studies in which the effect of recent experience is examined as individuals repeatedly perform a perceptual discrimination task (e.g., “what shape is the odd-colored object?”). The robust finding is that repetition of features of recent trials (e.g., target color) facilitates performance. We view this facilitation as an adaptation to the statistical structure of the environment. We propose a probabilistic model of the environment that is updated after each trial. Under the assumption that attentional control operates so as to make performance more efficient for more likely environmental states, we obtain parsimonious explanations for data from four different experiments. Further, our model provides a rational explanation for why the influence of past experience on attentional control is short lived. 1 INTRODUCTION The brain does not have the computational capacity to fully process the massive quantity of information provided by the eyes. Selective attention operates to filter the spatiotemporal stream to a manageable quantity. Key to understanding the nature of attention is discovering the algorithm governing selection, i.e., understanding what information will be selected and what will be suppressed. Selection is influenced by attributes of the spatiotemporal stream, often referred to as bottom-up contributions to attention. For example, attention is drawn to abrupt onsets, motion, and regions of high contrast in brightness and color. 
Most theories of attention posit that some visual information processing is performed preattentively and in parallel across the visual field. This processing extracts primitive visual features such as color and motion, which provide the bottom-up cues for attentional guidance. However, attention is not driven willy-nilly by these cues. The deployment of attention can be modulated by task instructions, current goals, and domain knowledge, collectively referred to as top-down contributions to attention. How do bottom-up and top-down contributions to attention interact? Most psychologically and neurobiologically motivated models propose a very similar architecture in which information from bottom-up and top-down sources combines in a saliency (or activation) map (e.g., Itti et al., 1998; Koch & Ullman, 1985; Mozer, 1991; Wolfe, 1994). The saliency map indicates, for each location in the visual field, the relative importance of that location. Attention is drawn to the most salient locations first. Figure 1 sketches the basic architecture that incorporates bottom-up and top-down contributions to the saliency map. The visual image is analyzed to extract maps of primitive features such as color and orientation. Associated with each location in a map is a scalar response or activation indicating the presence of a particular feature. Most models assume that responses are stronger at locations with high local feature contrast, consistent with neurophysiological data, e.g., the response of a red feature detector to a red object is stronger if the object is surrounded by green objects. The saliency map is obtained by taking a sum of bottom-up activations from the feature maps.

[FIGURE 1. An attentional saliency map constructed from bottom-up and top-down information.]
[FIGURE 2. Sample display from Experiment 1 of Maljkovic and Nakayama (1994).]
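The gain-weighted combination of feature maps into a saliency map (Figure 1) can be sketched as follows. The feature maps, gain values, and four-location display are hypothetical; only the sum-of-gained-activations structure comes from the text.

```python
# Minimal sketch of the saliency-map computation: a gain-weighted sum
# of bottom-up feature-map activations at each location. The maps and
# gain settings below are illustrative, not values from the paper.

def saliency(feature_maps, gains):
    """Return, per location, the sum of top-down gain times
    bottom-up activation over all feature maps."""
    n_locations = len(next(iter(feature_maps.values())))
    return [sum(gains[f] * acts[loc] for f, acts in feature_maps.items())
            for loc in range(n_locations)]

# Two feature maps over a four-location display; location 2 has high
# local red contrast (e.g., a red item among green items).
feature_maps = {
    "red":      [0.1, 0.2, 0.9, 0.1],
    "vertical": [0.3, 0.1, 0.2, 0.4],
}
gains = {"red": 1.0, "vertical": 0.5}  # task tuning favors the red map

s = saliency(feature_maps, gains)
most_salient = s.index(max(s))  # attention is deployed here first
```

With the red map's gain turned up, the red singleton at location 2 wins the saliency competition; lowering that gain (or raising the vertical gain) changes which location is selected first.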
The bottom-up activations are modulated by a top-down gain that specifies the contribution of a particular map to saliency in the current task and environment. Wolfe (1994) describes a heuristic algorithm for determining appropriate gains in a visual search task, where the goal is to detect a target object among distractor objects. Wolfe proposes that maps encoding features that discriminate between target and distractors have higher gains, and to be consistent with the data, he proposes limits on the magnitude of gain modulation and the number of gains that can be modulated. More recently, Wolfe et al. (2003) have been explicit in proposing optimization as a principle for setting gains given the task definition and stimulus environment. One aspect of optimizing attentional control involves configuring the attentional system to perform a given task; for example, in a visual search task for a red vertical target among green vertical and red horizontal distractors, the task definition should result in a higher gain for red and vertical feature maps than for other feature maps. However, there is a more subtle form of gain modulation, which depends on the statistics of display environments. For example, if green vertical distractors predominate, then red is a better discriminative cue than vertical; and if red horizontal distractors predominate, then vertical is a better discriminative cue than red. In this paper, we propose a model that encodes statistics of the environment in order to allow for optimization of attentional control to the structure of the environment. Our model is designed to address a key set of behavioral data, which we describe next.

1.1 Attentional priming phenomena

Psychological studies involve a sequence of experimental trials that begin with a stimulus presentation and end with a response from the human participant. Typically, trial order is randomized, and the context preceding a trial is ignored.
However, in sequential studies, performance is examined on one trial contingent on the past history of trials. These sequential studies explore how experience influences future performance. Consider the sequential attentional task of Maljkovic and Nakayama (1994). On each trial, the stimulus display (Figure 2) consists of three notched diamonds, one a singleton in color—either green among red or red among green. The task is to report whether the singleton diamond, referred to as the target, is notched on the left or the right. The task is easy because the singleton pops out, i.e., the time to locate the singleton does not depend on the number of diamonds in the display. Nonetheless, the response time significantly depends on the sequence of trials leading up to the current trial: If the target is the same color on the current trial as on the previous trial, response time is roughly 100 ms faster than if the target is a different color on the current trial. Considering that response times are on the order of 700 ms, this effect, which we term attentional priming, is gigantic in the scheme of psychological phenomena.

2 ATTENTIONAL CONTROL AS ADAPTATION TO THE STATISTICS OF THE ENVIRONMENT

We interpret the phenomenon of attentional priming via a particular perspective on attentional control, which can be summarized in two bullets.

• The perceptual system dynamically constructs a probabilistic model of the environment based on its past experience.
• Control parameters of the attentional system are tuned so as to optimize performance under the current environmental model.

The primary focus of this paper is the environmental model, but we first discuss the nature of performance optimization. The role of attention is to make processing of some stimuli more efficient, and consequently, the processing of other stimuli less efficient.
For example, if the gain on the red feature map is turned up, processing will be efficient for red items, but competition from red items will reduce the efficiency for green items. Thus, optimal control should tune the system for the most likely states of the world by minimizing an objective function such as:

$J(g) = \sum_e P(e)\, RT_g(e)$   (1)

where g is a vector of top-down gains, e is an index over environmental states, P(·) is the probability of an environmental state, and RT_g(·) is the expected response time—assuming a constant error rate—to the environmental state under gains g. Determining the optimal gains is a challenge because every gain setting will result in facilitation of responses to some environmental states but hindrance of responses to other states. The optimal control problem could be solved via direct reinforcement learning, but the rapidity of human learning makes this possibility unlikely: In a variety of experimental tasks, evidence suggests that adaptation to a new task or environment can occur in just one or two trials (e.g., Rogers & Monsell, 1995). Model-based reinforcement learning is an attractive alternative, because given a model, optimization can occur without further experience in the real world. Although the number of real-world trials necessary to achieve a given level of performance is comparable for direct and model-based reinforcement learning in stationary environments (Kearns & Singh, 1999), naturalistic environments can be viewed as highly nonstationary. In such a situation, the framework we suggest is well motivated: After each experience, the environment model is updated. The updated environmental model is then used to retune the attentional system. In this paper, we propose a particular model of the environment suitable for visual search tasks. Rather than explicitly modeling the optimization of attentional control by setting gains, we assume that the optimization process will serve to minimize Equation 1.
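Equation 1 can be sketched directly. The environmental states, their probabilities, and the toy response-time model here are hypothetical; only the form J(g) = Σ_e P(e) RT_g(e) is from the text.

```python
# Sketch of the attentional control objective (Equation 1):
# J(g) = sum over environmental states e of P(e) * RT_g(e).
# States, probabilities, and the RT model are illustrative.

def rt(gains, state):
    """Toy response-time model: higher gain on the feature that
    identifies a state makes responses to that state faster."""
    base_rt = 700.0  # ms, roughly the scale reported in the paper
    return base_rt - 100.0 * gains.get(state, 0.0)

def J(gains, state_probs):
    """Expected response time under gains, weighted by P(e)."""
    return sum(p * rt(gains, e) for e, p in state_probs.items())

state_probs = {"red_target": 0.8, "green_target": 0.2}  # hypothetical P(e)
tuned   = {"red_target": 1.0}                        # gain on likely state
untuned = {"red_target": 0.5, "green_target": 0.5}   # gain spread evenly

# Tuning gains toward the more probable state lowers the objective,
# at the cost of slower responses to the rare state.
j_tuned, j_untuned = J(tuned, state_probs), J(untuned, state_probs)
```

This also illustrates why optimization is nontrivial: the tuned setting wins on expected RT precisely because it sacrifices speed on the rare state.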
Because any gain adjustment will facilitate performance in some environmental states and hinder performance in others, an optimized control system should obtain faster reaction times for more probable environmental states. This assumption allows us to explain experimental results in a minimal, parsimonious framework.

3 MODELING THE ENVIRONMENT

Focusing on the domain of visual search, we characterize the environment in terms of a probability distribution over configurations of target and distractor features. We distinguish three classes of features: defining, reported, and irrelevant. To explain these terms, consider the task of searching a display of size-varying, colored, notched diamonds (Figure 2), with the task of detecting the singleton in color and judging the notch location. Color is the defining feature, notch location is the reported feature, and size is an irrelevant feature. To simplify the exposition, we treat all features as having discrete values, an assumption which is true of the experimental tasks we model. We begin by considering displays containing a single target and a single distractor, and shortly generalize to multidistractor displays. We use the framework of Bayesian networks to characterize the environment. Each feature of the target and distractor is a discrete random variable, e.g., Tcolor for target color and Dnotch for the location of the notch on the distractor. The Bayes net encodes the probability distribution over environmental states; in our working example, this distribution is P(Tcolor, Tsize, Tnotch, Dcolor, Dsize, Dnotch). The structure of the Bayes net specifies the relationships among the features. The simplest model one could consider would be to treat the features as independent, illustrated in Figure 3a for the singleton-color search task. The opposite extreme would be the full joint distribution, which could be represented by a look-up table indexed by the six features, or by the cascading Bayes net architecture in Figure 3b.
The architecture we propose, which we’ll refer to as the dominance model (Figure 3c), has an intermediate dependency structure, and expresses the joint distribution as: $P(T_{color})\,P(D_{color} \mid T_{color})\,P(T_{size} \mid T_{color})\,P(T_{notch} \mid T_{color})\,P(D_{size} \mid D_{color})\,P(D_{notch} \mid D_{color})$. The structured model is constructed based on three rules.

1. The defining feature of the target is at the root of the tree.
2. The defining feature of the distractor is conditionally dependent on the defining feature of the target. We refer to this rule as dominance of the target over the distractor.
3. The reported and irrelevant features of the target (distractor) are conditionally dependent on the defining feature of the target (distractor). We refer to this rule as dominance of the defining feature over nondefining features.

As we will demonstrate, the dominance model produces a parsimonious account of a wide range of experimental data.

3.1 Updating the environment model

The model’s parameters are the conditional distributions embodied in the links. In the example of Figure 3c with binary random variables, the model has 11 parameters. However, these parameters are determined by the environment: To be adaptive in nonstationary environments, the model must be updated following each experienced state. We propose a simple exponentially weighted averaging approach. For two variables V and W with observed values v and w on trial t, a conditional distribution $P_t(V = u \mid W = w) = \delta_{uv}$ is defined, where $\delta$ is the Kronecker delta.

[FIGURE 3. Three models of a visual-search environment with colored, notched, size-varying diamonds. (a) feature-independence model; (b) full-joint model; (c) dominance model.]
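The dominance model's factorization can be written out as code. The conditional probability tables below are hypothetical, chosen only so that each conditional sums to one; the dependency structure is the one stated above.

```python
import itertools

# Sketch of the dominance model's factorized joint (Figure 3c):
# P(Tc) P(Dc|Tc) P(Ts|Tc) P(Tn|Tc) P(Ds|Dc) P(Dn|Dc).
# All conditional probability tables are hypothetical.

colors, sizes, notches = ["red", "green"], ["big", "small"], ["L", "R"]

def uniform_cpt(child_values, parent_values):
    """CPT assigning equal probability to each child value."""
    return {(v, p): 1.0 / len(child_values)
            for v in child_values for p in parent_values}

cpts = {
    "Tcolor": {c: 0.5 for c in colors},
    # Distractor color usually differs from target color (singleton task).
    "Dcolor|Tcolor": {("red", "red"): 0.1, ("green", "red"): 0.9,
                      ("red", "green"): 0.9, ("green", "green"): 0.1},
    "Tsize|Tcolor": uniform_cpt(sizes, colors),
    "Tnotch|Tcolor": uniform_cpt(notches, colors),
    "Dsize|Dcolor": uniform_cpt(sizes, colors),
    "Dnotch|Dcolor": uniform_cpt(notches, colors),
}

def joint(tc, dc, ts, tn, ds, dn):
    """Probability of one full display configuration."""
    return (cpts["Tcolor"][tc]
            * cpts["Dcolor|Tcolor"][(dc, tc)]
            * cpts["Tsize|Tcolor"][(ts, tc)]
            * cpts["Tnotch|Tcolor"][(tn, tc)]
            * cpts["Dsize|Dcolor"][(ds, dc)]
            * cpts["Dnotch|Dcolor"][(dn, dc)])

# A valid factorization sums to one over all configurations.
total = sum(joint(*cfg) for cfg in itertools.product(
    colors, colors, sizes, notches, sizes, notches))
```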
The distribution representing the environment E following trial t, denoted $P^E_t$, is then updated as follows:

$P^E_t(V = u \mid W = w) = \alpha P^E_{t-1}(V = u \mid W = w) + (1 - \alpha) P_t(V = u \mid W = w)$   (2)

for all u, where $\alpha$ is a memory constant. Note that no update is performed for values of W other than w. An analogous update is performed for unconditional distributions. How the model is initialized—i.e., specifying $P^E_0$—is irrelevant, because in all experimental tasks that we model, participants begin the experiment with many dozens of practice trials. Data is not collected during practice trials. Consequently, any transient effects of $P^E_0$ do not impact the results. In our simulations, we begin with a uniform distribution for $P^E_0$, and include practice trials as in the human studies. Thus far, we’ve assumed a single target and a single distractor. The experiments that we model involve multiple distractors. The simple extension we require to handle multiple distractors is to define a frequentist probability for each distractor feature V, $P_t(V = v \mid W = w) = C_{vw} / C_w$, where $C_{vw}$ is the count of co-occurrences of feature values v and w among the distractors, and $C_w$ is the count of w. Our model is extremely simple. Given a description of the visual search task and environment, the model has only a single degree of freedom, $\alpha$. In all simulations, we fix $\alpha = 0.75$; however, the choice of $\alpha$ does not qualitatively impact any result.

4 SIMULATIONS

In this section, we show that the model can explain a range of data from four different experiments examining attentional priming. All experiments measure response times of participants. On each trial, the model can be used to obtain a probability of the display configuration (the environmental state) on that trial, given the history of trials to that point.
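The exponentially weighted update in Equation 2 can be sketched for one conditional distribution. The memory constant α = 0.75 is the value fixed in the paper's simulations; the variable names and parent value here are illustrative.

```python
# Sketch of the update rule (Equation 2): for the observed parent
# value w, P_t(V=u|W=w) = alpha * P_{t-1}(V=u|W=w) + (1-alpha) * d_uv,
# where d_uv is the Kronecker delta on the observed child value v.

ALPHA = 0.75  # memory constant fixed in all of the paper's simulations

def update(dist, v_obs, w_obs, alpha=ALPHA):
    """Mix the old conditional toward a point mass on the observation.
    Columns for parent values other than w_obs are left untouched."""
    for u in dist[w_obs]:
        delta = 1.0 if u == v_obs else 0.0
        dist[w_obs][u] = alpha * dist[w_obs][u] + (1.0 - alpha) * delta

# Uniform initialization over two target colors, one parent value "w".
dist = {"w": {"red": 0.5, "green": 0.5}}

# A run of red targets drives P(red|w) toward 1.0, with exponentially
# decaying influence of older trials; this decay is what limits
# priming to roughly a half dozen trials.
for _ in range(6):
    update(dist, "red", "w")
p_red = dist["w"]["red"]  # 1 - 0.5 * 0.75**6
```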
Our critical assumption—as motivated earlier—is that response times monotonically decrease with increasing probability, indicating that visual information processing is better configured for more likely environmental states. The particular relationship we assume is that response times are linear in log probability. This assumption yields long response time tails, as are observed in all human studies.

4.1 Maljkovic and Nakayama (1994, Experiment 5)

In this experiment, participants were asked to search for a singleton in color in a display of three red or green diamonds. Each diamond was notched on either the left or right side, and the task was to report the side of the notch on the color singleton. The well-practiced participants made very few errors. Reaction time (RT) was examined as a function of whether the target on a given trial is the same or different color as the target on trial n steps back or ahead. Figure 4 shows the results, with the human RTs in the left panel and the simulation log probabilities in the right panel. The horizontal axis represents n. Both graphs show the same outcome: repetition of target color facilitates performance. This influence lasts only for a half dozen trials, with an exponentially decreasing influence further into the past. In the model, this decreasing influence is due to the exponential decay of recent history (Equation 2). Figure 4 also shows that—as expected—the future has no influence on the current trial.

4.2 Maljkovic and Nakayama (1994, Experiment 8)

In the previous experiment, it is impossible to determine whether facilitation is due to repetition of the target’s color or the distractor’s color, because the display contains only two colors, and therefore repetition of target color implies repetition of distractor color.
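The linking assumption used for all simulations (response time linear in log probability of the trial) can be sketched as follows; the intercept and slope are hypothetical, chosen only to show the direction of the effect.

```python
import math

# Sketch of the linking assumption: RT decreases linearly in the log
# probability of the display configuration. Intercept and slope are
# hypothetical; only the functional form is taken from the text.

def response_time(trial_prob, base=550.0, slope=20.0):
    """Predicted RT in ms; rarer configurations yield slower responses."""
    return base - slope * math.log(trial_prob)

# A likely (e.g., color-repeat) trial is answered faster than an
# unlikely (color-switch) trial, and very rare configurations produce
# the long RT tails noted in the text.
rt_repeat = response_time(0.8)
rt_switch = response_time(0.2)
```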
To unconfound these two potential factors, an experiment like the previous one was conducted using four distinct colors, allowing one to examine the effect of repeating the target color while varying the distractor color, and vice versa. The sequence of trials was composed of subsequences of up to six consecutive trials with either the target or distractor color held constant while the other color was varied trial to trial. Following each subsequence, both target and distractors were changed. Figure 5 shows that for both humans and the simulation, performance improves toward an asymptote as the number of target and distractor repetitions increases; in the model, the asymptote is due to the probability of the repeated color in the environment model approaching 1.0. The performance improvement is greater for target than distractor repetition; in the model, this difference is due to the dominance of the defining feature of the target over the defining feature of the distractor.

4.3 Huang, Holcombe, and Pashler (2004, Experiment 1)

Huang et al. (2004) and Hillstrom (2000) conducted studies to determine whether repetitions of one feature facilitate performance independently of repetitions of another feature. In the Huang et al. study, participants searched for a singleton in size in a display consisting of lines that were short and long, slanted left or right, and colored white or black. The reported feature was target slant. Slant, size, and color were uncorrelated. Huang et al. discovered that repeating an irrelevant feature (color or orientation) facilitated performance, but only when the defining feature (size) was repeated. As shown in Figure 6, the model replicates human performance, due to the dominance of the defining feature over the reported and irrelevant features.

4.4 Wolfe, Butcher, Lee, and Hyde (2003, Experiment 1)

In an empirical tour-de-force, Wolfe et al. (2003) explored singleton search over a range of environments.
The task is to detect the presence or absence of a singleton in displays consisting of colored (red or green), oriented (horizontal or vertical) lines. Target-absent trials were used primarily to ensure participants were searching the display. The experiment examined seven experimental conditions, which varied in the amount of uncertainty as to the target identity. The essential conditions, from least to most uncertainty, are: blocked (e.g., target always red vertical among green horizontals), mixed feature (e.g., target always a color singleton), mixed dimension (e.g., target either red or vertical), and fully mixed (target could be red, green, vertical, or horizontal). With this design, one can ascertain how uncertainty in the environment and in the target definition influence task difficulty.

[FIGURE 4. Experiment 5 of Maljkovic and Nakayama (1994): performance on a given trial conditional on the color of the target on a previous or subsequent trial. Human data are from subject KN.]
[FIGURE 5. Experiment 8 of Maljkovic and Nakayama (1994). Left panel: human data, average of subjects KN and SS; right panel: simulation.]
[FIGURE 6. Experiment 1 of Huang, Holcombe, & Pashler (2004). Left panel: human data; right panel: simulation.]
Because the defining feature in this experiment could be either color or orientation, we modeled the environment with two Bayes nets—one color dominant and one orientation dominant—and performed model averaging. A comparison of Figures 7a and 7b shows a correspondence between human RTs and model predictions. Less uncertainty in the environment leads to more efficient performance. One interesting result from the model is its prediction that the mixed-feature condition is easier than the fully-mixed condition; that is, search is more efficient when the dimension (i.e., color vs. orientation) of the singleton is known, even though the model has no abstract representation of feature dimensions, only feature values.

4.5 Optimal adaptation constant

In all simulations so far, we fixed the memory constant. From the human data, it is clear that memory for recent experience is relatively short lived, on the order of a half dozen trials (e.g., left panel of Figure 4). In this section we provide a rational argument for the short duration of memory in attentional control. Figure 7c shows mean negative log probability in each condition of the Wolfe et al. (2003) experiment, as a function of $\alpha$. To assess these probabilities, for each experimental condition, the model was initialized so that all of the conditional distributions were uniform, and then a block of trials was run. Log probability for all trials in the block was averaged. The negative log probability (y axis of the Figure) is a measure of the model’s misprediction of the next trial in the sequence. For complex environments, such as the fully-mixed condition, a small memory constant is detrimental: With rapid memory decay, the effective history of trials is a high-variance sample of the distribution of environmental states.
For simple environments, a large memory constant is detrimental: With slow memory decay, the model does not transition quickly from the initial environmental model to one that reflects the statistics of a new environment. Thus, the memory constant is constrained by being large enough that the environment model can hold on to sufficient history to represent complex environments, and by being small enough that the model adapts quickly to novel environments. If the conditions in Wolfe et al. give some indication of the range of naturalistic environments an agent encounters, we have a rational account of why attentional priming is so short lived. Whether priming lasts 2 trials or 20, the surprising empirical result is that it does not last 200 or 2000 trials. Our rational argument provides a rough insight into this finding.

[FIGURE 7. (a) Human data for Wolfe et al. (2003), Experiment 1; (b) simulation; (c) misprediction of model (i.e., lower y value = better) as a function of $\alpha$ for five experimental conditions.]

5 DISCUSSION

The psychological literature contains two opposing accounts of attentional priming and its relation to attentional control. Huang et al. (2004) and Hillstrom (2000) propose an episodic account in which a distinct memory trace—representing the complete configuration of features in the display—is laid down for each trial, and priming depends on configural similarity of the current trial to previous trials. Alternatively, Maljkovic and Nakayama (1994) and Wolfe et al.
(2003) propose a feature-strengthening account in which detection of a feature on one trial increases its ability to attract attention on subsequent trials, and priming is proportional to the number of overlapping features from one trial to the next. The episodic account corresponds roughly to the full joint model (Figure 3b), and the feature-strengthening account corresponds roughly to the independence model (Figure 3a). Neither account is adequate to explain the range of data we presented. However, an intermediate account, the dominance model (Figure 3c), is not only sufficient, but it offers a parsimonious, rational explanation. Beyond the model’s basic assumptions, it has only one free parameter, and can explain results from diverse experimental paradigms. The model makes a further theoretical contribution. Wolfe et al. distinguish the environments in their experiment in terms of the amount of top-down control available, implying that different mechanisms might be operating in different environments. However, in our account, top-down control is not some substance distributed in different amounts depending on the nature of the environment. Our account treats all environments uniformly, relying on attentional control to adapt to the environment at hand. We conclude with two limitations of the present work. First, our account presumes a particular network architecture, instead of a more elegant Bayesian approach that specifies priors over architectures, and performs automatic model selection via the sequence of trials. We did explore such a Bayesian approach, but it was unable to explain the data. Second, at least one finding in the literature is problematic for the model. Hillstrom (2000) occasionally finds that RTs slow when an irrelevant target feature is repeated but the defining target feature is not. However, because this effect is observed only in some experiments, it is likely that any model would require elaboration to explain the variability. 
ACKNOWLEDGEMENTS

We thank Jeremy Wolfe for providing the raw data from his experiment for reanalysis. This research was funded by NSF BCS Award 0339103.

REFERENCES

Hillstrom, A. P. (2000). Repetition effects in visual search. Perception & Psychophysics, 62, 800–817.
Huang, L., Holcombe, A. O., & Pashler, H. (2004). Repetition priming in visual search: Episodic retrieval, not feature priming. Memory & Cognition, 32, 12–20.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis & Machine Intelligence, 20, 1254–1259.
Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11 (pp. 996–1002). Cambridge, MA: MIT Press.
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.
Maljkovic, V., & Nakayama, K. (1994). Priming of pop-out: I. Role of features. Memory & Cognition, 22, 657–672.
Mozer, M. C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT Press.
Rogers, R. D., & Monsell, S. (1995). The cost of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Wolfe, J. M., Butcher, S. J., Lee, C., & Hyde, M. (2003). Changing your mind: On the contributions of top-down and bottom-up guidance in visual search for feature singletons. Journal of Experimental Psychology: Human Perception & Performance, 29, 483–502.
5 0.13265112 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
Author: Afsheen Afshar, Gopal Santhanam, Stephen I. Ryu, Maneesh Sahani, Byron M. Yu, Krishna V. Shenoy
Abstract: Spiking activity from neurophysiological experiments often exhibits dynamics beyond that driven by external stimulation, presumably reflecting the extensive recurrence of neural circuitry. Characterizing these dynamics may reveal important features of neural computation, particularly during internally-driven cognitive operations. For example, the activity of premotor cortex (PMd) neurons during an instructed delay period separating movement-target specification and a movementinitiation cue is believed to be involved in motor planning. We show that the dynamics underlying this activity can be captured by a lowdimensional non-linear dynamical systems model, with underlying recurrent structure and stochastic point-process output. We present and validate latent variable methods that simultaneously estimate the system parameters and the trial-by-trial dynamical trajectories. These methods are applied to characterize the dynamics in PMd data recorded from a chronically-implanted 96-electrode array while monkeys perform delayed-reach tasks. 1
6 0.1153698 91 nips-2005-How fast to work: Response vigor, motivation and tonic dopamine
7 0.10582394 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions
8 0.091979221 109 nips-2005-Learning Cue-Invariant Visual Responses
9 0.088015884 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
10 0.086300135 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception
11 0.081170313 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions
12 0.076914974 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
13 0.075161271 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models
14 0.071469218 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
15 0.069420047 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
16 0.068612292 121 nips-2005-Location-based activity recognition
17 0.064896554 41 nips-2005-Coarse sample complexity bounds for active learning
18 0.06474416 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
19 0.064333357 8 nips-2005-A Criterion for the Convergence of Learning with Spike Timing Dependent Plasticity
20 0.064259984 64 nips-2005-Efficient estimation of hidden state dynamics from spike trains
simIndex simValue paperId paperTitle
same-paper 1 0.97172612 141 nips-2005-Norepinephrine and Neural Interrupts
Author: Peter Dayan, Angela J. Yu
Abstract: Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in welllearned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts. 1
2 0.86562043 26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity
Author: Samuel M. McClure, Mark S. Gilzenrat, Jonathan D. Cohen
Abstract: We propose a model by which dopamine (DA) and norepinephrine (NE) combine to alternate behavior between relatively exploratory and exploitative modes. The model is developed for a target detection task for which there is extant single neuron recording data available from locus coeruleus (LC) NE neurons. An exploration-exploitation trade-off is elicited by regularly switching which of the two stimuli are rewarded. DA functions within the model to change synaptic weights according to a reinforcement learning algorithm. Exploration is mediated by the state of LC firing, with higher tonic and lower phasic activity producing greater response variability. The opposite state of LC function, with lower baseline firing rate and greater phasic responses, favors exploitative behavior. Changes in LC firing mode result from combined measures of response conflict and reward rate, where response conflict is monitored using models of anterior cingulate cortex (ACC). Increased long-term response conflict and decreased reward rate, which occurs following reward contingency switch, favors the higher tonic state of LC function and NE release. This increases exploration, and facilitates discovery of the new target.

1 Introduction

A central problem in reinforcement learning is determining how to adaptively move between exploitative and exploratory behaviors in changing environments. We propose a set of neurophysiologic mechanisms whose interaction may mediate this behavioral shift. Empirical work on the midbrain dopamine (DA) system has suggested that this system is particularly well suited for guiding exploitative behaviors. This hypothesis has been reified by a number of studies showing that a temporal difference (TD) learning algorithm accounts for activity in these neurons in a wide variety of behavioral tasks [1,2]. DA release is believed to encode a reward prediction error signal that acts to change synaptic weights relevant for producing behaviors [3].
Through learning, this allows neural pathways to predict future expected reward through the relative strength of their synaptic connections [1]. Decision-making procedures based on these value estimates are necessarily greedy. Including reward bonuses for exploratory choices supports non-greedy actions [4] and accounts for additional data derived from DA neurons [5]. We show that combining a DA learning algorithm with models of response conflict detection [6] and NE function [7] produces an effective annealing procedure for alternating between exploration and exploitation. NE neurons within the LC alternate between two firing modes [8]. In the first mode, known as the phasic mode, NE neurons fire at a low baseline rate but have relatively robust phasic responses to behaviorally salient stimuli. The second mode, called the tonic mode, is associated with a higher baseline firing and absent or attenuated phasic responses. The effects of NE on efferent areas are modulatory in nature, and are well captured as a change in the gain of efferent inputs so that neuronal responses are potentiated in the presence of NE [9]. Thus, in phasic mode, the LC provides transient facilitation in processing, time-locked to the presence of behaviorally salient information in motor or decision areas. Conversely, in tonic mode, higher overall LC discharge rate increases gain generally and hence increases the probability of arbitrary responding. Consistent with this account, for periods when NE neurons are in the phasic mode, monkey performance is nearly perfect. However, when NE neurons are in the tonic mode, performance is more erratic, with increased response times and error rate [8]. These findings have led to a recent characterization of the LC as a dynamic temporal filter, adjusting the system's relative responsivity to salient and irrelevant information [8]. In this way, the LC is ideally positioned to mediate the shift between exploitative and exploratory behavior. 
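The gain account of NE described here is commonly modeled by scaling the slope of a sigmoid activation function; a toy sketch (the particular function and gain values are illustrative assumptions):

```python
import math

def response(x, gain):
    """Sigmoid unit whose input slope is multiplied by an NE-like gain."""
    return 1.0 / (1.0 + math.exp(-gain * x))

# Higher gain potentiates any positive input, salient or not (tonic mode
# raises responsivity generally), while sharpening the separation between
# weakly positive and weakly negative inputs.
weak_target, distractor = 0.2, -0.2
low_gain_sep = response(weak_target, 1.0) - response(distractor, 1.0)
high_gain_sep = response(weak_target, 6.0) - response(distractor, 6.0)
```

A transient (phasic) gain increase time-locked to the target boosts only the task-relevant response, whereas a sustained (tonic) increase boosts responses to arbitrary inputs, matching the erratic behavior described above.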
The parameters that underlie changes in LC firing mode remain largely unexplored. Based on data from a target detection task by Aston-Jones and colleagues [10], we propose that LC firing mode is determined in part by measures of response conflict and reward rate as calculated by the ACC and OFC, respectively [8]. Together, the ACC and OFC are the principal sources of cortical input to the LC [8]. Activity in the ACC is known, largely through human neuroimaging experiments, to change in accord with response conflict [6]. In brief, relatively equal activity in competing behavioral responses (reflecting uncertainty) produces high conflict. Low conflict results when one behavioral response predominates. We propose that increased long-term response conflict biases the LC towards a tonic firing mode. Increased conflict necessarily follows changes in reward contingency. As the previously rewarded target no longer produces reward, there will be a relative increase in response ambiguity and hence conflict. This relationship between conflict and LC firing is analogous to other modeling work [11], which proposes that increased tonic firing reflects increased environmental uncertainty. As a final component of our model, we hypothesize that the OFC maintains an ongoing estimate of reward rate, and that this estimate also influences LC firing mode. As reward rate increases, we assume that the OFC tends to bias the LC in favor of phasic firing to target stimuli. We have aimed to fix model parameters based on previous work using simpler networks. We use parameters derived primarily from a previous model of the LC by Gilzenrat and colleagues [7]. Integration of response conflict by the ACC and its influence on LC firing were borrowed from unpublished work by Gilzenrat and colleagues in which they fit human behavioral data in a diminishing utilities task.
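The hypothesized control signals can be sketched in a few lines. The energy-style conflict measure follows Botvinick-style conflict monitoring; the combination rule and weights below are illustrative assumptions, not the authors' fitted mechanism:

```python
def conflict(y1, y2):
    """Energy-style response conflict: large only when both competing
    response units are simultaneously active."""
    return y1 * y2

def lc_mode(long_term_conflict, reward_rate, w_conflict=1.0, w_reward=1.0):
    """Toy decision rule: sustained conflict (ACC) pushes the LC toward
    tonic firing; a high reward rate (OFC) pushes it toward phasic firing."""
    drive = w_conflict * long_term_conflict - w_reward * reward_rate
    return "tonic" if drive > 0 else "phasic"
```

After a contingency switch, responses to the old and new targets compete (high conflict) while reward drops, so the rule lands in the tonic, exploratory mode.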
Given this approach, we interpret our observed improvement in model performance with combined NE and DA function as validation of a mechanism for automatically switching between exploitative and exploratory action selection. 2 Go/No-Go Task and Core Model We have modeled an experiment in which monkeys performed a target detection task [10]. In the task, monkeys were shown either a vertical bar or a horizontal bar and were required to make or omit a motor response appropriately. Initially, the vertical bar was the target stimulus and correctly responding was rewarded with a squirt of fruit juice (r = 1 in the model). Responding to the non-target horizontal stimulus resulted in time-out punishment (r = -0.1; Figure 1A). No responses to either the target or non-target gave zero reward. After the monkeys had fully acquired the task, the experimenters periodically switched the reward contingency such that the previously rewarded stimulus (target) became the distractor, and vice versa. Following such reversals, LC neurons were observed to change from emitting phasic bursts of firing to the target, to tonic firing following the switch, and slowly back to phasic firing for the new target as the new response criterion was obtained [10].
Figure 1: Task and model design. (A) Responses were required for targets in order to obtain reward. Responses to distractors resulted in a minor punishment. No responses gave zero reward. (B) In the model, vertical and horizontal bar inputs (I1 and I2) fed to integrator neurons (X1 and X2), which then drove response units (Y1 and Y2). Responses were made if Y1 or Y2 crossed a threshold while input units were active.
We have previously modeled this task [7,12] with a three-layer connectionist network in which two input units, I1 and I2, corresponding to the vertical and horizontal bars, drive two mutually inhibitory integrator units, X1 and X2. The integrator units subsequently feed two response units, Y1 and Y2 (Figure 1B).
Responses are made whenever output from Y1 or Y2 crosses a threshold level of activity, θ. Relatively weak cross connections from each input unit to the opposite integrator unit (I1 to X2 and I2 to X1) are intended to model stimulus similarity. Both the integrator and response units were modeled as noisy, leaky accumulators: Ẋi = −Xi + w Ii + w′ Ij − u f(Xj) + ξi, with a leak term −Xi, input weight w, weak cross-connection weight w′, mutual inhibition u f(Xj) from the competing unit j ≠ i, and a noise term ξi.
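A minimal Euler-integrated sketch of this scheme, collapsing the integrator and response stages into one competitive layer; the reward values follow the task description above, but the weights, threshold, and noise level are illustrative stand-ins rather than the paper's fitted parameters:

```python
import random

def reward(stimulus, responded, target="vertical"):
    """Payoff structure of the task: +1 for responding to the target,
    -0.1 for responding to the distractor, 0 for withholding."""
    if not responded:
        return 0.0
    return 1.0 if stimulus == target else -0.1

def simulate_trial(I, w_self=1.0, w_cross=0.3, inhib=1.0, leak=1.0,
                   theta=0.5, dt=0.02, steps=500, noise=0.05, rng=None):
    """Integrate two mutually inhibitory noisy leaky accumulators driven by
    inputs I = (I1, I2); return the index of the unit that first crosses
    the response threshold theta, or None if neither does."""
    rng = rng or random.Random(0)
    x = [0.0, 0.0]
    for _ in range(steps):
        for i in (0, 1):
            j = 1 - i
            drive = w_self * I[i] + w_cross * I[j] - inhib * max(x[j], 0.0)
            x[i] += (-leak * x[i] + drive) * dt + noise * rng.gauss(0.0, 1.0) * dt ** 0.5
        for i in (0, 1):
            if x[i] > theta:
                return i
    return None
```

Strong input to one channel wins the competition and triggers a response; with no input, the accumulators hover near zero and the trial ends without a response.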
3 0.67574209 149 nips-2005-Optimal cue selection strategy
Author: Vidhya Navalpakkam, Laurent Itti
Abstract: Survival in the natural world demands the selection of relevant visual cues to rapidly and reliably guide attention towards prey and predators in cluttered environments. We investigate whether our visual system selects cues that guide search in an optimal manner. We formally obtain the optimal cue selection strategy by maximizing the signal-to-noise ratio (SNR) between a search target and surrounding distractors. This optimal strategy successfully accounts for several phenomena in visual search behavior, including the effect of target-distractor discriminability, uncertainty in the target’s features, distractor heterogeneity, and linear separability. Furthermore, the theory generates a new prediction, which we verify through psychophysical experiments with human subjects. Our results provide direct experimental evidence that humans select visual cues so as to maximize SNR between the targets and surrounding clutter.
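The abstract does not spell out its SNR definition, so the sketch below uses a common discriminability-style measure (squared mean separation over summed variance), purely as an illustration of cue selection by SNR maximization:

```python
def snr(t_mean, t_var, d_mean, d_var):
    """Discriminability-style signal-to-noise ratio between target and
    distractor responses along one feature dimension (assumed form)."""
    return (t_mean - d_mean) ** 2 / (t_var + d_var)

def best_cue(features):
    """Pick the feature dimension with the highest target/distractor SNR.
    features: dict name -> (t_mean, t_var, d_mean, d_var)."""
    return max(features, key=lambda f: snr(*features[f]))
```

Under this measure, a highly discriminable color cue beats a weakly discriminable orientation cue, and heterogeneous distractors (larger distractor variance) lower a cue's SNR, in line with the distractor-heterogeneity effect the abstract mentions.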
4 0.67391586 194 nips-2005-Top-Down Control of Visual Attention: A Rational Account
Author: Michael Shettel, Shaun Vecera, Michael C. Mozer
Abstract: Theories of visual attention commonly posit that early parallel processes extract conspicuous features such as color contrast and motion from the visual field. These features are then combined into a saliency map, and attention is directed to the most salient regions first. Top-down attentional control is achieved by modulating the contribution of different feature types to the saliency map. A key source of data concerning attentional control comes from behavioral studies in which the effect of recent experience is examined as individuals repeatedly perform a perceptual discrimination task (e.g., “what shape is the odd-colored object?”). The robust finding is that repetition of features of recent trials (e.g., target color) facilitates performance. We view this facilitation as an adaptation to the statistical structure of the environment. We propose a probabilistic model of the environment that is updated after each trial. Under the assumption that attentional control operates so as to make performance more efficient for more likely environmental states, we obtain parsimonious explanations for data from four different experiments. Further, our model provides a rational explanation for why the influence of past experience on attentional control is short lived. 1 INTRODUCTION The brain does not have the computational capacity to fully process the massive quantity of information provided by the eyes. Selective attention operates to filter the spatiotemporal stream to a manageable quantity. Key to understanding the nature of attention is discovering the algorithm governing selection, i.e., understanding what information will be selected and what will be suppressed. Selection is influenced by attributes of the spatiotemporal stream, often referred to as bottom-up contributions to attention. For example, attention is drawn to abrupt onsets, motion, and regions of high contrast in brightness and color. 
Most theories of attention posit that some visual information processing is performed preattentively and in parallel across the visual field. This processing extracts primitive visual features such as color and motion, which provide the bottom-up cues for attentional guidance. However, attention is not driven willy-nilly by these cues. The deployment of attention can be modulated by task instructions, current goals, and domain knowledge, collectively referred to as top-down contributions to attention. How do bottom-up and top-down contributions to attention interact? Most psychologically and neurobiologically motivated models propose a very similar architecture in which information from bottom-up and top-down sources combines in a saliency (or activation) map (e.g., Itti et al., 1998; Koch & Ullman, 1985; Mozer, 1991; Wolfe, 1994). The saliency map indicates, for each location in the visual field, the relative importance of that location. Attention is drawn to the most salient locations first. Figure 1 sketches the basic architecture that incorporates bottom-up and top-down contributions to the saliency map. The visual image is analyzed to extract maps of primitive features such as color and orientation. Associated with each location in a map is a scalar response or activation indicating the presence of a particular feature.
FIGURE 1. An attentional saliency map constructed from bottom-up and top-down information.
FIGURE 2. Sample display from Experiment 1 of Maljkovic and Nakayama (1994).
Most models assume that responses are stronger at locations with high local feature contrast, consistent with neurophysiological data, e.g., the response of a red feature detector to a red object is stronger if the object is surrounded by green objects. The saliency map is obtained by taking a sum of bottom-up activations from the feature maps.
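The architecture just described reduces to a gain-weighted sum over feature maps; a minimal sketch with invented activations:

```python
def saliency_map(feature_maps, gains):
    """Sum bottom-up feature-map activations, each scaled by its
    top-down gain, to obtain a saliency value per location."""
    n_locations = len(next(iter(feature_maps.values())))
    return [sum(gains[f] * feature_maps[f][loc] for f in feature_maps)
            for loc in range(n_locations)]

def first_attended(feature_maps, gains):
    # Attention is directed to the most salient location first.
    s = saliency_map(feature_maps, gains)
    return max(range(len(s)), key=s.__getitem__)
```

With the red gain turned up, the red item wins; shifting gain to the vertical map moves attention to the vertical item, which is the sense in which top-down control modulates the contribution of different feature types.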
The bottom-up activations are modulated by a top-down gain that specifies the contribution of a particular map to saliency in the current task and environment. Wolfe (1994) describes a heuristic algorithm for determining appropriate gains in a visual search task, where the goal is to detect a target object among distractor objects. Wolfe proposes that maps encoding features that discriminate between target and distractors have higher gains, and to be consistent with the data, he proposes limits on the magnitude of gain modulation and the number of gains that can be modulated. More recently, Wolfe et al. (2003) have been explicit in proposing optimization as a principle for setting gains given the task definition and stimulus environment. One aspect of optimizing attentional control involves configuring the attentional system to perform a given task; for example, in a visual search task for a red vertical target among green vertical and red horizontal distractors, the task definition should result in a higher gain for red and vertical feature maps than for other feature maps. However, there is a more subtle form of gain modulation, which depends on the statistics of display environments. For example, if green vertical distractors predominate, then red is a better discriminative cue than vertical; and if red horizontal distractors predominate, then vertical is a better discriminative cue than red. In this paper, we propose a model that encodes statistics of the environment in order to allow for optimization of attentional control to the structure of the environment. Our model is designed to address a key set of behavioral data, which we describe next. 1.1 Attentional priming phenomena Psychological studies involve a sequence of experimental trials that begin with a stimulus presentation and end with a response from the human participant. Typically, trial order is randomized, and the context preceding a trial is ignored. 
However, in sequential studies, performance is examined on one trial contingent on the past history of trials. These sequential studies explore how experience influences future performance. Consider the sequential attentional task of Maljkovic and Nakayama (1994). On each trial, the stimulus display (Figure 2) consists of three notched diamonds, one a singleton in color—either green among red or red among green. The task is to report whether the singleton diamond, referred to as the target, is notched on the left or the right. The task is easy because the singleton pops out, i.e., the time to locate the singleton does not depend on the number of diamonds in the display. Nonetheless, the response time significantly depends on the sequence of trials leading up to the current trial: If the target is the same color on the current trial as on the previous trial, response time is roughly 100 ms faster than if the target is a different color on the current trial. Considering that response times are on the order of 700 ms, this effect, which we term attentional priming, is gigantic in the scheme of psychological phenomena. 2 ATTENTIONAL CONTROL AS ADAPTATION TO THE STATISTICS OF THE ENVIRONMENT We interpret the phenomenon of attentional priming via a particular perspective on attentional control, which can be summarized in two bullets.
• The perceptual system dynamically constructs a probabilistic model of the environment based on its past experience.
• Control parameters of the attentional system are tuned so as to optimize performance under the current environmental model.
The primary focus of this paper is the environmental model, but we first discuss the nature of performance optimization. The role of attention is to make processing of some stimuli more efficient, and consequently, the processing of other stimuli less efficient.
For example, if the gain on the red feature map is turned up, processing will be efficient for red items, but competition from red items will reduce the efficiency for green items. Thus, optimal control should tune the system for the most likely states of the world by minimizing an objective function such as: J(g) = Σ_e P(e) RT_g(e) (1), where g is a vector of top-down gains, e is an index over environmental states, P(·) is the probability of an environmental state, and RT_g(·) is the expected response time—assuming a constant error rate—to the environmental state under gains g. Determining the optimal gains is a challenge because every gain setting will result in facilitation of responses to some environmental states but hindrance of responses to other states. The optimal control problem could be solved via direct reinforcement learning, but the rapidity of human learning makes this possibility unlikely: In a variety of experimental tasks, evidence suggests that adaptation to a new task or environment can occur in just one or two trials (e.g., Rogers & Monsell, 1995). Model-based reinforcement learning is an attractive alternative, because given a model, optimization can occur without further experience in the real world. Although the number of real-world trials necessary to achieve a given level of performance is comparable for direct and model-based reinforcement learning in stationary environments (Kearns & Singh, 1999), naturalistic environments can be viewed as highly nonstationary. In such a situation, the framework we suggest is well motivated: After each experience, the environment model is updated. The updated environmental model is then used to retune the attentional system. In this paper, we propose a particular model of the environment suitable for visual search tasks. Rather than explicitly modeling the optimization of attentional control by setting gains, we assume that the optimization process will serve to minimize Equation 1.
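Equation 1 is straightforward to evaluate once per-state response times under a gain setting are known; a toy numeric sketch (the probabilities and response times are invented for illustration):

```python
def expected_rt(P, RT):
    """J(g) = sum_e P(e) * RT_g(e): expected response time over
    environmental states e, for one gain setting g."""
    assert abs(sum(P.values()) - 1.0) < 1e-9, "P must be a distribution"
    return sum(P[e] * RT[e] for e in P)

# A red target occurs on 80% of trials. Gains tuned for red trade a small
# cost on green trials for a large saving on the common red trials.
P = {"red_target": 0.8, "green_target": 0.2}
rt_red_tuned = {"red_target": 500.0, "green_target": 700.0}
rt_neutral = {"red_target": 600.0, "green_target": 600.0}
```

Here the red-tuned setting achieves a lower J than the neutral one, illustrating why optimized control yields faster reaction times for more probable environmental states.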
Because any gain adjustment will facilitate performance in some environmental states and hinder performance in others, an optimized control system should obtain faster reaction times for more probable environmental states. This assumption allows us to explain experimental results in a minimal, parsimonious framework. 3 MODELING THE ENVIRONMENT Focusing on the domain of visual search, we characterize the environment in terms of a probability distribution over configurations of target and distractor features. We distinguish three classes of features: defining, reported, and irrelevant. To explain these terms, consider the task of searching a display of size-varying, colored, notched diamonds (Figure 2), detecting the singleton in color and judging the notch location. Color is the defining feature, notch location is the reported feature, and size is an irrelevant feature. To simplify the exposition, we treat all features as having discrete values, an assumption which is true of the experimental tasks we model. We begin by considering displays containing a single target and a single distractor, and shortly generalize to multidistractor displays. We use the framework of Bayesian networks to characterize the environment. Each feature of the target and distractor is a discrete random variable, e.g., Tcolor for target color and Dnotch for the location of the notch on the distractor. The Bayes net encodes the probability distribution over environmental states; in our working example, this distribution is P(Tcolor, Tsize, Tnotch, Dcolor, Dsize, Dnotch). The structure of the Bayes net specifies the relationships among the features. The simplest model one could consider would be to treat the features as independent, illustrated in Figure 3a for the singleton-color search task. The opposite extreme would be the full joint distribution, which could be represented by a lookup table indexed by the six features, or by the cascading Bayes net architecture in Figure 3b.
The architecture we propose, which we’ll refer to as the dominance model (Figure 3c), has an intermediate dependency structure, and expresses the joint distribution as: P(Tcolor) P(Dcolor | Tcolor) P(Tsize | Tcolor) P(Tnotch | Tcolor) P(Dsize | Dcolor) P(Dnotch | Dcolor). The structured model is constructed based on three rules.
1. The defining feature of the target is at the root of the tree.
2. The defining feature of the distractor is conditionally dependent on the defining feature of the target. We refer to this rule as dominance of the target over the distractor.
3. The reported and irrelevant features of target (distractor) are conditionally dependent on the defining feature of the target (distractor). We refer to this rule as dominance of the defining feature over nondefining features.
As we will demonstrate, the dominance model produces a parsimonious account of a wide range of experimental data. 3.1 Updating the environment model The model’s parameters are the conditional distributions embodied in the links. In the example of Figure 3c with binary random variables, the model has 11 parameters. However, these parameters are determined by the environment: To be adaptive in nonstationary environments, the model must be updated following each experienced state. We propose a simple exponentially weighted averaging approach. For two variables V and W with observed values v and w on trial t, a conditional distribution, P_t(V=u | W=w) = δ_uv, is defined, where δ is the Kronecker delta.
FIGURE 3. Three models of a visual-search environment with colored, notched, size-varying diamonds: (a) feature-independence model; (b) full-joint model; (c) dominance model.
The distribution representing the environment E following trial t, denoted P_t^E, is then updated as follows: P_t^E(V=u | W=w) = α P_{t−1}^E(V=u | W=w) + (1 − α) P_t(V=u | W=w) (2) for all u, where α is a memory constant. Note that no update is performed for values of W other than w. An analogous update is performed for unconditional distributions. How the model is initialized—i.e., specifying P_0^E—is irrelevant, because in all experimental tasks that we model, participants begin the experiment with many dozens of practice trials. Data are not collected during practice trials. Consequently, any transient effects of P_0^E do not impact the results. In our simulations, we begin with a uniform distribution for P_0^E, and include practice trials as in the human studies. Thus far, we’ve assumed a single target and a single distractor. The experiments that we model involve multiple distractors. The simple extension we require to handle multiple distractors is to define a frequentist probability for each distractor feature V, P_t(V=v | W=w) = C_vw / C_w, where C_vw is the count of co-occurrences of feature values v and w among the distractors, and C_w is the count of w. Our model is extremely simple. Given a description of the visual search task and environment, the model has only a single degree of freedom, α. In all simulations, we fix α = 0.75; however, the choice of α does not qualitatively impact any result. 4 SIMULATIONS In this section, we show that the model can explain a range of data from four different experiments examining attentional priming. All experiments measure response times of participants. On each trial, the model can be used to obtain a probability of the display configuration (the environmental state) on that trial, given the history of trials to that point.
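The dominance factorization and the Equation 2 update can be sketched directly; the nested-dict table layout below is an implementation choice of this sketch, not the authors' code:

```python
def update_conditional(table, v, w, values, alpha=0.75):
    """Equation 2: exponentially weighted update of P(V | W = w) after
    observing V = v. Rows for other values of W are left untouched."""
    row = table.setdefault(w, {u: 1.0 / len(values) for u in values})
    for u in values:
        row[u] = alpha * row[u] + (1.0 - alpha) * (1.0 if u == v else 0.0)

def dominance_joint(p, tc, dc, ts, tn, ds, dn):
    """Joint probability under the dominance model:
    P(Tc) P(Dc|Tc) P(Ts|Tc) P(Tn|Tc) P(Ds|Dc) P(Dn|Dc)."""
    return (p["Tcolor"][tc] * p["Dcolor"][tc][dc] * p["Tsize"][tc][ts]
            * p["Tnotch"][tc][tn] * p["Dsize"][dc][ds] * p["Dnotch"][dc][dn])
```

Repeated presentation of the same target color drives the corresponding conditional probabilities toward 1 at rate governed by α, which is the mechanism behind the priming curves in the simulations that follow.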
Our critical assumption—as motivated earlier—is that response times monotonically decrease with increasing probability, indicating that visual information processing is better configured for more likely environmental states. The particular relationship we assume is that response times are linear in log probability. This assumption yields long response time tails, as are observed in all human studies. 4.1 Maljkovic and Nakayama (1994, Experiment 5) In this experiment, participants were asked to search for a singleton in color in a display of three red or green diamonds. Each diamond was notched on either the left or right side, and the task was to report the side of the notch on the color singleton. The well-practiced participants made very few errors. Reaction time (RT) was examined as a function of whether the target on a given trial is the same or different color as the target on trial n steps back or ahead. Figure 4 shows the results, with the human RTs in the left panel and the simulation log probabilities in the right panel. The horizontal axis represents n. Both graphs show the same outcome: repetition of target color facilitates performance. This influence lasts only for a half dozen trials, with an exponentially decreasing influence further into the past. In the model, this decreasing influence is due to the exponential decay of recent history (Equation 2). Figure 4 also shows that—as expected—the future has no influence on the current trial. 4.2 Maljkovic and Nakayama (1994, Experiment 8) In the previous experiment, it is impossible to determine whether facilitation is due to repetition of the target’s color or the distractor’s color, because the display contains only two colors, and therefore repetition of target color implies repetition of distractor color. 
To unconfound these two potential factors, an experiment like the previous one was conducted using four distinct colors, allowing one to examine the effect of repeating the target color while varying the distractor color, and vice versa. The sequence of trials was composed of subsequences of up to six consecutive trials with either the target or distractor color held constant while the other color was varied trial to trial. Following each subsequence, both target and distractors were changed. Figure 5 shows that for both humans and the simulation, performance improves toward an asymptote as the number of target and distractor repetitions increases; in the model, the asymptote is due to the probability of the repeated color in the environment model approaching 1.0. The performance improvement is greater for target than distractor repetition; in the model, this difference is due to the dominance of the defining feature of the target over the defining feature of the distractor. 4.3 Huang, Holcombe, and Pashler (2004, Experiment 1) Huang et al. (2004) and Hillstrom (2000) conducted studies to determine whether repetitions of one feature facilitate performance independently of repetitions of another feature. In the Huang et al. study, participants searched for a singleton in size in a display consisting of lines that were short and long, slanted left or right, and colored white or black. The reported feature was target slant. Slant, size, and color were uncorrelated. Huang et al. discovered that repeating an irrelevant feature (color or orientation) facilitated performance, but only when the defining feature (size) was repeated. As shown in Figure 6, the model replicates human performance, due to the dominance of the defining feature over the reported and irrelevant features. 4.4 Wolfe, Butcher, Lee, and Hyde (2003, Experiment 1) In an empirical tour-de-force, Wolfe et al. (2003) explored singleton search over a range of environments.
FIGURE 4. Experiment 5 of Maljkovic and Nakayama (1994): performance on a given trial conditional on the color of the target on a previous or subsequent trial. Human data are from subject KN.
FIGURE 5. Experiment 8 of Maljkovic and Nakayama (1994): (left panel) human data, average of subjects KN and SS; (right panel) simulation.
FIGURE 6. Experiment 1 of Huang, Holcombe, & Pashler (2004): (left panel) human data; (right panel) simulation.
The task is to detect the presence or absence of a singleton in displays consisting of colored (red or green), oriented (horizontal or vertical) lines. Target-absent trials were used primarily to ensure participants were searching the display. The experiment examined seven experimental conditions, which varied in the amount of uncertainty as to the target identity. The essential conditions, from least to most uncertainty, are: blocked (e.g., target always red vertical among green horizontals), mixed feature (e.g., target always a color singleton), mixed dimension (e.g., target either red or vertical), and fully mixed (target could be red, green, vertical, or horizontal). With this design, one can ascertain how uncertainty in the environment and in the target definition influence task difficulty.
Because the defining feature in this experiment could be either color or orientation, we modeled the environment with two Bayes nets—one color dominant and one orientation dominant—and performed model averaging. A comparison of Figures 7a and 7b shows a correspondence between human RTs and model predictions. Less uncertainty in the environment leads to more efficient performance. One interesting result from the model is its prediction that the mixed-feature condition is easier than the fully-mixed condition; that is, search is more efficient when the dimension (i.e., color vs. orientation) of the singleton is known, even though the model has no abstract representation of feature dimensions, only feature values. 4.5 Optimal adaptation constant In all simulations so far, we fixed the memory constant. From the human data, it is clear that memory for recent experience is relatively short-lived, on the order of a half dozen trials (e.g., left panel of Figure 4). In this section we provide a rational argument for the short duration of memory in attentional control. Figure 7c shows mean negative log probability in each condition of the Wolfe et al. (2003) experiment, as a function of α. To assess these probabilities, for each experimental condition, the model was initialized so that all of the conditional distributions were uniform, and then a block of trials was run. Log probability for all trials in the block was averaged. The negative log probability (y-axis of the figure) is a measure of the model’s misprediction of the next trial in the sequence. For complex environments, such as the fully-mixed condition, a small memory constant is detrimental: With rapid memory decay, the effective history of trials is a high-variance sample of the distribution of environmental states.
For simple environments, a large memory constant is detrimental: With slow memory decay, the model does not transition quickly from the initial environmental model to one that reflects the statistics of a new environment. Thus, the memory constant is constrained by being large enough that the environment model can hold on to sufficient history to represent complex environments, and by being small enough that the model adapts quickly to novel environments. If the conditions in Wolfe et al. give some indication of the range of naturalistic environments an agent encounters, we have a rational account of why attentional priming is so short-lived. Whether priming lasts 2 trials or 20, the surprising empirical result is that it does not last 200 or 2000 trials. Our rational argument provides a rough insight into this finding.
FIGURE 7. (a) Human data for Wolfe et al. (2003), Experiment 1; (b) simulation; (c) misprediction of the model (i.e., lower y value = better) as a function of α for five experimental conditions.
5 DISCUSSION The psychological literature contains two opposing accounts of attentional priming and its relation to attentional control. Huang et al. (2004) and Hillstrom (2000) propose an episodic account in which a distinct memory trace—representing the complete configuration of features in the display—is laid down for each trial, and priming depends on configural similarity of the current trial to previous trials. Alternatively, Maljkovic and Nakayama (1994) and Wolfe et al.
(2003) propose a feature-strengthening account in which detection of a feature on one trial increases its ability to attract attention on subsequent trials, and priming is proportional to the number of overlapping features from one trial to the next. The episodic account corresponds roughly to the full joint model (Figure 3b), and the feature-strengthening account corresponds roughly to the independence model (Figure 3a). Neither account is adequate to explain the range of data we presented. However, an intermediate account, the dominance model (Figure 3c), is not only sufficient, but it offers a parsimonious, rational explanation. Beyond the model’s basic assumptions, it has only one free parameter, and can explain results from diverse experimental paradigms. The model makes a further theoretical contribution. Wolfe et al. distinguish the environments in their experiment in terms of the amount of top-down control available, implying that different mechanisms might be operating in different environments. However, in our account, top-down control is not some substance distributed in different amounts depending on the nature of the environment. Our account treats all environments uniformly, relying on attentional control to adapt to the environment at hand. We conclude with two limitations of the present work. First, our account presumes a particular network architecture, instead of a more elegant Bayesian approach that specifies priors over architectures, and performs automatic model selection via the sequence of trials. We did explore such a Bayesian approach, but it was unable to explain the data. Second, at least one finding in the literature is problematic for the model. Hillstrom (2000) occasionally finds that RTs slow when an irrelevant target feature is repeated but the defining target feature is not. However, because this effect is observed only in some experiments, it is likely that any model would require elaboration to explain the variability. 
ACKNOWLEDGEMENTS
We thank Jeremy Wolfe for providing the raw data from his experiment for reanalysis. This research was funded by NSF BCS Award 0339103.
REFERENCES
Huang, L., Holcombe, A. O., & Pashler, H. (2004). Repetition priming in visual search: Episodic retrieval, not feature priming. Memory & Cognition, 32, 12–20.
Hillstrom, A. P. (2000). Repetition effects in visual search. Perception & Psychophysics, 62, 800–817.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis & Machine Intelligence, 20, 1254–1259.
Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11 (pp. 996–1002). Cambridge, MA: MIT Press.
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.
Maljkovic, V., & Nakayama, K. (1994). Priming of pop-out: I. Role of features. Memory & Cognition, 22, 657–672.
Mozer, M. C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT Press.
Rogers, R. D., & Monsell, S. (1995). The cost of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Wolfe, J. M., Butcher, S. J., Lee, C., & Hyde, M. (2003). Changing your mind: On the contributions of top-down and bottom-up guidance in visual search for feature singletons. Journal of Experimental Psychology: Human Perception & Performance, 29, 483–502.
5 0.63928556 91 nips-2005-How fast to work: Response vigor, motivation and tonic dopamine
Author: Yael Niv, Nathaniel D. Daw, Peter Dayan
Abstract: Reinforcement learning models have long promised to unify computational, psychological and neural accounts of appetitively conditioned behavior. However, the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existing reinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address the simple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity even when working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding. Motivational states such as hunger shift these factors, skewing the tradeoff. This accounts normatively for the effects of motivation on response rates, as well as many other classic findings. Finally, we suggest that tonic levels of dopamine may be involved in the computation linking motivational state to optimal responding, thereby explaining the complex vigor-related effects of pharmacological manipulation of dopamine. 1
6 0.41917691 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
7 0.39048332 3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence
8 0.36899105 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
9 0.34757051 176 nips-2005-Silicon growth cones map silicon retina
10 0.32978711 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception
11 0.29624957 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions
12 0.29440755 109 nips-2005-Learning Cue-Invariant Visual Responses
13 0.29304364 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models
14 0.29284176 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods
15 0.2901845 169 nips-2005-Saliency Based on Information Maximization
16 0.28956747 203 nips-2005-Visual Encoding with Jittering Eyes
17 0.28122994 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity
18 0.28086841 61 nips-2005-Dynamical Synapses Give Rise to a Power-Law Distribution of Neuronal Avalanches
19 0.27718791 193 nips-2005-The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search
20 0.27570036 64 nips-2005-Efficient estimation of hidden state dynamics from spike trains
simIndex simValue paperId paperTitle
same-paper 1 0.86145729 141 nips-2005-Norepinephrine and Neural Interrupts
Author: Peter Dayan, Angela J. Yu
Abstract: Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in well-learned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection of and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts. 1
2 0.72331548 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
Author: Sridhar Mahadevan, Mauro Maggioni
Abstract: We investigate the problem of automatically constructing efficient representations or basis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, two novel approaches to value function approximation are explored based on automatically constructing basis functions on state spaces that can be represented as graphs or manifolds: one approach uses the eigenfunctions of the Laplacian, in effect performing a global Fourier analysis on the graph; the second approach is based on diffusion wavelets, which generalize classical wavelets to graphs using multiscale dilations induced by powers of a diffusion operator or random walk on the graph. Together, these approaches form the foundation of a new generation of methods for solving large Markov decision processes, in which the underlying representation and policies are simultaneously learned.
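As a rough illustration of the Laplacian-eigenfunction idea in this abstract (the chain graph, basis size, and target function below are our own toy choices, not the paper's experiments):

```python
import numpy as np

# Combinatorial Laplacian L = D - A of a 5-state chain graph.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors of L, in ascending eigenvalue order, form a Fourier-like
# basis of increasingly oscillatory functions on the graph.
eigvals, eigvecs = np.linalg.eigh(L)

# Approximate a value function with the k smoothest eigenfunctions by
# least-squares projection onto that basis.
V = np.arange(n, dtype=float)
k = 3
Phi = eigvecs[:, :k]
V_hat = Phi @ (Phi.T @ V)
```

The smallest eigenvalue is zero with a constant eigenvector, so even k = 1 captures the mean of V; adding smoother harmonics reduces the residual further.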
3 0.67580795 152 nips-2005-Phase Synchrony Rate for the Recognition of Motor Imagery in Brain-Computer Interface
Author: Le Song, Evian Gordon, Elly Gysels
Abstract: Motor imagery attenuates EEG µ and β rhythms over sensorimotor cortices. These amplitude changes are most successfully captured by the method of Common Spatial Patterns (CSP) and widely used in brain-computer interfaces (BCI). BCI methods based on amplitude information, however, have not incorporated the rich phase dynamics in the EEG rhythm. This study reports on a BCI method based on phase synchrony rate (SR). SR, computed from binarized phase locking value, describes the number of discrete synchronization events within a window. Statistical nonparametric tests show that SRs contain significant differences between two types of motor imagery. Classifiers trained on SRs consistently demonstrate satisfactory results for all 5 subjects. It is further observed that, for 3 subjects, phase is more discriminative than amplitude in the first 1.5-2.0 s, which suggests that phase has the potential to boost the information transfer rate in BCIs. 1
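A minimal sketch of the phase-locking computation this abstract describes (the window length, threshold, and test signals are our own illustrative assumptions, not the paper's EEG pipeline):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV: magnitude of the mean phase-difference phasor, with instantaneous
    phase taken from the analytic (Hilbert) signal of each input."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

def synchrony_rate(x, y, win, threshold=0.8):
    """Binarize PLV in non-overlapping windows and report the fraction of
    windows counted as discrete synchronization events."""
    events = [phase_locking_value(x[i:i + win], y[i:i + win]) > threshold
              for i in range(0, len(x) - win + 1, win)]
    return float(np.mean(events))

t = np.linspace(0.0, 1.0, 1000)
a = np.sin(2 * np.pi * 10 * t)              # 10 Hz oscillation
b = np.sin(2 * np.pi * 10 * t + 0.5)        # same rhythm, fixed phase lag
c = np.random.default_rng(0).standard_normal(t.size)  # unrelated noise
```

Two signals with a constant phase offset give a PLV near 1 even though their amplitudes never coincide, which is what makes phase a complementary feature to amplitude-based CSP.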
4 0.66132331 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning
Author: Jorge Silva, Jorge Marques, João Lemos
Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm. 1
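One simple way to realize the l1-penalized landmark idea sketched in this abstract (the dimensions, regularization strength, and synthetic data are our own illustrative choices, not the paper's published algorithm):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# 30 candidate points lying near a 2-D subspace of R^10.
basis = rng.standard_normal((2, 10))
D = rng.standard_normal((30, 2)) @ basis + 0.01 * rng.standard_normal((30, 10))

# A new point on the same patch of the manifold.
x = np.array([0.5, -0.3]) @ basis

# l1-penalized regression of x onto the candidate points: the penalty drives
# most coefficients to exactly zero, and the survivors act as landmarks.
lasso = Lasso(alpha=0.05, fit_intercept=False, max_iter=50000)
lasso.fit(D.T, x)  # columns of D.T are the candidate points
landmarks = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
```

The l1 norm is what buys sparsity here: ridge (l2) regression on the same design would spread small weights over all 30 points instead of selecting a few.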
5 0.42516604 26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity
Author: Samuel M. McClure, Mark S. Gilzenrat, Jonathan D. Cohen
Abstract: We propose a model by which dopamine (DA) and norepinephrine (NE) combine to alternate behavior between relatively exploratory and exploitative modes. The model is developed for a target detection task for which there are extant single-neuron recording data available from locus coeruleus (LC) NE neurons. An exploration-exploitation trade-off is elicited by regularly switching which of the two stimuli is rewarded. DA functions within the model to change synaptic weights according to a reinforcement learning algorithm. Exploration is mediated by the state of LC firing, with higher tonic and lower phasic activity producing greater response variability. The opposite state of LC function, with lower baseline firing rate and greater phasic responses, favors exploitative behavior. Changes in LC firing mode result from combined measures of response conflict and reward rate, where response conflict is monitored using models of anterior cingulate cortex (ACC). Increased long-term response conflict and decreased reward rate, which occur following a reward contingency switch, favor the higher tonic state of LC function and NE release. This increases exploration and facilitates discovery of the new target. 1 Introduction A central problem in reinforcement learning is determining how to adaptively move between exploitative and exploratory behaviors in changing environments. We propose a set of neurophysiologic mechanisms whose interaction may mediate this behavioral shift. Empirical work on the midbrain dopamine (DA) system has suggested that this system is particularly well suited for guiding exploitative behaviors. This hypothesis has been reified by a number of studies showing that a temporal difference (TD) learning algorithm accounts for activity in these neurons in a wide variety of behavioral tasks [1,2]. DA release is believed to encode a reward prediction error signal that acts to change synaptic weights relevant for producing behaviors [3].
Through learning, this allows neural pathways to predict future expected reward through the relative strength of their synaptic connections [1]. Decision-making procedures based on these value estimates are necessarily greedy. Including reward bonuses for exploratory choices supports non-greedy actions [4] and accounts for additional data derived from DA neurons [5]. We show that combining a DA learning algorithm with models of response conflict detection [6] and NE function [7] produces an effective annealing procedure for alternating between exploration and exploitation. NE neurons within the LC alternate between two firing modes [8]. In the first mode, known as the phasic mode, NE neurons fire at a low baseline rate but have relatively robust phasic responses to behaviorally salient stimuli. The second mode, called the tonic mode, is associated with a higher baseline firing and absent or attenuated phasic responses. The effects of NE on efferent areas are modulatory in nature, and are well captured as a change in the gain of efferent inputs so that neuronal responses are potentiated in the presence of NE [9]. Thus, in phasic mode, the LC provides transient facilitation in processing, time-locked to the presence of behaviorally salient information in motor or decision areas. Conversely, in tonic mode, higher overall LC discharge rate increases gain generally and hence increases the probability of arbitrary responding. Consistent with this account, for periods when NE neurons are in the phasic mode, monkey performance is nearly perfect. However, when NE neurons are in the tonic mode, performance is more erratic, with increased response times and error rate [8]. These findings have led to a recent characterization of the LC as a dynamic temporal filter, adjusting the system's relative responsivity to salient and irrelevant information [8]. In this way, the LC is ideally positioned to mediate the shift between exploitative and exploratory behavior. 
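The reward-prediction-error learning ascribed to DA above follows the standard TD(0) form; the two-state chain and parameter values below are our own minimal example, not the paper's task:

```python
import numpy as np

def td_update(V, s, s_next, r, lr=0.1, gamma=0.95):
    """One TD(0) step: delta is the reward prediction error that phasic
    dopamine is proposed to carry; it nudges the value of state s."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += lr * delta
    return delta

# Chain: state 0 -> state 1 -> absorbing state 2, reward on the final step.
V = np.zeros(3)
errors = []
for _ in range(200):
    td_update(V, 0, 1, r=0.0)
    errors.append(td_update(V, 1, 2, r=1.0))
```

As the reward becomes predicted, the error at the rewarded step shrinks toward zero while value propagates back to the earlier state, mirroring the classic shift of phasic DA responses from rewards to their predictors.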
The parameters that underlie changes in LC firing mode remain largely unexplored. Based on data from a target detection task by Aston-Jones and colleagues [10], we propose that LC firing mode is determined in part by measures of response conflict and reward rate as calculated by the ACC and OFC, respectively [8]. Together, the ACC and OFC are the principal sources of cortical input to the LC [8]. Activity in the ACC is known, largely through human neuroimaging experiments, to change in accord with response conflict [6]. In brief, relatively equal activity in competing behavioral responses (reflecting uncertainty) produces high conflict. Low conflict results when one behavioral response predominates. We propose that increased long-term response conflict biases the LC towards a tonic firing mode. Increased conflict necessarily follows changes in reward contingency. As the previously rewarded target no longer produces reward, there will be a relative increase in response ambiguity and hence conflict. This relationship between conflict and LC firing is analogous to other modeling work [11], which proposes that increased tonic firing reflects increased environmental uncertainty. As a final component of our model, we hypothesize that the OFC maintains an ongoing estimate of reward rate, and that this estimate also influences LC firing mode. As reward rate increases, we assume that the OFC tends to bias the LC in favor of phasic firing to target stimuli. We have aimed to fix model parameters based on previous work using simpler networks. We use parameters derived primarily from a previous model of the LC by Gilzenrat and colleagues [7]. Integration of response conflict by the ACC and its influence on LC firing was borrowed from unpublished work by Gilzenrat and colleagues in which they fit human behavioral data in a diminishing utilities task.
Given this approach, we interpret our observed improvement in model performance with combined NE and DA function as validation of a mechanism for automatically switching between exploitative and exploratory action selection. 2 Go/No-Go Task and Core Model We have modeled an experiment in which monkeys performed a target detection task [10]. In the task, monkeys were shown either a vertical bar or a horizontal bar and were required to make or omit a motor response appropriately. Initially, the vertical bar was the target stimulus, and responding correctly was rewarded with a squirt of fruit juice (r = 1 in the model). Responding to the non-target horizontal stimulus resulted in time-out punishment (r = -0.1; Figure 1A). No response to either the target or non-target gave zero reward. After the monkeys had fully acquired the task, the experimenters periodically switched the reward contingency such that the previously rewarded stimulus (target) became the distractor, and vice versa. Following such reversals, LC neurons were observed to change from emitting phasic bursts of firing to the target, to tonic firing following the switch, and slowly back to phasic firing for the new target as the new response criterion was obtained [10]. Figure 1: Task and model design. (A) Responses were required for targets in order to obtain reward. Responses to distractors resulted in a minor punishment. No responses gave zero reward. (B) In the model, vertical and horizontal bar inputs (I1 and I2) fed to integrator neurons (X1 and X2), which then drove response units (Y1 and Y2). Responses were made if Y1 or Y2 crossed a threshold while input units were active. We have previously modeled this task [7,12] with a three-layer connectionist network in which two input units, I1 and I2, corresponding to the vertical and horizontal bars, drive two mutually inhibitory integrator units, X1 and X2. The integrator units subsequently feed two response units, Y1 and Y2 (Figure 1B).
Responses are made whenever output from Y1 or Y2 crosses a threshold level of activity, θ. Relatively weak cross connections from each input unit to the opposite integrator unit (I1 to X2 and I2 to X1) are intended to model stimulus similarity. Both the integrator and response units were modeled as noisy, leaky accumulators: Ẋi =
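A generic simulation of noisy, mutually inhibitory leaky integrators of the kind described above (all weights, time constants, and the sigmoidal response stage are our own illustrative assumptions, not the fitted model parameters):

```python
import numpy as np

def simulate_trial(I, w_self=1.0, w_cross=0.33, inhib=1.0, tau=0.1,
                   dt=0.01, noise=0.05, theta=0.6, T=1.0, seed=0):
    """Euler-simulate two noisy, mutually inhibitory leaky integrators X
    driving sigmoidal response units Y; report which response unit (if any)
    crosses the threshold theta while the inputs are on."""
    rng = np.random.default_rng(seed)
    X = np.zeros(2)
    for _ in range(int(T / dt)):
        drive = np.array([w_self * I[0] + w_cross * I[1] - inhib * X[1],
                          w_self * I[1] + w_cross * I[0] - inhib * X[0]])
        X += (dt / tau) * (-X + drive)
        X += noise * np.sqrt(dt) * rng.standard_normal(2)
        X = np.clip(X, 0.0, None)                    # rates stay non-negative
        Y = 1.0 / (1.0 + np.exp(-8.0 * (X - 0.5)))   # response-unit activation
        if Y.max() > theta:
            return int(np.argmax(Y))                 # index of emitted response
    return None                                      # trial ends, no response

go = simulate_trial(np.array([1.0, 0.0]))    # target (e.g., vertical bar) shown
omit = simulate_trial(np.array([0.0, 0.0]))  # blank: no unit reaches threshold
```

With input on the first channel, the first integrator wins the mutual-inhibition race and its response unit crosses threshold; with no input, the noise alone stays well below threshold and no response is made.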
6 0.34673968 66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
7 0.34293035 91 nips-2005-How fast to work: Response vigor, motivation and tonic dopamine
8 0.33797085 203 nips-2005-Visual Encoding with Jittering Eyes
9 0.31864601 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
10 0.31604668 3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence
11 0.31603375 35 nips-2005-Bayesian model learning in human visual perception
12 0.30861685 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
13 0.30820578 149 nips-2005-Optimal cue selection strategy
14 0.30678028 99 nips-2005-Integrate-and-Fire models with adaptation are good enough
15 0.30649221 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
16 0.30591175 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
17 0.30249071 181 nips-2005-Spiking Inputs to a Winner-take-all Network
18 0.30152705 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
19 0.29649287 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
20 0.29603735 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators