nips nips2010 nips2010-19 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pradeep Shenoy, Angela J. Yu, Rajesh P. Rao
Abstract: Intelligent agents are often faced with the need to choose actions with uncertain consequences, and to modify those actions according to ongoing sensory processing and changing task demands. The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. Using Bayesian inference and stochastic control tools, we show that the optimal policy systematically depends on various parameters of the problem, such as the relative costs of different action choices, the noise level of sensory inputs, and the dynamics of changing environmental demands. Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. [sent-9, score-0.244]
2 We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. [sent-10, score-0.229]
3 Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems. [sent-12, score-0.351]
4 This ability to dynamically modify or cancel a planned action that is no longer advantageous or appropriate is known as inhibitory control in psychology. [sent-18, score-0.272]
5 In this task, subjects perform a simple two-alternative forced choice (2AFC) discrimination task on a go stimulus, whereby one of two responses is required depending on the stimulus. [sent-20, score-0.456]
6 On a small fraction of trials, an additional stop signal appears after some delay, which instructs the subject to withhold the discrimination or go response. [sent-21, score-1.085]
7 As might be expected, the later the stop signal appears, the harder it is for subjects to stop the response [9] (see Figure 3). [sent-22, score-1.444]
8 The classical model of the stop-signal task is the race model [11], which posits a race to threshold between independent go and stop processes. [sent-23, score-1.238]
9 It also hypothesizes a stopping latency, the stop-signal reaction time (SSRT), which is the delay between stop signal onset and successful withholding of a go response. [sent-24, score-1.314]
10 The (unobservable) SSRT is estimated as shown in Figure 1A, and is thought to be longer in patient populations associated with inhibitory deficit than in healthy controls (attention-deficit hyperactivity disorder [1], obsessive-compulsive disorder [12], and substance dependence [13]). [sent-25, score-0.297]
11 Although the race model is elegant in its simplicity and captures key experimental data, it is descriptive in nature and does not address how the stopping latency and other elements of the model depend on various underlying cognitive factors. [sent-27, score-0.449]
12 Consequently, it cannot explain why behavior and stopping latency vary systematically across different experimental conditions or across different subject populations. [sent-28, score-0.31]
13 We formalize interactions among various cognitive components: the continual monitoring of noisy sensory information, the integration of sensory inputs with top-down expectations, and the assessment of the relative values of potential actions. [sent-30, score-0.233]
14 Within our normative model of inhibitory control, stopping latency is an emergent property, arising from interactions between the monitoring and decision processes. [sent-33, score-0.454]
15 We show that our model captures classical behavioral data in the task, makes quantitative behavioral predictions under different experimental manipulations, and suggests that the brain may be implementing near-optimal decision-making in the stop-signal task. [sent-34, score-0.266]
16 In the generative model (see Figure 1B for graphical model), there are two independent hidden variables, corresponding to the identity of the go stimulus, d ∈ {0, 1}, and whether or not the current trial is a stop trial, s ∈ {0, 1}. [sent-36, score-1.107]
17 The dynamic variable z^t denotes the presence/absence of the stop signal: if the stop signal appears at time θ, then z^t = 0 for t < θ and z^t = 1 for t ≥ θ. [sent-50, score-1.328]
18 On a go trial, s = 0, the stop-signal of course never appears, P (θ = ∞) = 1. [sent-57, score-0.302]
19 On a stop trial, s = 1, we assume for simplicity that the onset of the stop signal follows a constant hazard rate, i.e., p(θ = t | s = 1) = λe^{−λt} for some rate λ. [sent-58, score-1.328]
20 Conditioned on z t , there is a separate iid stream of observations associated with the stop signal: p(y t |z t = 0) = g0 (y t ), and p(y t |z t = 1) = g1 (y t ). [sent-61, score-0.621]
21 In the recognition model, the posterior probability associated with the go stimulus identity is p_d^t = P(d = 1 | x^t), where x^t = {x^1, . . . , x^t}. [sent-63, score-0.384]
22 First, we define p_z^t as the posterior probability that the stop signal has already appeared, p_z^t = P(θ ≤ t | y^t), where y^t = {y^1, . . . , y^t}. [sent-67, score-1.175]
23 p_z^t = g_1(y^t)(p_z^{t−1} + h(t)(1 − p_z^{t−1})) / [g_1(y^t)(p_z^{t−1} + h(t)(1 − p_z^{t−1})) + g_0(y^t)(1 − h(t))(1 − p_z^{t−1})], where h(t) = r · P(θ = t|s = 1) / [r · P(θ > t − 1|s = 1) + (1 − r)] = rλe^{−λt} / [re^{−λ(t−1)} + (1 − r)]. Figure 1: Modeling inhibitory control in the stop-signal task. [sent-72, score-0.36]
24 Go and stop stimuli, separated by a stop signal delay (SSD), initiate two independent processes that race to thresholds and determine trial outcome. [sent-74, score-1.663]
25 On go trials, noise in the go process results in a broad distribution over threshold-crossing times, i.e., the go reaction time (RT) distribution. [sent-75, score-0.604]
26 The stop process is typically modeled as deterministic, with an associated stop signal reaction time or SSRT. [sent-78, score-1.418]
27 The SSRT determines the fraction of go responses successfully stopped: the go RT cumulative distribution function evaluated at SSD+SSRT should give the stopping error rate at that SSD. [sent-79, score-0.892]
28 Based on these assumptions, the SSRT is estimated from data, given the go RT distribution and the error rate as a function of SSD. [sent-80, score-0.302]
29 The two observation streams, x^t and y^t, are associated with the go and stop stimuli, respectively. [sent-94, score-0.923]
30 y t depends on whether the current trial is a stop trial, s ∈ {0, 1}, and whether the stop-signal has already appeared by time t, z t ∈ {0, 1}. [sent-96, score-0.83]
31 where r = P (s = 1) is the prior probability of a stop trial. [sent-97, score-0.621]
32 Note that h(t) does not depend on the observations, since given that the stop signal has not yet appeared, whether it will appear in the next instant does not depend on previous observations. [sent-98, score-0.707]
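To make the recursion above concrete, here is a minimal Python sketch of one step of the stop-signal belief update under the constant-hazard prior. The Bernoulli likelihoods g0 and g1, the rate lam, and the prior r = P(s = 1) used below are illustrative placeholders, not parameter values from the paper.

```python
import numpy as np

def hazard(t, r, lam):
    """Hazard of stop-signal onset at step t, given that it has not yet appeared:
    h(t) = r * P(theta = t | s=1) / (r * P(theta > t-1 | s=1) + (1 - r)),
    with P(theta = t | s=1) = lam * exp(-lam * t)."""
    return (r * lam * np.exp(-lam * t)) / (r * np.exp(-lam * (t - 1)) + (1.0 - r))

def update_pz(pz_prev, y, t, r=0.25, lam=0.1, g0=0.45, g1=0.55):
    """One step of the recursive Bayesian update of p_z^t = P(theta <= t | y^1..y^t).
    y is a binary observation; g0 and g1 are placeholder Bernoulli likelihoods
    P(y = 1 | z = 0) and P(y = 1 | z = 1)."""
    h = hazard(t, r, lam)
    prior = pz_prev + h * (1.0 - pz_prev)   # belief before seeing y^t
    lik1 = g1 if y == 1 else 1.0 - g1       # likelihood if the signal is present
    lik0 = g0 if y == 1 else 1.0 - g0       # likelihood if the signal is absent
    return lik1 * prior / (lik1 * prior + lik0 * (1.0 - prior))
```

Iterating update_pz over a stream of observations would produce belief trajectories qualitatively like the rising stop-belief traces shown in Figure 2.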
33 In the stop-signal task, a stop trial is considered a stop trial even if the subject makes the go response early, before the stop signal is presented. [sent-99, score-2.678]
34 For simplicity, only trials where d = 1 are shown, and θs on stop trials is 17 steps. [sent-102, score-1.109]
35 Due to stochasticity in the sensory information, the go stimulus is processed slower and the stop signal is detected faster than average on some trials; these lead to successful stopping, with SE trials showing the opposite trend. [sent-103, score-1.413]
36 On all trials, ps shows an initial increase due to anticipation of the stop signal. [sent-104, score-0.643]
37 We assume there is a deadline D for responding on go trials, and an opportunity cost of c per unit time on each trial. [sent-110, score-0.408]
38 In addition, there is a penalty cs for choosing to respond on a stop-signal trial, and a unit cost for making an error on a go trial (by choosing the wrong discrimination response or exceeding the deadline for responding). [sent-111, score-0.705]
39 Let τ denote the trial termination time, so that τ = D if no response is made before the deadline, and τ < D if a response is made. [sent-113, score-0.302]
40 On each trial, the policy π produces a stopping time τ and a possible binary response δ ∈ {0, 1}. [sent-114, score-0.29]
41 Note that the go action results in either δ = 1 or δ = 0, depending on whether p_d^τ is greater or smaller than 0.5. [sent-119, score-0.33]
42 In our simulations, we do so numerically by discretizing the probability space for p_s^t into 1000 bins; p_d^t is represented exactly using its sufficient statistics. [sent-125, score-0.384]
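As a rough illustration of how these beliefs and costs enter the decision, the sketch below computes the expected cost of committing to a go response at the current step and compares it to a given cost of waiting. It is a simplified sketch: the cost constants are placeholders, and in the full model the cost of waiting, q_wait, would come from the backward-induction computation over the discretized belief space described above.

```python
def cost_of_going(t, p_d, p_s, c=0.005, c_s=1.0):
    """Expected cost of responding now at step t: accumulated time cost,
    expected penalty c_s for responding on a stop trial (belief p_s), and
    expected discrimination-error cost if the more probable response is chosen."""
    discrimination_error = 1.0 - max(p_d, 1.0 - p_d)
    return c * t + c_s * p_s + discrimination_error

def choose_action(t, p_d, p_s, q_wait):
    """Respond as soon as going is cheaper than waiting; otherwise keep observing.
    q_wait is the expected cost of continuing (supplied by the dynamic program)."""
    if cost_of_going(t, p_d, p_s) <= q_wait:
        return 1 if p_d >= 0.5 else 0   # discrimination response delta
    return None                         # withhold the response for now
```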
43 Reflecting the sensory processing differences, SS trials show a slower drop in the cost of going, and a faster increase after the stop signal is processed; this is the converse of stop error trials. [sent-130, score-1.718]
44 Note that although the average trajectory Qg does not dip below Qw in the non-canceled (error) stop trials, there is substantial variability in the individual trajectories under a Bernoulli observation model, and each one of them dips below Qw at some point. [sent-131, score-0.621]
45 The histograms show reaction time distributions for go and SE trials. [sent-132, score-0.392]
46 Results: Model captures classical behavioral data in the stop-signal task. We first show that our model captures the basic behavioral results characteristic of the stop-signal task. [sent-134, score-0.303]
47 (A) Evolution of the average belief states pd and ps corresponding to go and stop signals, for various trials–GO: go trials, SS: stop trials with successfully canceled response, SE: stop error trials. [sent-136, score-2.828]
48 Stochasticity results in faster or slower processing of the two sensory input streams; these lead to stop success or error. [sent-137, score-0.741]
49 For simplicity, d = 1 for all trials in the figure. [sent-138, score-0.244]
50 The stop signal is presented at θs = 17 time steps (dashed vertical line); the initial rise in ps corresponds to anticipation of a potential stop signal. [sent-139, score-1.35]
51 (B) Go and Wait costs for the same partitioning of trials, along with the reaction time distributions for go and SE trials. [sent-140, score-0.427]
52 On SE trials, the cost of going drops faster, and crosses below the cost of waiting before the stop signal can be adequately processed. [sent-141, score-0.797]
53 Although the average go cost does not drop below the average wait cost, each individual trajectory crosses over at various time points, as indicated by the RT histograms. [sent-142, score-0.372]
54 (A) Inhibition function: errors on stop trials increase as a function of SSD. [sent-153, score-0.908]
55 (C) Discrimination RT is faster on non-canceled stop trials than go trials. [sent-155, score-1.167]
56 (A,C) Data of two monkeys performing the stopping task (from [9]). [sent-157, score-0.277]
57 One of the basic measures of performance is the inhibition function, which is the average error rate on stop trials as a function of SSD. [sent-159, score-0.928]
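Computing the inhibition function from a set of stop trials is straightforward; the sketch below assumes each trial is recorded as an (SSD, responded) pair, a data layout chosen here for illustration rather than taken from the paper.

```python
from collections import defaultdict

def inhibition_function(stop_trials):
    """Stop-error rate as a function of SSD. `stop_trials` is an iterable of
    (ssd, responded) pairs, where responded is True if the go response was
    erroneously emitted on that stop trial (a non-canceled trial)."""
    errors, counts = defaultdict(int), defaultdict(int)
    for ssd, responded in stop_trials:
        counts[ssd] += 1
        errors[ssd] += int(responded)
    return {ssd: errors[ssd] / counts[ssd] for ssd in sorted(counts)}
```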
58 Another classical result in the stop-signal task is that RT’s on non-canceled (error) stop trials are on average faster than those on go trials (Figure 3C). [sent-161, score-1.472]
59 Intuitively, this is because inference about the go stimulus identity can proceed slowly or rapidly on different trials, due to noise in the observation process. [sent-163, score-0.342]
60 Non-canceled trials are those in which pd happens to evolve rapidly enough for a go response to be initiated before the stop signal is adequately processed. [sent-164, score-1.378]
61 Go trial RT’s, on the other hand, include all trajectories, whether pd happens to evolve quickly or not (see Figure 2). [sent-165, score-0.25]
62 Effect of stop trial frequency on behavior: The overall frequency of stop-signal trials has systematic effects on stopping behavior [6]. [sent-167, score-2.179]
63 As the fraction of stop trials is increased, go responses slow down and stop errors decrease in a graded fashion (Figure 4A;B). [sent-168, score-1.905]
64 In our model (Figure 4C;D), the stop signal frequency, r, influences the speed with which a stop signal is detected, whereby larger r leads to greater posterior belief that a stop signal is present, and also greater confidence that a stop signal will appear soon even if it has not yet appeared. [sent-169, score-2.881]
65 If stop signals are more prevalent, the optimal decision policy can use that information to make fewer errors on stop trials, by delaying the go response, and by detecting the stop signal faster. [sent-171, score-2.334]
66 Even in experiments where the fraction of stop trials is held constant, chance runs of stop or go trials may result in fluctuating local frequency of stop trials, which in turn may lead to trial-by-trial behavioral adjustments due to subjects’ fluctuating estimate of r. [sent-172, score-2.874]
67 Indeed, subjects speed up after a chance run of go trials, and slow down following a sequence of stop trials [6] (see Figure 4E). [sent-173, score-1.247]
68 Previous work has shown that this is essentially equivalent to using a causal, exponential window to estimate the current rate of stop trials [20], where the exponential decay constant is monotonically related to the assumed volatility in the environment in the Bayesian model. [sent-175, score-0.865]
69 The probability of trial k being a stop trial is P(s_k = 1 | s_{k−1}), where s_k = {s_1, . . . , s_k}. [sent-176, score-0.91]
70 It is given by P(s_k = 1 | s_{k−1}) = ∫ P(s_k = 1 | r_k) p(r_k | s_{k−1}) dr_k = ∫ r_k p(r_k | s_{k−1}) dr_k = E[r_k | s_{k−1}]. [sent-179, score-0.259]
71 In other words, the predictive probability of seeing a stop trial is just the mean of the predictive distribution p(rk |sk−1 ). [sent-180, score-0.805]
72 Since the majority of trials (75%) are go trials, a chance run of go trials impacts RT much less than a chance run of stop trials. [sent-184, score-1.759]
73 These values encode different expectations about volatility in the stop trial frequency, and produce slightly different predictions about sequential effects. [sent-186, score-0.805]
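One simple way to realize such a causal, exponentially discounted estimate of the stop-trial frequency is with decaying Beta pseudo-counts, sketched below. The decay factor and the initial pseudo-counts are illustrative stand-ins for the volatility parameter of the Bayesian model described in the text.

```python
def update_rate_estimate(a, b, is_stop_trial, decay=0.95):
    """Exponentially discounted Beta(a, b) estimate of the stop-trial frequency r.
    Past trials are down-weighted by `decay`, mimicking a causal exponential
    window; the returned predictive probability of a stop trial on the next
    trial is the posterior mean a / (a + b)."""
    a = decay * a + (1.0 if is_stop_trial else 0.0)
    b = decay * b + (0.0 if is_stop_trial else 1.0)
    return a, b, a / (a + b)
```

Starting from, say, a = 1 and b = 3 (encoding a prior stop-trial frequency of about 0.25), a run of go trials would pull the estimate down and speed the predicted go response, while a run of stop trials would pull it up, in line with the sequential effects described above.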
74 Recent data shows that neural activity in the supplementary eye field is predictive of trial-by-trial slowing as a function of the recent stop trial frequency [15]. [sent-188, score-0.903]
75 Moreover, microstimulation of supplementary eye field neurons results in slower responses to the go stimulus and fewer stop errors [16]. [sent-189, score-1.133]
76 Together, this suggests that supplementary eye field may encode the local frequency of stop trials, and influence stopping behavior in a statistically appropriate manner. [sent-190, score-0.973]
77 Influence of reward structure on behavior: The previous section demonstrated how adjustments to behavior in the face of experimental manipulations can be seen as instances of optimal decision-making in the stop-signal task. [sent-192, score-0.893]
78 An important component of the race model for stopping behavior [11] is the SSRT, which is thought to be a stable, subject-specific index of stopping ability. [sent-193, score-0.572]
79 Leotti & Wager showed that subjects can be biased toward stopping or going when the relative penalties associated with go and stop errors are experimentally manipulated [10]. [sent-195, score-1.276]
80 Figure 5A;B show that as subjects are biased toward stopping, they make fewer stop trial errors and have slower go responses. Figure 4: Effect of global and local frequency of stop trials on behavior. [sent-196, score-1.873]
81 (A) Go reaction times shift to the right (slower), as the fraction of stop trials is increased. [sent-199, score-0.997]
82 (B) Inhibitory function (stop error rate as a function of SSD) shifts to the right (fewer errors), as the fraction of stop trials is increased. [sent-200, score-0.907]
83 (E) Sequential effects in reaction times from 6 subjects showing faster go RTs following longer sequences of go trials (columns 1-3), and slower RTs following longer sequences of stop trials (columns 4-6, data adapted from [6]). [sent-203, score-1.922]
84 Increasing the cost of a stop error induces an increase in reaction time and an associated decrease in the fraction of stop errors. [sent-211, score-1.4]
85 This is a direct consequence of the optimal model attempting to minimize the total expected cost – with stop errors being more expensive, there is an incentive to slow down the go response in order to minimize the possibility of missing a stop signal. [sent-212, score-1.672]
86 Although the SSRT is not an explicit component of our model, we can nevertheless estimate it from the reaction times and fraction of stop errors produced by our model simulations, following the race model’s prescribed procedure [11]. [sent-214, score-0.923]
87 Essentially, the SSRT is estimated as the difference between mean go RT and the SSD at which 50% stop errors are committed (see Figure 1). [sent-215, score-0.966]
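A sketch of that prescribed procedure follows: interpolate the inhibition function to find the SSD at which half of the stop trials end in a non-canceled go response, and subtract it from the mean go RT. The input arrays are an assumed format, not the paper's.

```python
import numpy as np

def estimate_ssrt(go_rts, ssds, stop_error_rates):
    """Race-model SSRT estimate: mean go RT minus the SSD at which the
    inhibition function crosses 50%. `ssds` and `stop_error_rates` must be
    sorted by SSD, with error rates increasing with SSD."""
    ssd_at_half = np.interp(0.5, stop_error_rates, ssds)
    return float(np.mean(go_rts)) - ssd_at_half
```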
88 By reconciling the competing demands of stopping and going in an optimal manner, the estimated SSRT from our simulations is automatically adjusted to mimic the observed human behavior (Figure 5F). [sent-216, score-0.292]
89 The parameters of the model are either set directly by experimental design (cost function, stop frequency and timing), or correspond to subject-specific abilities that can be estimated from behavior (sensory processing); thus, there are no “free” parameters. [sent-219, score-0.725]
90 The model successfully captures classical behavioral results, such as the increase in error rate on stop trials with increasing SSD, as well as the decrease in average response time from go trials to error stop trials. [sent-220, score-2.265]
91 The model also captures more subtle changes in stopping behavior, when the fraction of stop-signal trials, the penalties for various types of errors, and the history of experienced trials are manipulated. [sent-221, score-0.516]
92 (A-C) Data from human subjects performing a variant of the stop-signal task where the ratio of rewards for quick go responses and successful stopping was varied, inducing a bias towards going or stopping (Data from [10]). [sent-223, score-0.865]
93 A bias toward stopping (i.e., fewer stop errors, (A)) is associated with an increase in the average reaction time on go trials (B), and a decrease in the stopping latency or SSRT (C). [sent-226, score-1.504]
94 (D-F) Our model captures this change in SSRT as a function of the inherent tradeoff between RT and stop errors. [sent-227, score-0.66]
95 Moreover, the stopping latency measure prescribed by the race model (the SSRT) changes systematically across various experimental manipulations, indicating that it cannot be used as a simplistic, global measure of inhibitory control for each subject. [sent-233, score-0.542]
96 Instead, inhibitory control is a multifaceted function of factors such as subject-specific sensory processing rates, attentional factors, and internal/external bias towards stopping or going, which are explicitly related to parameters in our normative model. [sent-234, score-0.485]
97 Recent studies of the frontal eye fields (FEF, [8]) and superior colliculus [14] of monkeys show neural responses that diverge on go and correct stop trials, indicating that they may encode computations leading to the execution or cancellation of movement. [sent-237, score-1.067]
98 One major aim of our work is to understand how stopping ability and SSRT arise from various cognitive factors, such as sensitivity to rewards, learning capacity related to estimating stop signal frequency, and the rate at which sensory inputs are processed. [sent-243, score-1.016]
99 One of our goals for future research is to map group differences in stopping behavior to the parameters of our model, thus gaining insight into exactly which cognitive components go awry in each dysfunctional state. [sent-245, score-0.592]
100 Inhibitory control in mind and brain: an interactive race model of countermanding saccades. [sent-269, score-0.261]
wordName wordTfidf (topN-words)
[('stop', 0.621), ('go', 0.302), ('ssrt', 0.26), ('trials', 0.244), ('pt', 0.192), ('stopping', 0.191), ('trial', 0.184), ('bt', 0.154), ('inhibitory', 0.13), ('race', 0.127), ('sk', 0.105), ('countermanding', 0.096), ('reaction', 0.09), ('signal', 0.086), ('xt', 0.083), ('behavioral', 0.082), ('sensory', 0.082), ('rk', 0.077), ('ssd', 0.072), ('qt', 0.064), ('inhibition', 0.063), ('behavior', 0.063), ('response', 0.059), ('subjects', 0.057), ('eye', 0.057), ('latency', 0.056), ('monkeys', 0.055), ('deadline', 0.055), ('rt', 0.054), ('disorder', 0.052), ('se', 0.049), ('cs', 0.045), ('normative', 0.044), ('wait', 0.044), ('errors', 0.043), ('fraction', 0.042), ('pd', 0.042), ('hanes', 0.041), ('logan', 0.041), ('qw', 0.041), ('stuphorn', 0.041), ('frequency', 0.041), ('policy', 0.04), ('stimulus', 0.04), ('captures', 0.039), ('going', 0.038), ('control', 0.038), ('slower', 0.038), ('cognitive', 0.036), ('substance', 0.036), ('yt', 0.036), ('costs', 0.035), ('discrimination', 0.034), ('adjustments', 0.033), ('cit', 0.033), ('saccade', 0.033), ('saccades', 0.033), ('brain', 0.033), ('monitoring', 0.033), ('responses', 0.032), ('task', 0.031), ('rational', 0.031), ('classical', 0.03), ('belief', 0.03), ('qd', 0.029), ('qs', 0.029), ('action', 0.028), ('cancel', 0.028), ('ss', 0.028), ('biobehavioral', 0.027), ('boucher', 0.027), ('drk', 0.027), ('emeric', 0.027), ('executive', 0.027), ('hyperactivity', 0.027), ('leotti', 0.027), ('rts', 0.027), ('stopsignal', 0.027), ('verbruggen', 0.027), ('manipulations', 0.027), ('reproduces', 0.027), ('neuroscience', 0.027), ('cost', 0.026), ('appeared', 0.025), ('opportunity', 0.025), ('delay', 0.024), ('jd', 0.024), ('planned', 0.024), ('psychiatry', 0.024), ('pz', 0.024), ('valuation', 0.024), ('dynamically', 0.024), ('evolve', 0.024), ('toward', 0.024), ('effects', 0.024), ('posterior', 0.023), ('successfully', 0.023), ('chance', 0.023), ('rewards', 0.023), ('ps', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 19 nips-2010-A rational decision making framework for inhibitory control
Author: Pradeep Shenoy, Angela J. Yu, Rajesh P. Rao
Abstract: Intelligent agents are often faced with the need to choose actions with uncertain consequences, and to modify those actions according to ongoing sensory processing and changing task demands. The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. Using Bayesian inference and stochastic control tools, we show that the optimal policy systematically depends on various parameters of the problem, such as the relative costs of different action choices, the noise level of sensory inputs, and the dynamics of changing environmental demands. Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems. 1
2 0.093012087 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
Author: Gergely Neu, Andras Antos, András György, Csaba Szepesvári
Abstract: We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in terms of the total reward received. In each time step the agent observes the current state and the reward associated with the last transition, however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state of the art result for this setting is a no-regret algorithm. In this paper we propose a new learning algorithm and, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of the new algorithm is O(T^{2/3} (ln T)^{1/3}), giving the first rigorously proved regret bound for the problem. 1
3 0.077921554 189 nips-2010-On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
Author: Tang Jie, Pieter Abbeel
Abstract: Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U (θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U (θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines—a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. 1
4 0.077207319 212 nips-2010-Predictive State Temporal Difference Learning
Author: Byron Boots, Geoffrey J. Gordon
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 1
5 0.067377716 268 nips-2010-The Neural Costs of Optimal Control
Author: Samuel Gershman, Robert Wilson
Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty. 1
6 0.065483429 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
7 0.064288907 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
8 0.059481151 66 nips-2010-Double Q-learning
9 0.059411995 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
10 0.057633512 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
11 0.054617792 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations
12 0.053824361 161 nips-2010-Linear readout from a neural population with partial correlation data
13 0.05228807 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
14 0.050415173 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
15 0.050141178 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
16 0.048039988 152 nips-2010-Learning from Logged Implicit Exploration Data
17 0.04736042 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
18 0.047178864 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
19 0.046839017 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
20 0.046707489 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
topicId topicWeight
[(0, 0.131), (1, -0.061), (2, -0.069), (3, 0.063), (4, 0.022), (5, 0.038), (6, -0.006), (7, -0.007), (8, -0.013), (9, 0.049), (10, 0.04), (11, 0.022), (12, -0.004), (13, -0.003), (14, 0.017), (15, 0.061), (16, -0.024), (17, -0.007), (18, -0.039), (19, 0.107), (20, -0.032), (21, 0.056), (22, 0.119), (23, 0.008), (24, -0.035), (25, 0.004), (26, -0.014), (27, 0.021), (28, -0.024), (29, -0.024), (30, -0.031), (31, 0.054), (32, -0.038), (33, 0.106), (34, -0.015), (35, 0.027), (36, 0.033), (37, -0.046), (38, -0.046), (39, -0.0), (40, 0.127), (41, 0.062), (42, -0.064), (43, -0.012), (44, 0.021), (45, -0.073), (46, -0.042), (47, -0.017), (48, 0.047), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.96160543 19 nips-2010-A rational decision making framework for inhibitory control
Author: Pradeep Shenoy, Angela J. Yu, Rajesh P. Rao
Abstract: Intelligent agents are often faced with the need to choose actions with uncertain consequences, and to modify those actions according to ongoing sensory processing and changing task demands. The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. Using Bayesian inference and stochastic control tools, we show that the optimal policy systematically depends on various parameters of the problem, such as the relative costs of different action choices, the noise level of sensory inputs, and the dynamics of changing environmental demands. Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems. 1
2 0.53589034 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices
Author: Yi Sun, Jürgen Schmidhuber, Faustino J. Gomez
Abstract: We present a new way of converting a reversible finite Markov chain into a nonreversible one, with a theoretical guarantee that the asymptotic variance of the MCMC estimator based on the non-reversible chain is reduced. The method is applicable to any reversible chain whose states are not connected through a tree, and can be interpreted graphically as inserting vortices into the state transition graph. Our result confirms that non-reversible chains are fundamentally better than reversible ones in terms of asymptotic performance, and suggests interesting directions for further improving MCMC. 1
3 0.52034962 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
Author: Harold Pashler, Matthew Wilder, Robert Lindsey, Matt Jones, Michael C. Mozer, Michael P. Holmes
Abstract: For over half a century, psychologists have been struck by how poor people are at expressing their internal sensations, impressions, and evaluations via rating scales. When individuals make judgments, they are incapable of using an absolute rating scale, and instead rely on reference points from recent experience. This relativity of judgment limits the usefulness of responses provided by individuals to surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that transform internal states to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, decontamination is fundamentally a problem of inferring latent states (internal sensations) which, because of the relativity of judgment, have temporal dependencies. We propose a decontamination solution using a conditional random field with constraints motivated by psychological theories of relative judgment. Our exploration of decontamination models is supported by two experiments we conducted to obtain ground-truth rating data on a simple length estimation task. Our decontamination techniques yield an over 20% reduction in the error of human judgments. 1
4 0.46544483 212 nips-2010-Predictive State Temporal Difference Learning
Author: Byron Boots, Geoffrey J. Gordon
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 1
5 0.46227741 161 nips-2010-Linear readout from a neural population with partial correlation data
Author: Adrien Wohrer, Ranulfo Romo, Christian K. Machens
Abstract: How much information does a neural population convey about a stimulus? Answers to this question are known to strongly depend on the correlation of response variability in neural populations. These noise correlations, however, are essentially immeasurable as the number of parameters in a noise correlation matrix grows quadratically with population size. Here, we suggest to bypass this problem by imposing a parametric model on a noise correlation matrix. Our basic assumption is that noise correlations arise due to common inputs between neurons. On average, noise correlations will therefore reflect signal correlations, which can be measured in neural populations. We suggest an explicit parametric dependency between signal and noise correlations. We show how this dependency can be used to ”fill the gaps” in noise correlations matrices using an iterative application of the Wishart distribution over positive definite matrices. We apply our method to data from the primary somatosensory cortex of monkeys performing a two-alternative forced choice task. We compare the discrimination thresholds read out from the population of recorded neurons with the discrimination threshold of the monkey and show that our method predicts different results than simpler, average schemes of noise correlations. 1
6 0.45091769 66 nips-2010-Double Q-learning
7 0.44569778 167 nips-2010-Mixture of time-warped trajectory models for movement decoding
8 0.42438552 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.42141247 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
10 0.42137676 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
11 0.41686356 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations
12 0.40469828 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
13 0.40272158 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
14 0.40212139 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
15 0.39172137 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
16 0.38528058 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
17 0.37965706 96 nips-2010-Fractionally Predictive Spiking Neurons
18 0.37790525 157 nips-2010-Learning to localise sounds with spiking neural networks
19 0.37746984 268 nips-2010-The Neural Costs of Optimal Control
20 0.36707097 34 nips-2010-Attractor Dynamics with Synaptic Depression
topicId topicWeight
[(13, 0.035), (17, 0.022), (27, 0.149), (30, 0.048), (32, 0.273), (35, 0.012), (45, 0.143), (50, 0.033), (52, 0.04), (60, 0.028), (77, 0.064), (90, 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.81604666 19 nips-2010-A rational decision making framework for inhibitory control
Author: Pradeep Shenoy, Angela J. Yu, Rajesh P. Rao
Abstract: Intelligent agents are often faced with the need to choose actions with uncertain consequences, and to modify those actions according to ongoing sensory processing and changing task demands. The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. Using Bayesian inference and stochastic control tools, we show that the optimal policy systematically depends on various parameters of the problem, such as the relative costs of different action choices, the noise level of sensory inputs, and the dynamics of changing environmental demands. Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems. 1
2 0.62984574 161 nips-2010-Linear readout from a neural population with partial correlation data
Author: Adrien Wohrer, Ranulfo Romo, Christian K. Machens
Abstract: How much information does a neural population convey about a stimulus? Answers to this question are known to strongly depend on the correlation of response variability in neural populations. These noise correlations, however, are essentially immeasurable as the number of parameters in a noise correlation matrix grows quadratically with population size. Here, we suggest to bypass this problem by imposing a parametric model on a noise correlation matrix. Our basic assumption is that noise correlations arise due to common inputs between neurons. On average, noise correlations will therefore reflect signal correlations, which can be measured in neural populations. We suggest an explicit parametric dependency between signal and noise correlations. We show how this dependency can be used to ”fill the gaps” in noise correlations matrices using an iterative application of the Wishart distribution over positive definite matrices. We apply our method to data from the primary somatosensory cortex of monkeys performing a two-alternative forced choice task. We compare the discrimination thresholds read out from the population of recorded neurons with the discrimination threshold of the monkey and show that our method predicts different results than simpler, average schemes of noise correlations. 1
3 0.6286093 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
Author: Haefner Ralf, Matthias Bethge
Abstract: Many studies have explored the impact of response variability on the quality of sensory codes. The source of this variability is almost always assumed to be intrinsic to the brain. However, when inferring a particular stimulus property, variability associated with other stimulus attributes also effectively act as noise. Here we study the impact of such stimulus-induced response variability for the case of binocular disparity inference. We characterize the response distribution for the binocular energy model in response to random dot stereograms and find it to be very different from the Poisson-like noise usually assumed. We then compute the Fisher information with respect to binocular disparity, present in the monocular inputs to the standard model of early binocular processing, and thereby obtain an upper bound on how much information a model could theoretically extract from them. Then we analyze the information loss incurred by the different ways of combining those inputs to produce a scalar single-neuron response. We find that in the case of depth inference, monocular stimulus variability places a greater limit on the extractable information than intrinsic neuronal noise for typical spike counts. Furthermore, the largest loss of information is incurred by the standard model for position disparity neurons (tuned-excitatory), that are the most ubiquitous in monkey primary visual cortex, while more information from the inputs is preserved in phase-disparity neurons (tuned-near or tuned-far) primarily found in higher cortical regions. 1
4 0.62290406 39 nips-2010-Bayesian Action-Graph Games
Author: Albert X. Jiang, Kevin Leyton-brown
Abstract: Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art. 1
5 0.6226328 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
Author: Ryan Kelly, Matthew Smith, Robert Kass, Tai S. Lee
Abstract: Activity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode “Utah” array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron’s response, in addition to the neuron’s receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions. 1
6 0.6200285 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
7 0.61777812 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics
8 0.61554396 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
9 0.61132163 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI
10 0.60808533 98 nips-2010-Functional form of motion priors in human motion perception
11 0.6076569 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
12 0.60652494 268 nips-2010-The Neural Costs of Optimal Control
13 0.60355145 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
14 0.60348362 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
15 0.60203636 17 nips-2010-A biologically plausible network for the computation of orientation dominance
16 0.59653962 194 nips-2010-Online Learning for Latent Dirichlet Allocation
17 0.59538078 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations
18 0.59528983 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
19 0.59095657 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
20 0.58876342 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication