nips nips2003 nips2003-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Massachusetts Institute of Technology Cambridge, MA 02139 Abstract We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. [sent-4, score-0.616]
2 We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. [sent-5, score-0.57]
3 Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. [sent-6, score-1.098]
4 We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. [sent-7, score-1.128]
5 This sensitivity to patterns and regularities is at the heart of many of the inductive leaps characteristic of human cognition, such as identifying the words in a stream of sounds, or discovering the presence of a common cause underlying a set of events. [sent-9, score-0.383]
6 The ability to detect structure embedded in noise has a paradoxical character: while it is an excellent example of the kind of inference at which people excel but machines fail, it also seems to be the source of errors in tasks at which machines regularly succeed. [sent-11, score-0.221]
7 For example, a common demonstration conducted in introductory psychology classes involves presenting students with two binary sequences of the same length, such as HHTHTHTT and HHHHHHHH, and asking them to judge which one seems more random. [sent-12, score-0.193]
8 When students select the former, they are told that their judgments are irrational: the two sequences are equally random, since they have the same probability of being produced by a fair coin. [sent-13, score-0.292]
9 In the real world, the sense that some random sequences seem more structured than others can lead people to a variety of erroneous inferences, whether in a casino or thinking about patterns of births and deaths in a hospital [1]. [sent-14, score-0.277]
10 Here we show how this paradox can be resolved through a proper understanding of what our sense of randomness is designed to compute. [sent-15, score-0.324]
11 We will argue that our sense of randomness is actually extremely well-calibrated with a rational statistical computation – just not the one to which it is usually compared. [sent-16, score-0.52]
12 Solving this model selection problem for small datasets requires two ingredients: a set of hypotheses about the processes by which the data could have been generated, and a rational statistical inference by which these hypotheses are evaluated. [sent-18, score-0.378]
13 We will model subjective randomness as an inference comparing the probability of a sequence under a random process, P (X|random), with the probability of that sequence under a regular process, P (X|regular). [sent-19, score-1.03]
14 In previous work we have shown that defining P (X|regular) using a restricted form of Kolmogorov complexity, in which regularity is characterized in terms of a simple computing machine, can provide a good account of human randomness judgments for binary sequences [2]. [sent-20, score-0.82]
15 We will show that the kinds of regularity to which people are sensitive depend upon whether the full sequence is presented simultaneously, or its elements are presented sequentially. [sent-22, score-0.408]
16 A sequence X has Kolmogorov complexity K(X) equal to the length of the shortest program p for a (prefix) universal Turing machine U that produces X and then halts, K(X) = min p:U (p)=X (p), (1) where (p) is the length of p in bits. [sent-26, score-0.337]
17 Kolmogorov complexity identifies a sequence X as random if (X) − K(X) is small: random sequences are those that are irreducibly complex [4]. [sent-27, score-0.411]
18 While not necessarily following the form of this definition, psychologists have preserved its spirit in proposing that the perceived randomness of a sequence increases with its complexity (eg. [sent-28, score-0.498]
19 Kolmogorov complexity can also be used to define a variety of probability distributions, assigning probability to events based upon their complexity. [sent-30, score-0.258]
20 There are three problems with using Kolmogorov complexity as the basis for a computational model of subjective randomness. [sent-33, score-0.261]
21 Firstly, the Kolmogorov complexity of any particular sequence X is not computable [4], presenting a practical challenge for any modelling effort. [sent-34, score-0.238]
22 The set of regularities identified by people are a strict subset of those that might be expressed in short computer programs. [sent-37, score-0.376]
23 For example, people are very unlikely to be able to tell the difference between a binary sequence produced by a linear congruential random number generator (a very short program) and a sequence produced by flipping a coin, but these sequences should differ significantly in Kolmogorov complexity. [sent-38, score-0.592]
24 Restricting the set of regularities does not imply that people are worse than machines at recognizing patterns: reducing the size of the set of hypotheses increases inductive bias, making it possible to identify the presence of structure from smaller samples. [sent-39, score-0.529]
25 3 A statistical account of subjective randomness While there are problems with using Kolmogorov complexity as the basis for a rational theory of subjective randomness, it provides a clear definition of regularity. [sent-40, score-1.043]
26 In this section we will present a statistical account of subjective randomness in terms of a comparison between random and regular sources, where regularity is defined by analogues of Kolmogorov complexity for simpler computing machines. [sent-41, score-1.116]
27 1 Subjective randomness as model selection One of the most basic problems that arises in statistical inference is identifying the source of a set of observations, based upon a set of hypotheses. [sent-43, score-0.501]
28 Model selection provides a natural basis for a statistical theory of subjective randomness, viewing these judgments as the consequence of an inference to the process that produced a set of observations. [sent-45, score-0.409]
29 On seeing a stimulus X, we consider two hypotheses: X was produced by a random process, or X was produced by a regular process. [sent-46, score-0.418]
30 The only part of the right hand side of the equation affected by X is the likelihood ratio, so we define the subjective randomness of X as random(X) = log P (X|random) , P (X|regular) (4) being the evidence that X provides towards the conclusion that it was produced by a random process. [sent-48, score-0.67]
31 2 The nature of regularity In order to define random(X), we need to specify P (X|random) and P (X|regular). [sent-50, score-0.209]
32 We obtain random(X) = K(X) − (X), the difference between the complexity of a sequence and its length, if we choose P (X|regular) = R(X), the algorithmic probability defined in Equation 2. [sent-53, score-0.319]
33 This is identical to the mathematical definition of randomness given by Kolmogorov complexity. [sent-54, score-0.412]
34 However, the key point of this statistical approach is that we are not restricted to using R(X): we have a measure of the randomness of X for any choice of P (X|regular). [sent-55, score-0.383]
35 The choice of P (X|regular) will reflect the stimulus domain, and express the kinds of regularity which people can detect in that domain. [sent-56, score-0.31]
36 For binary sequences, a good candidate for specifying P (X|regular) is a hidden Markov model (HMM), a probabilistic finite 1 H 2 T 5 4 H T T H 6 3 Figure 1: Finite state automaton used to define P (X|regular) to give random(X) ∝ DP . [sent-57, score-0.775]
37 Dashed arrows are motif changes, using the prior determined by α. [sent-59, score-0.188]
38 In fact, specifying P (X|regular)in terms of a particular HMM results in random(X) being equivalent to the “Difficulty Predictor” (DP) [6] a measure of sequence complexity that has been extremely successful in modelling subjective randomness judgments. [sent-61, score-0.725]
39 DP measures the complexity of a sequence in terms of the number of repeating (eg. [sent-62, score-0.231]
40 HTHT) subsequences it contains, adding one point for each repeating subsequence and two points for each alternating subsequence. [sent-64, score-0.178]
41 For example, the sequence TTTHHHTHTH is a run of tails, a run of heads, and an alternating sub-sequence, DP = 4. [sent-65, score-0.152]
42 This HMM produces sequences by motif repetition, using the transition graph shown in Figure 1. [sent-68, score-0.325]
43 The model emits sequences by choosing a motif, a sequence of symbols of length k, with probability proportional to αk , and emitting symbols consistent with that motif with probability δ, switching to a new motif with probability 1 − δ. [sent-69, score-0.753]
44 In Figure 1, state 1 repeats the motif H, state 2 repeats T, and the remaining states repeat the alternating motifs HT and TH. [sent-70, score-0.48]
45 The randomness of a sequence under this definition of regularity depends on δ and α, but is generally affected by the number of repeating and alternating subsequences. [sent-71, score-0.762]
46 The equivalence to DP, in which a sequence scores a single point for each repeating subsequence and two points for each alternating subsequence, results from taking √ δ = 0. [sent-72, score-0.24]
47 5 and α = 3−1 , and choosing the the state sequence for the HMM that maximizes 2 the probability of the sequence. [sent-73, score-0.181]
48 Just as the algorithmic probability R(X) is a probability distribution defined by the length of programs for a universal Turing machine, this choice of P (X|regular) can be seen as specifying the length of “programs” for a particular finite state automaton. [sent-74, score-0.669]
49 The output of a finite state automaton is determined by its state sequence, just as the output of a universal Turing machine is determined by its program. [sent-75, score-0.731]
50 However, since the state sequence is the same length as the sequence itself, this alone does not provide a meaningful measure of complexity. [sent-76, score-0.289]
51 In our model, probability imposes a metric on state sequences, dictating a greater cost for moves between certain states, which translates into a code length through the logarithm. [sent-77, score-0.148]
52 Since we find the state sequence most likely to have produced X, and thus the shortest code length, we have an analogue of Kolmogorov complexity defined on a finite state automaton. [sent-78, score-0.685]
53 3 Regularities and automata Using a hidden Markov model to specify P (X|regular) provides a measure of complexity defined in terms of a finite state automaton. [sent-80, score-0.646]
54 However, the kinds of regularities people can detect in binary sequences go beyond the capacity of a finite state automaton. [sent-81, score-0.862]
55 THTHHTHT), symmetry in the com- Finite state automaton (motif repetition) Queue automaton (duplication) Pushdown automaton (symmetry) Stack automaton Turing machine (all computable) Figure 2: Hierarchy of automata used to define measures of complexity. [sent-83, score-1.847]
56 Of the regularities discussed in this paper, each automaton can identify all regularities identified by those automata to its left as well as those stated in parentheses beneath its name. [sent-84, score-1.157]
57 These regularities identify formal languages that cannot be recognized by a finite state automaton, suggesting that we might be able to develop better models of subjective randomness by defining P (X|regular) in terms of more sophisticated automata. [sent-89, score-1.174]
58 The automata we will consider in this paper form a hierarchy, shown in Figure 2. [sent-90, score-0.192]
59 This hierarchy expresses the same content as Chomsky’s [7] hierarchy of computing machines – the regularities identifiable by each machine are a strict superset of those identifiable to the machine to the left – although it features a different set of automata. [sent-91, score-0.513]
60 The most restricted set of regularities are those associated with the finite state automaton, and the least restricted are those associated with the Turing machine. [sent-92, score-0.599]
61 The key difference between these kinds of automata is the memory available to the finite controller, and exploring measures of complexity defined in terms of these automata thus involves assessing the kind of memory required to identify regularities. [sent-94, score-1.056]
62 Each of the automata shown in Figure 2 can identify a different set of regularities. [sent-95, score-0.252]
63 The finite state automaton is only capable of identifying motif repetition, while the pushdown automaton can identify both kinds of symmetry, and the queue automaton can identify duplication. [sent-96, score-2.075]
64 The stack automaton can identify all of these regularities, and the Turing machine can identify all computable regularities. [sent-97, score-0.733]
65 Similar models can be defined for the queue and stack automata, with the queue automaton allowing generation by repetition or duplication, and the stack automaton allowing any of these four methods. [sent-101, score-1.745]
66 1 An unrestricted queue automaton is equivalent to a Turing machine. [sent-105, score-0.549]
67 We will use the phrase to refer to an automaton in which the number of queue operations that can be performed for each input symbol is limited, which is generally termed a quasi real time queue automaton [8]. [sent-106, score-1.098]
68 4 Testing the models The models introduced in the previous section differ in the memory systems with which they augment the finite controller. [sent-107, score-0.333]
69 The appropriateness of any one measure of complexity to a particular task may thus depend upon the memory demands placed upon the participant. [sent-108, score-0.244]
70 To explore this hypothesis, we conducted an experiment in which participants make randomness judgments after either seeing a sequence in its entirety, or seeing each element one after another. [sent-109, score-0.645]
71 We then used model selection to determine which measure of complexity gave the best account of each condition, illustrating how the strategy of defining more restricted forms of complexity can shed light into the cognitive processes underlying regularity detection. [sent-110, score-0.506]
72 The stimuli were sequences of heads (H) and tails (T) presented in 130 point fixed width sans-serif font on a 19” monitor at 1280 × 1024 pixel resolution. [sent-113, score-0.206]
73 Participants were instructed that they were about to see sequences which had either been produced by a random process (flipping a fair coin) or by other processes in which the choice of heads and tails was not random, and had to classify these sequences according to their source. [sent-117, score-0.475]
74 After a practice session, each participant classified all 128 sequences of length 8, in random order, with each sequence randomly starting with either a head or a tail. [sent-118, score-0.362]
75 2 Results and Discussion We analyzed the results by fitting the models corresponding to the four automata described above, using all motifs up to length 4 to specify the basic model. [sent-121, score-0.297]
76 We then optimized λ, ψ, δ, α and the parameters contributing to P (M ) for each model, maximizing the likelihood of the classifications of the sequences by the 20 participants in each of the 2 conditions. [sent-125, score-0.241]
77 The results of the model-fitting are shown in Figure 3(a) and (b), which indicate the relationship between the posterior probabilities predicted by the model and the proportion of participants who classified a sequence as random. [sent-126, score-0.225]
78 The correlation coefficients shown in the figure provide a relatively good indicator of the fit of the models, and each sequence is labelled according to the regularity it expresses, showing how accommodating particular regularities contributes to the fit. [sent-127, score-0.545]
79 Since the models form a nested hierarchy, we can use likelihood ratio tests to evaluate whether introducing a particular regularity (and the parameters associated with it) results in a statistically significant improvement in fit. [sent-129, score-0.169]
80 4025) Stack Figure 3: Experimental results for (a) the Simultaneous and (b) the Sequential condition, showing the proportion of participants classifying a sequence as “random” (horizontal axis) and P (random|X) (vertical axis) as assessed by the four models. [sent-158, score-0.191]
81 (c) and (d) show the model selection results for the Simultaneous and Sequential conditions respectively, showing the four automata with edges between them labelled with χ2 score (df, p-value) for improvement in fit. [sent-160, score-0.263]
82 Additional regularities always improved the fit for the Simultaneous condition, while adding duplication, but not symmetry, resulted in a statistically significant improvement in the Sequential condition. [sent-163, score-0.286]
83 The model selection results suggest that the best model for the Simultaneous condition is the stack automaton, while the best model for the Sequential condition is the queue automaton. [sent-164, score-0.527]
84 These results indicate the importance of presentation format in determining subjective randomness, as well as the benefits of exploring measures of complexity defined in terms of a range of computing machines. [sent-165, score-0.398]
85 The stack automaton can evaluate regularities that require checking information in arbitrary positions in a sequence, something that is facilitated by a display in which the entire sequence is available. [sent-166, score-0.921]
86 In contrast, the queue automaton can only access information in the order that it enters memory, and gives a better match to the task in which working memory is required. [sent-167, score-0.632]
87 This illustrates an important fact about cognition – that human working memory operates like a queue rather than a stack – that is highlighted by this approach. [sent-168, score-0.561]
88 The final parameters of the best-fitting models provide some insight into the relative importance of the different kinds of regularities under different presentation conditions. [sent-169, score-0.345]
89 98 and motif repetition, symmetry, symmetry in the complement, and duplication were given probabilities of 0. [sent-174, score-0.405]
90 Symmetry is thus a far stronger characteristic of reg- ularity than either symmetry in the complement or duplication, when entire sequences are viewed simultaneously. [sent-179, score-0.275]
91 24, and motif repetition was given a probability of 0. [sent-184, score-0.358]
92 038, with both forms of symmetry being given zero probability since the queue model provided the best fit. [sent-186, score-0.323]
93 5 for both models indicates that regular sequences tend to repeat motifs, rather than rapidly switching between them, and the low α values reflect a preference for short motifs. [sent-188, score-0.34]
94 5 Conclusion We have outlined a framework for understanding the rational basis of the human ability to find structure embedded in noise, viewing this inference in terms of the statistical problem of model selection. [sent-189, score-0.268]
95 Solving this problem for small datasets requires two ingredients: strong prior beliefs about the hypothetical mechanisms by which the data could have been generated, and a rational statistical inference by which these hypotheses are evaluated. [sent-190, score-0.261]
96 When assessing the randomness of binary sequences, which involves comparing random and regular sources, people’s beliefs about the nature of regularity can be expressed in terms of probabilistic versions of simple computing machines. [sent-191, score-0.751]
97 Different machines capture regularity when sequences are presented simultaneously and when their elements are presented sequentially, and the differences between these machines provide insight into the cognitive processes involved in the task. [sent-192, score-0.404]
98 Analyses of the rational basis of human inference typically either ignore questions about processing or introduce them as relatively arbitrary constraints. [sent-193, score-0.206]
99 Here, we are able to give a rational characterization of process as well as inference, evaluating a set of alternatives that all correspond to restrictions of Kolmogorov complexity to simple general-purpose automata. [sent-194, score-0.226]
100 On the length of programs for computing finite binary sequences: statistical considerations. [sent-225, score-0.387]
wordName wordTfidf (topN-words)
[('automaton', 0.357), ('randomness', 0.324), ('regularities', 0.262), ('kolmogorov', 0.238), ('finite', 0.226), ('stack', 0.215), ('regular', 0.203), ('queue', 0.192), ('automata', 0.192), ('motif', 0.188), ('subjective', 0.174), ('regularity', 0.141), ('rational', 0.139), ('sequences', 0.137), ('pushdown', 0.137), ('repetition', 0.137), ('turing', 0.137), ('duplication', 0.119), ('participants', 0.104), ('symmetry', 0.098), ('people', 0.09), ('definition', 0.088), ('complexity', 0.087), ('sequence', 0.087), ('memory', 0.083), ('defined', 0.08), ('dp', 0.076), ('define', 0.068), ('judgments', 0.068), ('simultaneous', 0.067), ('alternating', 0.065), ('hmm', 0.062), ('state', 0.061), ('hierarchy', 0.06), ('identify', 0.06), ('repeating', 0.057), ('length', 0.054), ('produced', 0.054), ('kinds', 0.053), ('motifs', 0.051), ('random', 0.05), ('hypotheses', 0.049), ('sequential', 0.049), ('cognition', 0.043), ('controller', 0.043), ('computable', 0.041), ('defining', 0.041), ('condition', 0.04), ('programs', 0.04), ('complement', 0.04), ('selection', 0.04), ('inference', 0.039), ('tails', 0.038), ('upon', 0.037), ('inductive', 0.036), ('classified', 0.034), ('griffiths', 0.034), ('identifiable', 0.034), ('statistical', 0.034), ('cognitive', 0.034), ('probability', 0.033), ('binary', 0.033), ('item', 0.032), ('algorithmic', 0.032), ('machines', 0.032), ('seeing', 0.031), ('expresses', 0.031), ('labelled', 0.031), ('heads', 0.031), ('subsequence', 0.031), ('presentation', 0.03), ('specifying', 0.03), ('leaps', 0.03), ('grammars', 0.03), ('ipping', 0.03), ('shortest', 0.029), ('embedded', 0.028), ('human', 0.028), ('ratio', 0.028), ('processes', 0.028), ('identifying', 0.027), ('specified', 0.027), ('ection', 0.027), ('format', 0.027), ('repeats', 0.027), ('everyday', 0.027), ('accessed', 0.027), ('ingredients', 0.027), ('formal', 0.026), ('universal', 0.026), ('stimulus', 0.026), ('coin', 0.025), ('subsequences', 0.025), ('restricted', 0.025), ('augment', 0.024), ('identified', 0.024), ('fit', 0.024), ('account', 0.023), ('extremely', 0.023), ('presenting', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 75 nips-2003-From Algorithmic to Subjective Randomness
Author: Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. 1
2 0.072636694 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models
Author: Radford M. Neal, Matthew J. Beal, Sam T. Roweis
Abstract: We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a non-linear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of “pools” of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used to in effect discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers. 1
3 0.064009942 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
Author: Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Jon M. Kleinberg
Abstract: In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site. 1
4 0.050783195 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
5 0.049643137 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
Author: Saori C. Tanaka, Kenji Doya, Go Okada, Kazutaka Ueda, Yasumasa Okamoto, Shigeto Yamawaki
Abstract: To understand the brain mechanisms involved in reward prediction on different time scales, we developed a Markov decision task that requires prediction of both immediate and future rewards, and analyzed subjects’ brain activities using functional MRI. We estimated the time course of reward prediction and reward prediction error on different time scales from subjects' performance data, and used them as the explanatory variables for SPM analysis. We found topographic maps of different time scales in medial frontal cortex and striatum. The result suggests that different cortico-basal ganglia loops are specialized for reward prediction on different time scales. 1 Intro du ction In our daily life, we make decisions based on the prediction of rewards on different time scales; immediate and long-term effects of an action are often in conflict, and biased evaluation of immediate or future outcome can lead to pathetic behaviors. Lesions in the central serotonergic system result in impulsive behaviors in humans [1], and animals [2, 3], which can be attributed to deficits in reward prediction on a long time scale. Damages in the ventral part of medial frontal cortex (MFC) also cause deficits in decision-making that requires assessment of future outcomes [4-6]. A possible mechanism underlying these observations is that different brain areas are specialized for reward prediction on different time scales, and that the ascending serotonergic system activates those specialized for predictions in longer time scales [7]. The theoretical framework of temporal difference (TD) learning [8] successfully explains reward-predictive activities of the midbrain dopaminergic system as well as those of the cortex and the striatum [9-13]. In TD learning theory, the predicted amount of future reward starting from a state s(t) is formulated as the “value function” V(t) = E[r(t + 1) + γ r(t + 2) + γ 2r(t + 3) + …] (1) and learning is based on the TD error δ(t) = r(t) + γ V(t) – V(t - 1). (2) The ‘discount factor’ γ controls the time scale of prediction; while only the immediate reward r(t + 1) is considered with γ = 0, rewards in the longer future are taken into account with γ closer to 1. In order to test the above hypothesis [7], we developed a reinforcement learning task which requires a large value of discount factor for successful performance, and analyzed subjects’ brain activities using functional MRI. In addition to conventional block-design analysis, a novel model-based regression analysis revealed topographic representation of prediction time scale with in the cortico-basal ganglia loops. 2 2.1 Methods Markov Decision Task In the Markov decision task (Fig. 1), markers on the corners of a square present four states, and the subject selects one of two actions by pressing a button (a1 = left button, a2 = right button) (Fig. 1A). The action determines both the amount of reward and the movement of the marker (Fig. 1B). In the REGULAR condition, the next trial is started from the marker position at the end of the previous trial. Therefore, in order to maximize the reward acquired in a long run, the subject has to select an action by taking into account both the immediate reward and the future reward expected from the subsequent state. The optimal behavior is to receive small negative rewards at states s 2, s3, and s4 to obtain a large positive reward at state s1 (Fig. 1C). In the RANDOM condition, next trial is started from a random marker position so that the subject has to consider only immediate reward. Thus, the optimal behavior is to collect a larger reward at each state (Fig. 1D). In the baseline condition (NO condition), the reward is always zero. In order to learn the optimal behaviors, the discount factor γ has to be larger than 0.3425 in REGULAR condition, while it can be arbitrarily small in RANDOM condition. 2.2 fMRI imaging Eighteen healthy, right-handed volunteers (13 males and 5 females), gave informed consent to take part in the study, with the approval of the ethics and safety committees of ATR and Hiroshima University. A 0 Time 1.0 2.0 2.5 3.0 100 C B +r 2 s2 s1 REGULAR condition s2 -r 1 -r 2 +r 1 s1 100 D RANDOM condition +r 2 s2 s1 -r 1 +r 1 -r 2 -r 1 s4 +r 2 4.0 (s) -r 1 s3 a1 a2 r1 = 20 10 yen r2 = 100 10 yen +r 1 -r 1 s4 -r 1 -r 1 s3 s4 -r 1 s3 Fig. 1. (A) Sequence of stimulus and response events in the Markov decision task. First, one of four squares representing present state turns green (0s). As the fixation point turns green (1s), the subject presses either the right or left button within 1 second. After 1s delay, the green square changes its position (2s), and then a reward for the current action is presented by a number (2.5s) and a bar graph showing cumulative reward during the block is updated (3.0s). One trial takes four seconds. Subjects performed five trials in the NO condition, 32 trials in the RANDOM condition, five trials in the NO condition, and 32 trials in the REGULAR condition in one block. They repeated four blocks; thus, the entire experiment consisted of 312 trials, taking about 20 minutes. (B) The rule of the reward and marker movement. (C) In the REGULAR condition, the optimal behavior is to receive small negative rewards –r 1 (-10, -20, or -30 yen) at states s2, s3, and s4 to obtain a large positive reward +r2 (90, 100, or 110 yen) at state s1. (D) In the RANDOM condition, the next trial is started from random state. Thus, the optimal behavior is to select a larger reward at each state. A 1.5-Tesla scanner (Marconi, MAGNEX ECLIPSE, Japan) was used to acquire both structural T1-weighted images (TR = 12 s, TE = 450 ms, flip angle = 20 deg, matrix = 256 × 256, FoV = 256 mm, thickness = 1 mm, slice gap = 0 mm ) and T2*-weighted echo planar images (TR = 4 s, TE = 55 msec, flip angle = 90 deg, 38 transverse slices, matrix = 64 × 64, FoV = 192 mm, thickness = 4 mm, slice gap = 0 mm, slice gap = 0 mm) with blood oxygen level-dependent (BOLD) contrast. 2.3 Data analysis The data were preprocessed and analyzed with SPM99 (Friston et al., 1995; Wellcome Department of Cognitive Neurology, London, UK). The first three volumes of images were discarded to avoid T1 equilibrium effects. The images were realigned to the first image as a reference, spatially normalized with respect to the Montreal Neurological Institute EPI template, and spatially smoothed with a Gaussian kernel (8 mm, full-width at half-maximum). A RANDOM condition action larger reward Fig. 2. The selected action of a representative single subject (solid line) and the group average ratio of selecting optimal action (dashed line) in (A) RANDOM and (B) REGULAR conditions. smaller reward 1 32 64 96 128 96 128 trial REGULAR condition B action optimal nonoptimal 1 32 64 trial Images of parameter estimates for the contrast of interest were created for each subject. These were then used for a second-level group analysis using a one-sample t-test across the subjects (random effects analysis). We conducted two types of analysis. One was block design analysis using three boxcar regressors convolved with a hemodynamic response function as the reference waveform for each condition (RANDOM, REGULAR, and NO). The other was multivariate regression analysis using explanatory variables, representing the time course of the reward prediction V(t) and reward prediction error δ(t) estimated from subjects’ performance data (described below), in addition to three regressors representing the condition of the block. 2.4 Estimation of predicted reward V(t) and prediction error δ(t) The time course of reward prediction V(t) and reward prediction error δ(t) were estimated from each subject’s performance data, i.e. state s(t), action a(t), and reward r(t), as follows. If the subject starts from a state s(t) and comes back to the same state after k steps, the expected cumulative reward V(t) should satisfy the consistency condition V(t) = r(t + 1) + γ r(t + 2) + … + γ k-1 r(t + k) + γ kV(t). (3) Thus, for each time t of the data file, we calculated the weighted sum of the rewards acquired until the subject returned to the same state and estimated the value function for that episode as r ( t + 1) + γ r ( t + 2 ) + ... + γ k −1r ( t + k ) . ˆ (t ) = V 1− γ k (4) The estimate of the value function V(t) at time t was given by the average of all previous episodes from the same state as at time t V (t ) = 1 L L ∑ Vˆ ( t ) , l (5) l =1 where {t1, …, tL} are the indices of time visiting the same state as s(t), i.e. s(t1) = … = s(tL) = s(t). The TD error was given by the difference between the actual reward r(t) and the temporal difference of the value function V(t) according to equation (2). Assuming that different brain areas are involved in reward prediction on different time scales, we varied the discount factor γ as 0, 0.3, 0.6, 0.8, 0.9, and 0.99. Fig. 3. (A) In REGULAR vs. RANDOM comparison, significant activation was observed in DLPFC ((x, y, z) = (46, 45, 9), peak t = 4.06) (p < 0.001 uncorrected). (B) In RANDOM vs. REGULAR comparison, significant activation was observed in lateral OFC ((x, y, z) = (-32, 9, -21), peak t = 4.90) (p < 0.001 uncorrected). 3 3.1 R e sul t s Behavioral results Figure 2 summarizes the learning performance of a representative single subject (solid line) and group average (dashed line) during fMRI measurement. Fourteen subjects successfully learned to take larger immediate rewards in the RANDOM condition (Fig. 2A) and a large positive reward at s1 after small negative rewards at s2, s3 and s4 in the REGULAR condition (Fig. 2B). 3.2 Block-design analysis In REGULAR vs. RANDOM contrast, we observed a significant activation in the dorsolateral prefrontal cortex (DLPFC) (Fig. 3A) (p < 0.001 uncorrected). In RANDOM vs. REGULAR contrast, we observed a significant activation in lateral orbitofrontal cortex (lOFC) (Fig. 3B) (p < 0.001 uncorrected). The result of block-design analysis suggests differential involvement of neural pathways in reward prediction on long and short time scales. The result in RANDOM vs. REGULAR contrast was consistent with previous studies that the OFC is involved in reward prediction within a short delay and reward outcome [14-20]. 3.3 Regression analysis We observed significant correlation with reward prediction V(t) in the MFC, DLPFC (all γ ), ventromedial insula (small γ ), dorsal striatum, amygdala, hippocampus, and parahippocampal gyrus (large γ ) (p < 0.001 uncorrected) (Fig. 4A). We also found significant correlation with reward prediction error δ(t) in the IPC, PMd, cerebellum (all γ ), ventral striatum (small γ ), and lateral OFC (large γ ) (p < 0.001 uncorrected) (Fig. 4B). As we changed the time scale parameter γ of reward prediction, we found rostro-caudal maps of correlation to V(t) in MFC with increasing γ. Fig. 4. Voxels with a significant correlation (p < 0.001 uncorrected) with reward prediction V(t) and prediction error δ(t) are shown in different colors for different settings of the time scale parameter (γ = 0 in red, γ = 0.3 in orange, γ = 0.6 in yellow, γ = 0.8 in green, γ = 0.9 in cyan, and γ = 0.99 in blue). Voxels correlated with two or more regressors are shown by a mosaic of colors. (A) Significant correlation with reward prediction V(t) was observed in the MFC, DLPFC, dorsal striatum, insula, and hippocampus. Note the anterior-ventral to posterior-dorsal gradient with the increase in γ in the MFC. (B) Significant correlation with reward prediction error δ(t) on γ = 0 was observed in the ventral striatum. 4 D i s c u ss i o n In the MFC, anterior and ventral part was involved in reward prediction V(t) on shorter time scales (0 ≤ γ ≤ 0.6), whereas posterior and dorsal part was involved in reward prediction V(t) on longer time scales (0.6 ≤ γ ≤ 0.99). The ventral striatum involved in reward prediction error δ(t) on shortest time scale (γ = 0), while the dorsolateral striatum correlated with reward prediction V(t) on longer time scales (0.9 ≤ γ ≤ 0.99). These results are consistent with the topographic organization of fronto-striatal connection; the rostral part of the MFC project to the ventral striatum, whereas the dorsal and posterior part of the cingulate cortex project to the dorsolateral striatum [21]. In the MFC and the striatum, no significant difference in activity was observed in block-design analysis while we did find graded maps of activities with different values of γ. A possible reason is that different parts of the MFC and the striatum are concurrently involved with reward prediction on different time scales, regardless of the task context. Activities of the DLPFC and lOFC, which show significant differences in block-design analysis (Fig. 3), may be regulated according to the necessity for the task; From these results, we propose the following mechanism of reward prediction on different time scales. The parallel cortico-basal ganglia loops are responsible for reward prediction on various time scales. The ‘limbic loop’ via the ventral striatum specializes in immediate reward prediction, whereas the ‘cognitive and motor loop’ via the dorsal striatum specialises in future reward prediction. Each loop learns to predict rewards on its specific time scale. To perform an optimal action under a given time scale, the output of the loop with an appropriate time scale is used for actual action selection. Previous studies in brain damages and serotonergic functions suggest that the MFC and the dorsal raphe, which are reciprocally connected [22, 23], play an important role in future reward prediction. The cortico-cortico projections from the MFC, or the serotonergic projections from the dorsal raphe to the cortex and the striatum may be involved in the modulation of these parallel loops. In present study, using a novel regression analysis based on subjects’ performance data and reinforcement learning model, we revealed the maps of time scales in reward prediction, which could not be found by conventional block-design analysis. Future studies using this method under pharmacological manipulation of the serotonergic system would clarify the role of serotonin in regulating the time scale of reward prediction. Acknowledgments We thank Nicolas Schweighofer, Kazuyuki Samejima, Masahiko Haruno, Hiroshi Imamizu, Satomi Higuchi, Toshinori Yoshioka, and Mitsuo Kawato for helpful discussions and technical advice. References [1] Rogers, R.D., et al. (1999) Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: evidence for monoaminergic mechanisms. Neuropsychopharmacology 20(4):322-339. [2] Evenden, J.L. & Ryan, C.N. (1996) The pharmacology of impulsive behaviour in rats: the effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology (Berl) 128(2):161-170. [3] Mobini, S., et al. (2000) Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 152(4):390-397. [4] Bechara, A., et al. (1994) Insensitivity to future consequences following damage to human prefrontal cortex. Cognition 50(1-3):7-15. [5] Bechara, A., Tranel, D. & Damasio, H. (2000) Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions. Brain 123:2189-2202. [6] Mobini, S., et al. (2002) Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 160(3):290-298. [7] Doya, K. (2002) 15(4-6):495-506. Metalearning and neuromodulation. Neural Netw [8] Sutton, R.S., Barto, A. G. (1998) Reinforcement learning. Cambridge, MA: MIT press. [9] Houk, J.C., Adams, J.L. & Barto, A.G., A model of how the basal ganglia generate and use neural signals that predict reinforcement, in Models of information processing in the basal ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, Editors. 1995, MIT Press: Cambridge, Mass. p. 249-270. [10] Schultz, W., Dayan, P. & Montague, P.R. (1997) A neural substrate of prediction and reward. Science 275(5306):1593-1599. [11] Doya, K. (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10(6):732-739. [12] Berns, G.S., et al. (2001) Predictability modulates human brain response to reward. J Neurosci 21(8):2793-2798. [13] O'Doherty, J.P., et al. (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329-337. [14] Koepp, M.J., et al. (1998) Evidence for striatal dopamine release during a video game. Nature 393(6682):266-268. [15] Rogers, R.D., et al. (1999) Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci 19(20):9029-9038. [16] Elliott, R., Friston, K.J. & Dolan, R.J. (2000) Dissociable neural responses in human reward systems. J Neurosci 20(16):6159-6165. [17] Breiter, H.C., et al. (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30(2):619-639. [18] Knutson, B., et al. (2001) Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J Neurosci 21(16):RC159. [19] O'Doherty, J.P., et al. (2002) Neural responses during anticipation of a primary taste reward. Neuron 33(5):815-826. [20] Pagnoni, G., et al. (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5(2):97-98. [21] Haber, S.N., et al. (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15(7 Pt 1):4851-4867. [22] Celada, P., et al. (2001) Control of dorsal raphe serotonergic neurons by the medial prefrontal cortex: Involvement of serotonin-1A, GABA(A), and glutamate receptors. J Neurosci 21(24):9917-9929. [23] Martin-Ruiz, R., et al. (2001) Control of serotonergic function in medial prefrontal cortex by serotonin-2A receptors through a glutamate-dependent mechanism. J Neurosci 21(24):9856-9866.
6 0.0490564 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples
7 0.048691586 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
8 0.044732723 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
9 0.04453475 42 nips-2003-Bounded Finite State Controllers
10 0.044202831 51 nips-2003-Design of Experiments via Information Theory
11 0.04257302 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
12 0.042529736 55 nips-2003-Distributed Optimization in Adaptive Networks
13 0.040561795 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
14 0.036594771 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
15 0.035910368 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
16 0.033407498 181 nips-2003-Statistical Debugging of Sampled Programs
17 0.033395465 62 nips-2003-Envelope-based Planning in Relational MDPs
18 0.032583665 78 nips-2003-Gaussian Processes in Reinforcement Learning
19 0.032326695 170 nips-2003-Self-calibrating Probability Forecasting
20 0.032108098 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
topicId topicWeight
[(0, -0.123), (1, 0.049), (2, -0.008), (3, -0.013), (4, -0.031), (5, -0.002), (6, 0.024), (7, 0.031), (8, -0.015), (9, 0.028), (10, 0.006), (11, -0.003), (12, -0.008), (13, 0.033), (14, 0.058), (15, 0.012), (16, -0.012), (17, -0.017), (18, 0.003), (19, -0.05), (20, -0.045), (21, 0.033), (22, -0.09), (23, -0.05), (24, -0.115), (25, -0.074), (26, -0.056), (27, -0.034), (28, -0.026), (29, 0.049), (30, -0.014), (31, 0.032), (32, 0.009), (33, -0.004), (34, -0.061), (35, -0.093), (36, 0.015), (37, -0.04), (38, 0.037), (39, -0.117), (40, 0.043), (41, 0.061), (42, 0.047), (43, 0.02), (44, -0.021), (45, 0.006), (46, 0.018), (47, 0.069), (48, -0.089), (49, -0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.94816661 75 nips-2003-From Algorithmic to Subjective Randomness
Author: Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. 1
2 0.63209379 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
3 0.62556553 123 nips-2003-Markov Models for Automated ECG Interval Analysis
Author: Nicholas P. Hughes, Lionel Tarassenko, Stephen J. Roberts
Abstract: We examine the use of hidden Markov and hidden semi-Markov models for automatically segmenting an electrocardiogram waveform into its constituent waveform features. An undecimated wavelet transform is used to generate an overcomplete representation of the signal that is more appropriate for subsequent modelling. We show that the state durations implicit in a standard hidden Markov model are ill-suited to those of real ECG features, and we investigate the use of hidden semi-Markov models for improved state duration modelling. 1
4 0.59512669 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
Author: Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Jon M. Kleinberg
Abstract: In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site. 1
5 0.53881931 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models
Author: Radford M. Neal, Matthew J. Beal, Sam T. Roweis
Abstract: We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a non-linear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of “pools” of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used to in effect discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers. 1
6 0.52491885 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
7 0.49402726 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
8 0.4352535 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
9 0.41305086 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
10 0.40464723 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples
11 0.40108597 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
12 0.38222897 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
13 0.37167317 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
14 0.36422127 168 nips-2003-Salient Boundary Detection using Ratio Contour
15 0.36293986 30 nips-2003-Approximability of Probability Distributions
16 0.35788727 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning
17 0.3489078 85 nips-2003-Human and Ideal Observers for Detecting Image Curves
18 0.33773574 51 nips-2003-Design of Experiments via Information Theory
19 0.32678685 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity
20 0.32071343 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays
topicId topicWeight
[(0, 0.036), (11, 0.032), (29, 0.024), (30, 0.02), (35, 0.079), (49, 0.014), (53, 0.065), (71, 0.045), (76, 0.052), (81, 0.378), (85, 0.049), (91, 0.096), (99, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.82893687 75 nips-2003-From Algorithmic to Subjective Randomness
Author: Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. 1
2 0.80934817 67 nips-2003-Eye Micro-movements Improve Stimulus Detection Beyond the Nyquist Limit in the Peripheral Retina
Author: Matthias H. Hennig, Florentin Wörgötter
Abstract: Even under perfect fixation the human eye is under steady motion (tremor, microsaccades, slow drift). The “dynamic” theory of vision [1, 2] states that eye-movements can improve hyperacuity. According to this theory, eye movements are thought to create variable spatial excitation patterns on the photoreceptor grid, which will allow for better spatiotemporal summation at later stages. We reexamine this theory using a realistic model of the vertebrate retina by comparing responses of a resting and a moving eye. The performance of simulated ganglion cells in a hyperacuity task is evaluated by ideal observer analysis. We find that in the central retina eye-micromovements have no effect on the performance. Here optical blurring limits vernier acuity. In the retinal periphery however, eye-micromovements clearly improve performance. Based on ROC analysis, our predictions are quantitatively testable in electrophysiological and psychophysical experiments. 1
3 0.39340237 78 nips-2003-Gaussian Processes in Reinforcement Learning
Author: Malte Kuss, Carl E. Rasmussen
Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.
4 0.39307642 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models
Author: Radford M. Neal, Matthew J. Beal, Sam T. Roweis
Abstract: We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a non-linear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of “pools” of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used to in effect discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers. 1
5 0.39295992 158 nips-2003-Policy Search by Dynamic Programming
Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng
Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1
6 0.39052895 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
7 0.38991624 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
8 0.38923022 146 nips-2003-Online Learning of Non-stationary Sequences
9 0.38867733 189 nips-2003-Tree-structured Approximations by Expectation Propagation
10 0.38757524 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
11 0.38696337 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
12 0.38558474 107 nips-2003-Learning Spectral Clustering
13 0.38442278 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
14 0.3844206 73 nips-2003-Feature Selection in Clustering Problems
15 0.3836025 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
16 0.38320228 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
17 0.38274473 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
18 0.3825258 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
19 0.38170218 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
20 0.38159084 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors