nips nips2001 nips2001-18 knowledge-graph by maker-knowledge-mining

18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task


Source: pdf

Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber

Abstract: We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance. Consider the following scenario: While driving, you approach an intersection at which the traffic light has already turned yellow, signaling that it is about to turn red. You also notice that a car is approaching you rapidly from behind, with no indication of slowing. Should you stop or speed through the intersection? The decision is difficult due to the presence of two conflicting signals. Such response conflict can be produced in a psychological laboratory as well. For example, Stroop (1935) asked individuals to name the color of ink on which a word is printed. When the words are color names incongruous with the ink color— e.g., “blue” printed in red—reaction times are slower and error rates are higher. We are interested in the control mechanisms underlying performance of high-conflict tasks. Conflict requires individuals to monitor and adjust their behavior, possibly responding more slowly if errors are too frequent. In this paper, we model a speeded discrimination paradigm in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). The stimuli are letters of the alphabet, A–Z, presented in rapid succession. In a choice task, individuals are asked to press one response key if the letter is an X or another response key for any letter other than X (as a shorthand, we will refer to non-X stimuli as Y). In a go/no-go task, individuals are asked to press a response key when X is presented and to make no response otherwise. We address both tasks because they elicit slightly different decision-making behavior. In both tasks, Jones and Braver (2001) manipulated the relative frequency of the X and Y stimuli; the ratio of presentation frequency was either 17:83, 50:50, or 83:17. Response conflict arises when the two stimulus classes are unbalanced in frequency, resulting in more errors and slower reaction times. For example, when X’s are frequent but Y is presented, individuals are predisposed toward producing the X response, and this predisposition must be overcome by the perceptual evidence from the Y. Jones and Braver (2001) also performed an fMRI study of this task and found that anterior cingulate cortex (ACC) becomes activated in situations involving response conflict. Specifically, when one stimulus occurs infrequently relative to the other, event-related fMRI response in the ACC is greater for the low frequency stimulus. Jones and Braver also extended a neural network model of Botvinick, Braver, Barch, Carter, and Cohen (2001) to account for human performance in the two discrimination tasks. 
The heart of the model is a mechanism that monitors conflict—the posited role of the ACC—and adjusts response biases accordingly. In this paper, we develop a parsimonious alternative account of the role of the ACC and of how control processes modulate behavior when response conflict arises. 1 A RATIONAL ANALYSIS Our account is based on a rational analysis of human cognition, which views cognitive processes as being optimized with respect to certain task-related goals, and being adaptive to the structure of the environment (Anderson, 1990). We make three assumptions of rationality: (1) perceptual inference is optimal but is subject to rate limitations on information transmission, (2) response class prior probabilities are accurately estimated, and (3) the goal of individuals is to minimize a cost that depends both on error rate and reaction time. The heart of our account is an existing probabilistic model that explains a variety of facilitation effects that arise from long-term repetition priming (Colagrosso, in preparation; Mozer, Colagrosso, & Huber, 2000), and more broadly, that addresses changes in the nature of information transmission in neocortex due to experience. We give a brief overview of this model; the details are not essential for the present work. The model posits that neocortex can be characterized by a collection of information-processing pathways, and any act of cognition involves coordination among pathways. To model a simple discrimination task, we might suppose a perceptual pathway to map the visual input to a semantic representation, and a response pathway to map the semantic representation to a response. The choice and go/no-go tasks described earlier share a perceptual pathway, but require different response pathways. The model is framed in terms of probability theory: pathway inputs and outputs are random variables, and microinference in a pathway is carried out by Bayesian belief revision. To elaborate, consider a pathway whose input at time t is a discrete random variable that can assume values corresponding to alternative input states. Similarly, the output of the pathway at time t is a discrete random variable that can assume values corresponding to alternative output states. For example, the input to the perceptual pathway in the discrimination task is one of 26 visual patterns corresponding to the letters of the alphabet, and the output is one of 26 letter identities. (This model is highly abstract: the visual patterns are enumerated, but the actual pixel patterns are not explicitly represented in the model. Nonetheless, the similarity structure among inputs can be captured, but we skip a discussion of this issue because it is irrelevant for the current work.) To present a particular input alternative to the model for some number of time steps, we clamp the pathway input to that alternative for the duration of the presentation. The model computes a probability distribution over the output at each time step, conditioned on the input presented so far.
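To make the pathway formulation concrete, here is a minimal sketch—not the authors' implementation—of Bayesian belief revision in a single pathway with one-to-one input–output associations. The confusion-matrix parameterization, the association-strength value, and the step counts are illustrative assumptions standing in for the model's rate-limited transmission dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pathway(prior, confusion, clamped_input, n_steps):
    """Iterate Bayesian belief revision in a single pathway (illustrative sketch).

    prior         -- prior probability over the K output alternatives
    confusion     -- K x K matrix; confusion[i, j] = P(transmit j | true input i).
                     Rows near uniform model a low rate of information
                     transmission; rows near the identity model a high rate.
    clamped_input -- index of the input alternative clamped on this trial
    n_steps       -- number of time steps the input is clamped
    Returns the posterior over output alternatives after each time step.
    """
    belief = np.asarray(prior, dtype=float).copy()
    trajectory = []
    for _ in range(n_steps):
        # One noisy transmission of the clamped input per time step.
        obs = rng.choice(len(belief), p=confusion[clamped_input])
        # Bayesian belief revision: multiply by the likelihood of the observation.
        belief = belief * confusion[:, obs]
        belief /= belief.sum()
        trajectory.append(belief.copy())
    return np.array(trajectory)

# Example: 26 letter identities, one-to-one associations, weak evidence per step.
K = 26
prior = np.full(K, 1.0 / K)                 # uniform prior over letter identities
strength = 0.15                             # assumed association strength
confusion = np.full((K, K), (1.0 - strength) / (K - 1))
np.fill_diagonal(confusion, strength)
posterior = simulate_pathway(prior, confusion, clamped_input=0, n_steps=40)
print(posterior[[0, 9, 39], 0])             # target probability after 1, 10, and 40 steps
```

Averaged over runs, the probability assigned to the target rises gradually with the number of clamped time steps, which is the accumulation-of-evidence behavior described above.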

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. [sent-5, score-0.278]

2 We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). [sent-6, score-0.563]

3 Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. [sent-7, score-0.701]

4 How do control processes modulate behavior based on the relative class frequencies? [sent-8, score-0.108]

5 We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. [sent-9, score-0.953]

6 With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance. [sent-10, score-0.144]

7 Such response conflict can be produced in a psychological laboratory as well. [sent-15, score-0.236]

8 For example, Stroop (1935) asked individuals to name the color of ink on which a word is printed. [sent-16, score-0.294]

9 Conflict requires individuals to monitor and adjust their behavior, possibly responding more slowly if errors are too frequent. [sent-21, score-0.323]

10 In this paper, we model a speeded discrimination paradigm in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). [sent-22, score-0.512]

11 In a choice task, individuals are asked to press one response key if the letter is an X or another response key for any letter other than X (as a shorthand, we will refer to non-X stimuli as Y). [sent-24, score-0.878]

12 In a go/no-go task, individuals are asked to press a response key when X is presented and to make no response otherwise. [sent-25, score-0.695]

13 In both tasks, Jones and Braver (2001) manipulated the relative frequency of the X and Y stimuli; the ratio of presentation frequency was either 17:83, 50:50, or 83:17. [sent-27, score-0.255]

14 Response conflict arises when the two stimulus classes are unbalanced in frequency, resulting in more errors and slower reaction times. [sent-28, score-0.605]

15 For example, when X’s are frequent but Y is presented, individuals are predisposed toward producing the X response, and this predisposition must be overcome by the perceptual evidence from the Y. [sent-29, score-0.282]

16 Jones and Braver (2001) also performed an fMRI study of this task and found that anterior cingulate cortex (ACC) becomes activated in situations involving response conflict. [sent-30, score-0.389]

17 Specifically, when one stimulus occurs infrequently relative to the other, event-related fMRI response in the ACC is greater for the low frequency stimulus. [sent-31, score-0.417]

18 Jones and Braver also extended a neural network model of Botvinick, Braver, Barch, Carter, and Cohen (2001) to account for human performance in the two discrimination tasks. [sent-32, score-0.25]

19 The heart of the model is a mechanism that monitors conflict—the posited role of the ACC—and adjusts response biases accordingly. [sent-33, score-0.412]

20 In this paper, we develop a parsimonious alternative account of the role of the ACC and of how control processes modulate behavior when response conflict arises. [sent-34, score-0.388]

21 1 A RATIONAL ANALYSIS Our account is based on a rational analysis of human cognition, which views cognitive processes as being optimized with respect to certain task-related goals, and being adaptive to the structure of the environment (Anderson, 1990). [sent-35, score-0.286]

22 The model posits that neocortex can be characterized by a collection of information-processing pathways, and any act of cognition involves coordination among pathways. [sent-39, score-0.072]

23 To model a simple discrimination task, we might suppose a perceptual pathway to map the visual input to a semantic representation, and a response pathway to map the semantic representation to a response. [sent-40, score-1.055]

24 The choice and go/no-go tasks described earlier share a perceptual pathway, but require different response pathways. [sent-41, score-0.412]

25 The model is framed in terms of probability theory: pathway inputs and outputs are random variables and microinference in a pathway is carried out by Bayesian belief revision. [sent-42, score-0.659]

26 To elaborate, consider a pathway whose input at time t is a discrete random variable that can assume values corresponding to alternative input states. [sent-43, score-0.39]

27 Similarly, the output of the pathway at time t is a discrete random variable that can assume values corresponding to alternative output states. [sent-44, score-0.353]

28 For example, the input to the perceptual pathway in the discrimination task is one of 26 visual patterns corresponding to the letters of the alphabet, and the output is one of 26 letter identities. [sent-45, score-0.586]

29 To present a particular input alternative to the model for some number of time steps, we clamp the pathway input to that alternative for the duration of the presentation. [sent-48, score-0.089]

30 [Figure 1 residue: pathway variables X(t) and Y(t) and a probability-of-responding axis.] [sent-57, score-0.091]

31 Thus, our model captures the time course of information processing for a single event. [sent-64, score-0.089]

32 To illustrate how the Mozer et al. (2000) model operates, the right panel of Figure 1 depicts the time course of inference in a single pathway which has 26 input and output alternatives, with one-to-one associations. [sent-68, score-0.421]

33 Due to limited association strengths, perceptual evidence must accumulate over many iterations in order for the target to be produced with high probability. [sent-72, score-0.192]

34 The densely dashed line shows the same target probability when the target prior is increased, and the sparsely dashed line shows the target probability when the association strength to the target is increased. [sent-73, score-0.242]

35 Increasing either the prior or the association strength causes the speed-accuracy curve to shift to the left. [sent-74, score-0.114]
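As a rough, self-contained illustration of this leftward shift (a sketch under assumed parameter values, not the authors' simulation), the following computes the average-case number of time steps needed for the target posterior to reach a fixed accuracy criterion under a baseline setting, an increased target prior, and an increased association strength.

```python
import numpy as np

def time_to_criterion(target_prior, strength, K=26, criterion=0.9, max_steps=500):
    """Average-case speed-accuracy sketch for one pathway (assumed parameters).

    target_prior -- prior probability of the target alternative (others share the rest)
    strength     -- association strength: P(correct transmission per time step)
    Returns the first time step at which the expected posterior on the target
    exceeds `criterion`, i.e. one point on the speed-accuracy curve.
    """
    prior = np.full(K, (1.0 - target_prior) / (K - 1))
    prior[0] = target_prior                      # alternative 0 is the target
    confusion = np.full((K, K), (1.0 - strength) / (K - 1))
    np.fill_diagonal(confusion, strength)

    log_belief = np.log(prior)
    # Expected per-step log-likelihood of each hypothesis, given the target input.
    expected_loglik = confusion[0] @ np.log(confusion).T
    for t in range(1, max_steps + 1):
        log_belief = log_belief + expected_loglik
        belief = np.exp(log_belief - log_belief.max())
        belief /= belief.sum()
        if belief[0] >= criterion:
            return t
    return max_steps

print(time_to_criterion(1 / 26, 0.15))   # baseline
print(time_to_criterion(0.50, 0.15))     # increased target prior       -> fewer steps than baseline
print(time_to_criterion(1 / 26, 0.30))   # increased association strength -> fewer steps than baseline
```

Both manipulations reduce the number of steps to criterion, i.e., they shift the speed-accuracy curve to the left.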

36 In our previous work, we proposed a mechanism by which priors and association strengths are altered through experience. [sent-75, score-0.265]

37 A perceptual pathway maps visual patterns (26 alternatives) to a letter-identity representation (26 alternatives), and a response pathway maps the letter identity to a response. [sent-78, score-0.99]

38 For the choice task, the response pathway has two outputs, corresponding to the two response keys; for the go/no-go task, the response pathway also has two outputs, which are interpreted as “go” and “no go.” [sent-79, score-1.332]

39 The interconnection between the pathways is achieved by copying the output of the perceptual pathway to the input of the response pathway at each time step. [sent-80, score-0.396]

40 The free parameters of the model are mostly task and experience related. [sent-81, score-0.084]

41 (2000), with one exception: Because the speeded perceptual discrimination task studied here is quite unlike the tasks studied by Mozer et al. [sent-83, score-0.392]

42 We allowed ourselves to vary the association-strength parameter in the response pathway. [sent-84, score-0.236]

43 In our simulations, we also use the priming mechanism proposed by Mozer et al. [sent-86, score-0.321]

44 The priors for a pathway are internally represented in a nonnormalized form: each alternative has a nonnormalized prior, and the normalized prior is obtained by dividing by the sum of the nonnormalized priors over all alternatives. [sent-88, score-0.566]

45 On each trial, the priming mechanism increases the nonnormalized prior of each alternative in proportion to its asymptotic activity at the final time step, and all priors undergo exponential decay; the two parameters of this update are the strength of priming and the decay rate. [sent-89, score-0.556]

46 (The model also performs priming in the association strengths by a similar rule, which is included in the present simulation although it has a negligible effect on the results here.) [sent-91, score-0.4]

47 This priming mechanism yields priors on average that match the presentation probabilities in the task, e. [sent-92, score-0.44]
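A minimal sketch of such a prior-update rule—parameter names and values (strength of priming, decay rate) are assumed, and the presented alternative's asymptotic activity is simplified to 1.0—shows how the normalized priors drift toward the presentation probabilities:

```python
import random

def update_priors(nonnormalized, winner, strength=0.5, decay=0.05):
    """One-trial prior update in the spirit of the priming mechanism described
    above (a sketch; the authors' rule scales the boost by asymptotic activity).
    """
    for alt in nonnormalized:
        nonnormalized[alt] *= (1.0 - decay)   # exponential decay of all nonnormalized priors
    nonnormalized[winner] += strength         # boost the alternative presented on this trial
    total = sum(nonnormalized.values())
    return {alt: value / total for alt, value in nonnormalized.items()}

# Running the update over a long 83:17 sequence of X and Y trials drives the
# normalized priors toward roughly 0.83 and 0.17, matching the task statistics.
random.seed(0)
raw = {"X": 1.0, "Y": 1.0}
for _ in range(2000):
    trial = "X" if random.random() < 0.83 else "Y"
    normalized = update_priors(raw, trial)
print(normalized)
```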

48 Consequently, when we report results for overall error rate and reaction time in a condition, we make the assumption of rationality that the model’s priors correspond to the true priors of the environment. [sent-97, score-0.875]

49 Although the model yields the same result when the priming mechanism is used on a trial-by-trial basis to adjust the priors, the explicit assumption of rationality avoids any confusion about the factors responsible for the model’s performance. [sent-98, score-0.472]

50 We use the priming mechanism on a trial-by-trial basis to account for performance conditional on recent trial history, as explained later. [sent-99, score-0.451]

51 2 Control Processes and the Speed-Accuracy Trade Off The response pathway of the model produces a speed-accuracy performance function much like that in Figure 1b. [sent-101, score-0.566]

52 This function characterizes the operation of the pathway, but it does not address the control issue of when in time to initiate a response. [sent-102, score-0.168]

53 A control mechanism might simply choose a threshold in accuracy or in reaction time, but we hypothesize a more general, rational approach in which a response cost is computed, and control mechanisms initiate a response at the point in time when a minimum in cost is attained. [sent-103, score-1.622]

54 When stimulus S is presented and the correct response is R, we posit a cost of responding at time t following stimulus onset (Equation 1). [sent-104, score-0.636]

55 This cost involves two terms—the error rate and the reaction time—which are summed with a weighting factor that determines the relative importance of the two terms. [sent-105, score-0.712]
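Written out, the cost in Equation 1 can be expressed—in notation assumed here rather than taken verbatim from the paper—as the sum of an error-rate term and a reaction-time term scaled by a response-indexed weighting factor:

```latex
% Hedged reconstruction of Equation 1: the cost of responding at time t when
% stimulus S is presented and the correct response is R. beta_R is an assumed
% name for the response-indexed weighting factor described in the text.
C_R(t \mid S) = \bigl[\, 1 - P(\mathrm{response} = R \mid S, t) \,\bigr] + \beta_R \, t
```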

56 We assume that the weighting factor is dependent on task instructions: if individuals are told to make no errors, it should be small to emphasize the error rate; if individuals are told to respond quickly and not concern themselves with occasional errors, it should be large to emphasize the reaction time. [sent-106, score-0.984]

57 The cost of responding to stimulus S cannot be computed without knowing the correct response R. [sent-107, score-0.334]

58 We index the weighting factor by the response R because it is not sensible to assign a time cost to a “no go” response, where no response is produced. [sent-109, score-0.626]

59 Consequently, no time cost is assigned to the “no go” response; for the “go” response and for the two responses in the choice task, we searched for the weighting parameter that best fit the data. [sent-110, score-0.324]

60 [Figure 2 panel residue: probability-of-responding axes for the 17:83, 50:50, and 83:17 conditions.] [sent-134, score-0.273]

61 Figure 2: (upper row) Output of response pathway when stimulus S, associated with response R, is presented, and the relative frequency of R and the alternative response is 17:83, 50:50, and 83:17. [sent-138, score-1.043]

62 2 RESULTS Figure 2 illustrates the model’s performance on the choice task when presented with a stimulus, S, associated with a response, R, and the relative frequency of R and the alternative response is 17:83, 50:50, or 83:17 (left, center, and right columns, respectively). [sent-142, score-0.237]

63 In the early part of the cost function, error rate dominates the cost, and in the late part, reaction time dominates. [sent-146, score-0.74]

64 In fact, at long times, the error rate is essentially 0, and the cost grows linearly with reaction time. [sent-147, score-0.684]

65 Our rational analysis suggests that a response should be initiated at the global minimum—indicated by asterisks in the figure—implying that both the reaction time and error rate will decrease as the response prior is increased. [sent-148, score-1.255]
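A small sketch of this decision rule—using a toy speed-accuracy function in place of the model's actual pathway output, with an assumed weighting parameter beta—initiates the response at the global minimum of the cost:

```python
import numpy as np

def response_initiation_time(p_correct, beta):
    """Return the time step minimizing cost = error rate + beta * reaction time."""
    steps = np.arange(1, len(p_correct) + 1)
    cost = (1.0 - p_correct) + beta * steps
    return int(np.argmin(cost)) + 1, cost

# Toy speed-accuracy function: accuracy saturates, so at long times the error
# rate is near zero and the cost grows linearly with reaction time.
t = np.arange(1, 51)
p_correct = 1.0 - 0.9 * np.exp(-t / 8.0)
for beta in (0.002, 0.02):        # small beta emphasizes accuracy; large beta emphasizes speed
    best, _ = response_initiation_time(p_correct, beta)
    print(beta, best)             # the accuracy-emphasizing setting responds later
```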

66 Figure 3 presents human and simulation data for the choice task. [sent-149, score-0.197]

67 The data consist of mean reaction time and accuracy for the two target responses, R1 and R2, for the three conditions corresponding to different R1:R2 presentation ratios. [sent-150, score-0.701]

68 Figure 4 presents human and simulation data for the go/no-go task. [sent-151, score-0.167]

69 Note that reaction times are shown only for the “go” trials, because no response is required for the “no go” trials. [sent-152, score-0.747]

70 For both tasks, the model provides an extremely good fit to the human data. [sent-153, score-0.137]

71 The qualities of the model giving rise to the fit can be inferred by inspection of Figure 2—namely, accuracy is higher and reaction times are faster when a response is expected. [sent-154, score-0.837]

72 Figure 5 reveals how the recent history of experimental trials influences reaction time and error rate in the choice task. [sent-155, score-0.75]

73 The trial context along the x-axis is coded as a string of four symbols, where each symbol specifies whether a trial in the recent history required the same (“S”) or different (“D”) response as the trial immediately preceding it. [sent-156, score-0.521]

74 For example, if the current trial required response X, and the four trials leading up to the current trial were—in forward temporal order—Y, Y, Y, and X, the current trial’s context would be coded as “SSDS.” [sent-157, score-0.466]
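A short sketch of this context-coding scheme; the transition-based ordering is inferred from the example above and may differ from the paper's exact convention:

```python
def code_trial_context(responses):
    """Code the recent trial history as a string of 'S'/'D' symbols, one per
    transition between successive trials; `responses` lists the responses in
    forward temporal order, ending with the current trial."""
    return "".join(
        "S" if prev == nxt else "D"
        for prev, nxt in zip(responses, responses[1:])
    )

# Four trials Y, Y, Y, X followed by a current X trial -> "SSDS", as in the example above.
print(code_trial_context(["Y", "Y", "Y", "X", "X"]))
```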

75 The correlation coefficient between human and simulation data is . [sent-158, score-0.167]

76 The model fits the human data extremely well. [sent-162, score-0.137]

77 The simple priming mechanism proposed previously by Mozer et al. [sent-163, score-0.321]

78 (2000), which aims to adapt the model’s priors rapidly to the statistics of the environment, is responsible: On a coarse time scale, the mechanism produces priors in the model that match priors in the environment. [sent-164, score-0.415]

79 [Figure panel residue: axis labels for reaction time and response probability over the R1:R2 and go:no-go frequency conditions (17:83, 50:50, 83:17), human data vs. simulation.] [sent-169, score-0.812]

80 Figure 3: Human data (left column) and simulation results (right column) for the choice task. [sent-173, score-0.184]

81 The upper and lower rows show mean reaction time and accuracy, respectively, for the two responses (R1 and R2) in the three conditions corresponding to different R1:R2 frequencies. [sent-175, score-0.625]

82 Figure 4: Human data (left column) and simulation results (right column) for the go/no-go task. [sent-176, score-0.154]

83 The upper and lower rows show mean reaction time and accuracy, respectively, for the two responses (go and no-go) in the three conditions corresponding to different go:no-go presentation frequencies. [sent-178, score-0.67]

84 The fastest, most accurate contexts are clearly those in which the previous two trials required the same response as the current trial (the leftmost four contexts in each graph). [sent-179, score-0.371]

85 Previously, the priming mechanism was used to model perceptual priming, and here the same mechanism is used to model response priming. [sent-181, score-0.841]

86 The model provides a parsimonious account of the detailed pattern of human data from two speeded discrimination tasks. [sent-183, score-0.367]

87 The heart of the model was proposed previously by Mozer, Colagrosso, and Huber (2000), and in the present work we fit experimental data with only two free parameters, one relating to the rate of information flow, and the other specifying the relative cost of speed and errors. [sent-184, score-0.245]

88 The simplicity and elegance of the model arise from having adopted the rational perspective, which imposes strong constraints on the model and removes arbitrary choices and degrees of freedom that are often present in psychological models. [sent-185, score-0.167]

89 Jones and Braver (2001) proposed a neural network model to address response conflict in a speeded discrimination task. [sent-186, score-0.464]

90 Figure 5: Reaction time (left curve) and accuracy (right curve) data for humans (solid line) and model (dashed line), contingent on the recent history of experimental trials. [sent-199, score-0.306]

91 In brief, their model is an associative net mapping activity from stimulus units to response units. [sent-201, score-0.331]

92 When both response units receive significant activation, noise in the system can push the inappropriate response unit over threshold. [sent-202, score-0.472]

93 When this conflict situation is detected, a control mechanism acts to lower the baseline activity of response units, requiring them to build up more evidence before responding and thereby reducing the likelihood of noise determining the response. [sent-203, score-0.48]

94 Their model includes a priming mechanism to facilitate repetition of responses, much as we have in our model. [sent-204, score-0.354]

95 However, their model also includes a secondary priming mechanism to facilitate alternation of responses, which our model does not require. [sent-205, score-0.387]

96 Our account makes an alternative proposal—that ACC activity reflects the expected cost of decision making. [sent-208, score-0.17]

97 Both hypotheses are consistent with the fMRI data indicating that the ACC produces a greater response for a low frequency stimulus. [sent-209, score-0.327]

98 Evaluating the demand for control: anterior cingulate cortex and conflict monitoring. [sent-224, score-0.102]

99 A Bayesian cognitive architecture for analyzing information transmission in neocortex. [sent-228, score-0.101]

100 Sequential modulations in control: Conflict monitoring and the anterior cingulate cortex. [sent-237, score-0.131]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('reaction', 0.511), ('pathway', 0.297), ('ict', 0.274), ('response', 0.236), ('braver', 0.217), ('priming', 0.217), ('individuals', 0.168), ('acc', 0.137), ('jones', 0.128), ('speeded', 0.117), ('perceptual', 0.114), ('go', 0.112), ('mechanism', 0.104), ('human', 0.104), ('con', 0.102), ('mozer', 0.102), ('colagrosso', 0.102), ('rational', 0.101), ('cost', 0.098), ('trial', 0.095), ('responding', 0.091), ('frequency', 0.091), ('rationality', 0.085), ('botvinick', 0.078), ('discrimination', 0.078), ('priors', 0.074), ('simulation', 0.063), ('stimulus', 0.062), ('stimuli', 0.061), ('cingulate', 0.059), ('responses', 0.058), ('huber', 0.058), ('accuracy', 0.057), ('time', 0.056), ('asked', 0.055), ('transmission', 0.055), ('cohen', 0.054), ('task', 0.051), ('control', 0.049), ('rate', 0.047), ('pathways', 0.046), ('fmri', 0.046), ('cognitive', 0.046), ('association', 0.046), ('letter', 0.046), ('presentation', 0.045), ('anterior', 0.043), ('strengths', 0.041), ('prior', 0.04), ('trials', 0.04), ('carter', 0.039), ('dddd', 0.039), ('ddds', 0.039), ('ddsd', 0.039), ('ddss', 0.039), ('dsdd', 0.039), ('dsds', 0.039), ('dssd', 0.039), ('dsss', 0.039), ('ink', 0.039), ('neocortex', 0.039), ('nonnormalized', 0.039), ('sddd', 0.039), ('sdds', 0.039), ('sdsd', 0.039), ('sdss', 0.039), ('ssdd', 0.039), ('ssds', 0.039), ('sssd', 0.039), ('ssss', 0.039), ('yeung', 0.039), ('heart', 0.039), ('history', 0.038), ('alternative', 0.037), ('characterizes', 0.036), ('nonetheless', 0.036), ('account', 0.035), ('panel', 0.035), ('barch', 0.034), ('infrequent', 0.034), ('adjust', 0.033), ('model', 0.033), ('target', 0.032), ('color', 0.032), ('slower', 0.032), ('outputs', 0.032), ('tasks', 0.032), ('alternatives', 0.031), ('monitor', 0.031), ('modulate', 0.031), ('posit', 0.031), ('choice', 0.03), ('limitations', 0.029), ('monitoring', 0.029), ('told', 0.029), ('error', 0.028), ('strength', 0.028), ('relative', 0.028), ('alphabet', 0.027), ('initiate', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber


2 0.094712064 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

Author: Jonathan L. Shapiro, J. Wearden

Abstract: Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters. When coupled with reinforcement learning it simulates peak procedure experiments, producing both the scalar property and the pattern of single trial covariances. 1

3 0.08524248 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

Author: Aaron C. Courville, David S. Touretzky

Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1

4 0.081734635 78 nips-2001-Fragment Completion in Humans and Machines

Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu

Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.

5 0.077866256 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

Author: H. Colonius, A. Diederich

Abstract: Multisensory response enhancement (MRE) is the augmentation of the response of a neuron to sensory input of one modality by simultaneous input from another modality. The maximum likelihood (ML) model presented here modifies the Bayesian model for MRE (Anastasio et al.) by incorporating a decision strategy to maximize the number of correct decisions. Thus the ML model can also deal with the important tasks of stimulus discrimination and identification in the presence of incongruent visual and auditory cues. It accounts for the inverse effectiveness observed in neurophysiological recording data, and it predicts a functional relation between uni- and bimodal levels of discriminability that is testable both in neurophysiological and behavioral experiments. 1

6 0.073970184 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

7 0.067014568 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding

8 0.064939149 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

9 0.064558074 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model

10 0.060607459 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing

11 0.060424451 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision

12 0.052396014 57 nips-2001-Correlation Codes in Neuronal Populations

13 0.052368622 54 nips-2001-Contextual Modulation of Target Saliency

14 0.050484847 41 nips-2001-Bayesian Predictive Profiles With Applications to Retail Transaction Data

15 0.048939563 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

16 0.048914626 36 nips-2001-Approximate Dynamic Programming via Linear Programming

17 0.046187118 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections

18 0.044946671 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons

19 0.044747494 145 nips-2001-Perceptual Metamers in Stereoscopic Vision

20 0.043529637 43 nips-2001-Bayesian time series classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.147), (1, -0.117), (2, -0.046), (3, 0.006), (4, -0.058), (5, -0.019), (6, -0.043), (7, 0.015), (8, -0.023), (9, -0.002), (10, -0.01), (11, 0.088), (12, -0.136), (13, 0.02), (14, -0.026), (15, -0.007), (16, 0.065), (17, 0.048), (18, 0.08), (19, 0.092), (20, -0.114), (21, -0.107), (22, -0.058), (23, 0.021), (24, 0.007), (25, 0.054), (26, -0.015), (27, 0.102), (28, 0.022), (29, -0.009), (30, 0.033), (31, -0.015), (32, 0.0), (33, 0.062), (34, 0.036), (35, 0.016), (36, 0.06), (37, -0.002), (38, -0.101), (39, -0.011), (40, -0.047), (41, -0.014), (42, -0.05), (43, -0.036), (44, -0.056), (45, -0.095), (46, 0.023), (47, -0.118), (48, -0.077), (49, -0.083)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95371085 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber


2 0.72916728 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

Author: H. Colonius, A. Diederich

Abstract: Multisensory response enhancement (MRE) is the augmentation of the response of a neuron to sensory input of one modality by simultaneous input from another modality. The maximum likelihood (ML) model presented here modifies the Bayesian model for MRE (Anastasio et al.) by incorporating a decision strategy to maximize the number of correct decisions. Thus the ML model can also deal with the important tasks of stimulus discrimination and identification in the presence of incongruent visual and auditory cues. It accounts for the inverse effectiveness observed in neurophysiological recording data, and it predicts a functional relation between uni- and bimodal levels of discriminability that is testable both in neurophysiological and behavioral experiments. 1

3 0.62232357 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

Author: Aaron C. Courville, David S. Touretzky

Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1

4 0.60550547 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model

Author: Shimon Edelman, Benjamin P. Hiles, Hwajin Yang, Nathan Intrator

Abstract: To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow’s criterion of “suspicious coincidence” (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow’s criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain’s strategies for unsupervised acquisition of structural information in vision. 1 Motivation How does the human visual system decide for which objects it should maintain distinct and persistent internal representations of the kind typically postulated by theories of object recognition? Consider, for example, the image shown in Figure 1, left. This image can be represented as a monolithic hieroglyph, a pair of Chinese characters, a set of strokes, or, trivially, as a collection of pixels. Note that the second option is only available to a system previously exposed to various combinations of Chinese characters. Indeed, a principled decision whether to represent this image one way or another can only be made on the basis of prior exposure to related images. According to Barlow’s [1] insight, one useful principle is tallying suspicious coincidences: two candidate fragments should be combined into a composite object if the probability of their joint appearance is much higher than the product of their individual probabilities, which is the probability expected in the case of their statistical independence. This criterion may be compared to the Minimum Description Length (MDL) principle, which has been previously discussed in the context of object representation [2, 3]. In a simplified form [4], MDL calls for representing the pair explicitly as a whole under the same condition, just as the principle of suspicious coincidences does. While the Barlow/MDL criterion certainly indicates a suspicious coincidence, there are additional probabilistic considerations that may be used. One example is the possible perfect predictability of each fragment from the other, as measured by their conditional probabilities. If the two fragments are perfectly predictive of each other, they should really be coded by a single symbol, whereas the MDL criterion may suggest merely that some association between their representations be established. In comparison, if the fragments are not perfectly predictive of each other, there is a case to be made in favor of coding them separately to allow for a maximally expressive representation, whereas MDL may actually suggest a high degree of association. In this study we investigated whether the human visual system uses a criterion based on conditional probabilities alongside MDL while learning (in an unsupervised manner) to represent composite objects.

5 0.56350446 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

Author: Jonathan L. Shapiro, J. Wearden

Abstract: Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters. When coupled with reinforcement learning it simulates peak procedure experiments, producing both the scalar property and the pattern of single trial covariances. 1

6 0.53636384 78 nips-2001-Fragment Completion in Humans and Machines

7 0.49561319 3 nips-2001-ACh, Uncertainty, and Cortical Inference

8 0.47897816 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

9 0.46519095 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding

10 0.44841945 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

11 0.42223379 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

12 0.42011476 57 nips-2001-Correlation Codes in Neuronal Populations

13 0.40751415 126 nips-2001-Motivated Reinforcement Learning

14 0.39936173 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision

15 0.37812689 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

16 0.37609932 108 nips-2001-Learning Body Pose via Specialized Maps

17 0.37478238 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

18 0.36820778 145 nips-2001-Perceptual Metamers in Stereoscopic Vision

19 0.36014944 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network

20 0.35507113 41 nips-2001-Bayesian Predictive Profiles With Applications to Retail Transaction Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.019), (19, 0.019), (27, 0.056), (30, 0.061), (38, 0.02), (59, 0.02), (72, 0.052), (79, 0.032), (91, 0.633)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99190831 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber

Abstract: We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance. Consider the following scenario: While driving, you approach an intersection at which the traffic light has already turned yellow, signaling that it is about to turn red. You also notice that a car is approaching you rapidly from behind, with no indication of slowing. Should you stop or speed through the intersection? The decision is difficult due to the presence of two conflicting signals. Such response conflict can be produced in a psychological laboratory as well. For example, Stroop (1935) asked individuals to name the color of ink on which a word is printed. When the words are color names incongruous with the ink color— e.g., “blue” printed in red—reaction times are slower and error rates are higher. We are interested in the control mechanisms underlying performance of high-conflict tasks. Conflict requires individuals to monitor and adjust their behavior, possibly responding more slowly if errors are too frequent. In this paper, we model a speeded discrimination paradigm in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). The stimuli are letters of the alphabet, A–Z, presented in rapid succession. In a choice task, individuals are asked to press one response key if the letter is an X or another response key for any letter other than X (as a shorthand, we will refer to non-X stimuli as Y). In a go/no-go task, individuals are asked to press a response key when X is presented and to make no response otherwise. We address both tasks because they elicit slightly different decision-making behavior. In both tasks, Jones and Braver (2001) manipulated the relative frequency of the X and Y stimuli; the ratio of presentation frequency was either 17:83, 50:50, or 83:17. Response conflict arises when the two stimulus classes are unbalanced in frequency, resulting in more errors and slower reaction times. For example, when X’s are frequent but Y is presented, individuals are predisposed toward producing the X response, and this predisposition must be overcome by the perceptual evidence from the Y. Jones and Braver (2001) also performed an fMRI study of this task and found that anterior cingulate cortex (ACC) becomes activated in situations involving response conflict. Specifically, when one stimulus occurs infrequently relative to the other, event-related fMRI response in the ACC is greater for the low frequency stimulus. Jones and Braver also extended a neural network model of Botvinick, Braver, Barch, Carter, and Cohen (2001) to account for human performance in the two discrimination tasks. 
The heart of the model is a mechanism that monitors conflict (the posited role of the ACC) and adjusts response biases accordingly. In this paper, we develop a parsimonious alternative account of the role of the ACC and of how control processes modulate behavior when response conflict arises.

1 A RATIONAL ANALYSIS

Our account is based on a rational analysis of human cognition, which views cognitive processes as being optimized with respect to certain task-related goals, and being adaptive to the structure of the environment (Anderson, 1990). We make three assumptions of rationality: (1) perceptual inference is optimal but is subject to rate limitations on information transmission, (2) response class prior probabilities are accurately estimated, and (3) the goal of individuals is to minimize a cost that depends both on error rate and reaction time. The heart of our account is an existing probabilistic model that explains a variety of facilitation effects that arise from long-term repetition priming (Colagrosso, in preparation; Mozer, Colagrosso, & Huber, 2000), and more broadly, that addresses changes in the nature of information transmission in neocortex due to experience. We give a brief overview of this model; the details are not essential for the present work. The model posits that neocortex can be characterized by a collection of information-processing pathways, and any act of cognition involves coordination among pathways. To model a simple discrimination task, we might suppose a perceptual pathway to map the visual input to a semantic representation, and a response pathway to map the semantic representation to a response. The choice and go/no-go tasks described earlier share a perceptual pathway, but require different response pathways. The model is framed in terms of probability theory: pathway inputs and outputs are random variables, and microinference in a pathway is carried out by Bayesian belief revision. To elaborate, consider a pathway whose input at time t is a discrete random variable, denoted X(t), which can assume values x1, ..., xn corresponding to alternative input states. Similarly, the output of the pathway at time t is a discrete random variable, denoted Y(t), which can assume values y1, ..., ym. For example, the input to the perceptual pathway in the discrimination task is one of 26 visual patterns corresponding to the letters of the alphabet, and the output is one of 26 letter identities. (This model is highly abstract: the visual patterns are enumerated, but the actual pixel patterns are not explicitly represented in the model. Nonetheless, the similarity structure among inputs can be captured, but we skip a discussion of this issue because it is irrelevant for the current work.) To present a particular input alternative, x, to the model for T time steps, we clamp X(t) = x for t = 1 ... T. The model computes a probability distribution over Y(t) given the input history, i.e., P(Y(t) | X(1), ..., X(t)).
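To make the flavor of this account concrete, here is a small simulation sketch. It is our own toy construction, not the authors' implementation: the rate limit on information transmission is modeled as a noisy channel that delivers one sample of the clamped letter per time step, the class priors enter through the initial posterior, and a fixed posterior threshold stands in for the policy that would minimize the speed-accuracy cost. All parameter values (noise level, threshold, priors) are illustrative assumptions.

    import numpy as np

    n_letters = 26
    noise = 0.7           # probability a transmitted sample equals the true letter (assumed)
    prior_X = 0.17        # relative frequency of X in the 17:83 condition
    threshold = 0.95      # respond once one class's posterior exceeds this (assumed policy)
    rng = np.random.default_rng(0)

    def transmit(true_letter):
        # noisy channel: correct letter with prob `noise`, otherwise a random letter
        return true_letter if rng.random() < noise else rng.integers(n_letters)

    def trial(true_letter, max_steps=50):
        # posterior over letter identity, initialized from the class priors;
        # letter 0 plays the role of X, the other 25 letters share the Y prior
        post = np.full(n_letters, (1 - prior_X) / (n_letters - 1))
        post[0] = prior_X
        for t in range(1, max_steps + 1):
            sample = transmit(true_letter)
            like = np.full(n_letters, (1 - noise) / n_letters)
            like[sample] += noise           # P(sample | letter), matching transmit()
            post *= like
            post /= post.sum()              # Bayesian belief revision
            if post[0] > threshold:
                return 'X', t
            if 1 - post[0] > threshold:
                return 'Y', t
        return ('X' if post[0] > 0.5 else 'Y'), max_steps

    for letter, label in [(0, 'X'), (5, 'Y')]:
        results = [trial(letter) for _ in range(2000)]
        mean_rt = np.mean([t for _, t in results])
        err = np.mean([r != label for r, _ in results])
        print(label, 'mean RT (steps):', round(mean_rt, 2), 'error rate:', round(err, 3))

Run as-is, the low-frequency X stimulus should yield slower and less accurate responses than the high-frequency Y stimuli, the qualitative frequency effect described above.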

2 0.99107713 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images

Author: James M. Coughlan, Alan L. Yuille

Abstract: We describe the g-factor, which relates probability distributions on image features to distributions on the images themselves. The g-factor depends only on our choice of features and lattice quantization and is independent of the training image data. We illustrate the importance of the g-factor by analyzing how the parameters of Markov Random Field (i.e., Gibbs or log-linear) probability models of images are learned from data by maximum likelihood estimation. In particular, we study homogeneous MRF models which learn image distributions in terms of clique potentials corresponding to feature histogram statistics (cf. Minimax Entropy Learning (MEL) by Zhu, Wu and Mumford 1997 [11]). We first use our analysis of the g-factor to determine when the clique potentials decouple for different features. Second, we show that clique potentials can be computed analytically by approximating the g-factor. Third, we demonstrate a connection between this approximation and the Generalized Iterative Scaling algorithm (GIS), due to Darroch and Ratcliff 1972 [2], for calculating potentials. This connection enables us to use GIS to improve our multinomial approximation, using Bethe-Kikuchi [8] approximations to simplify the GIS procedure. We support our analysis by computer simulations. 1
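Since the abstract leans on Generalized Iterative Scaling, a minimal sketch of the GIS update may help. This is our own toy example on a 2x2 binary-patch space with made-up histogram-style features and a synthetic "empirical" distribution, not the paper's procedure or data.

    import numpy as np

    # all 16 binary 2x2 patches, flattened
    patches = np.array([[(b >> i) & 1 for i in range(4)] for b in range(16)], float)

    def features(x):
        n_on = x.sum()                                        # number of on pixels
        h_agree = float(x[0] == x[1]) + float(x[2] == x[3])   # within-row agreements
        return np.array([n_on, h_agree])

    F = np.array([features(x) for x in patches])              # (16, 2) feature matrix
    C = F.sum(axis=1).max()                                   # GIS constant
    F = np.column_stack([F, C - F.sum(axis=1)])               # slack feature: rows sum to C

    emp = np.zeros(16); emp[[3, 5, 10, 12, 15]] = 0.2         # synthetic empirical distribution
    E_emp = emp @ F                                           # empirical feature expectations

    lam = np.zeros(F.shape[1])                                # potentials
    for _ in range(500):
        p = np.exp(F @ lam); p /= p.sum()                     # current max-entropy model
        lam += np.log(E_emp / (p @ F)) / C                    # classic GIS update
    p = np.exp(F @ lam); p /= p.sum()
    print(np.round(p @ F, 3), np.round(E_emp, 3))             # model vs empirical expectations

Each iteration nudges the model's feature expectations toward the empirical ones via the standard lambda += (1/C) log(E_emp / E_model) step.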

3 0.9904151 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

Author: Gal Chechik, Amir Globerson, M. J. Anderson, E. D. Young, Israel Nelken, Naftali Tishby

Abstract: The way groups of auditory neurons interact to code acoustic information is investigated using an information theoretic approach. We develop measures of redundancy among groups of neurons, and apply them to the study of collaborative coding efficiency in two processing stations in the auditory pathway: the inferior colliculus (IC) and the primary auditory cortex (AI). Under two schemes for the coding of the acoustic content, acoustic segments coding and stimulus identity coding, we show differences both in information content and group redundancies between IC and AI neurons. These results provide the first direct evidence of redundancy reduction along the ascending auditory pathway, as has been hypothesized on theoretical grounds [Barlow 1959, 2001]. The redundancy effects under the single-spikes coding scheme are significant only for groups larger than ten cells, and cannot be revealed with redundancy measures that use only pairs of cells. The results suggest that the auditory system transforms low-level representations that contain redundancies due to the statistical structure of natural stimuli into a representation in which cortical neurons extract rare and independent components of complex acoustic signals that are useful for auditory scene analysis. 1
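For intuition about what a group redundancy measure looks like, here is a sketch of one standard quantity: the sum of single-cell stimulus informations minus the information the group carries jointly (positive values indicate redundant coding, negative values synergy). This is our own illustration on synthetic binary data, not the paper's measures or recordings.

    import numpy as np

    rng = np.random.default_rng(1)

    def mutual_info(x, y):
        # plug-in estimate of I(X;Y) in bits from paired discrete samples
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                pxy = np.mean((x == a) & (y == b))
                if pxy > 0:
                    mi += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
        return mi

    # toy data: binary stimulus, three "cells" each reporting it with 80% reliability
    n = 20000
    stim = rng.integers(2, size=n)
    cells = [stim ^ (rng.random(n) > 0.8).astype(int) for _ in range(3)]

    group = sum(c * (2 ** i) for i, c in enumerate(cells))   # joint response as one symbol
    redundancy = sum(mutual_info(c, stim) for c in cells) - mutual_info(group, stim)
    print('group redundancy (bits):', round(redundancy, 3))

With three cells that are all noisy copies of the same binary stimulus, the measure comes out clearly positive, i.e., the group codes redundantly.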

4 0.98554993 148 nips-2001-Predictive Representations of State

Author: Michael L. Littman, Richard S. Sutton

Abstract: We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization of the predictive approach with the state representations used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model. In predicting or controlling a sequence of observations, the concepts of state and state estimation inevitably arise. There have been two dominant approaches. The generative-model approach, typified by research on partially observable Markov decision processes (POMDPs), hypothesizes a structure for generating observations and estimates its state and state dynamics. The history-based approach, typified by k-order Markov methods, uses simple functions of past observations as state, that is, as the immediate basis for prediction and control. (The data flow in these two approaches is diagrammed in Figure 1.) Of the two, the generative-model approach is more general. The model's internal state gives it temporally unlimited memory, the ability to remember an event that happened arbitrarily long ago, whereas a history-based approach can only remember as far back as its history extends. The bane of generative-model approaches is that they are often strongly dependent on a good model of the system's dynamics. Most uses of POMDPs, for example, assume a perfect dynamics model and attempt only to estimate state. There are algorithms for simultaneously estimating state and dynamics (e.g., Chrisman, 1992), analogous to the Baum-Welch algorithm for the uncontrolled case (Baum et al., 1970), but these are only effective at tuning parameters that are already approximately correct (e.g., Shatkay & Kaelbling, 1997). Figure 1: Data flow in (a) POMDP and other recursive updating of state representation, and (b) history-based state representation. In practice, history-based approaches are often much more effective. Here, the state representation is a relatively simple record of the stream of past actions and observations. It might record the occurrence of a specific subsequence or that one event has occurred more recently than another. Such representations are far more closely linked to the data than are POMDP representations. One way of saying this is that POMDP learning algorithms encounter many local minima and saddle points because all their states are equipotential. History-based systems immediately break symmetry, and their direct learning procedure makes them comparably simple. McCallum (1995) has shown in a number of examples that sophisticated history-based methods can be effective in large problems, and are often more practical than POMDP methods even in small ones.
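As a concrete picture of the history-based approach just described, here is a toy sketch (our own illustration, not code from the paper) in which the "state" is simply the last k action-observation pairs and predictions are learned by counting over an arbitrary synthetic stream.

    from collections import defaultdict

    def korder_predictor(stream, k=2):
        # learn P(observation | last-k window, action) from a list of (action, observation) pairs
        counts = defaultdict(lambda: defaultdict(int))
        for t in range(k, len(stream)):
            window = tuple(stream[t - k:t])            # the history-based "state"
            action, obs = stream[t]
            counts[window, action][obs] += 1
        def predict(window, action, obs):
            c = counts[tuple(window), action]
            total = sum(c.values())
            return c[obs] / total if total else None   # None: this window was never observed
        return predict

    # usage on an arbitrary synthetic stream of (action, observation) pairs
    stream = [('a', 0), ('a', 0), ('b', 1), ('a', 0), ('b', 0), ('a', 0), ('b', 1)] * 300
    predict = korder_predictor(stream, k=2)
    print(predict([('a', 0), ('b', 0)], 'a', 0))       # predicted probability of observing 0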
The predictive state representation (PSR) approach, which we develop in this paper, is like the generative-model approach in that it updates the state representation recursively, as in Figure 1(a), rather than directly computing it from data. We show that this enables it to attain generality and compactness at least equal to that of the generative-model approach. However, the PSR approach is also like the history-based approach in that its representations are grounded in data. Whereas a history-based representation looks to the past and records what did happen, a PSR looks to the future and represents what will happen. In particular, a PSR is a vector of predictions for a specially selected set of action-observation sequences, called tests (after Rivest & Schapire, 1994). For example, consider the test u1o1u2o2, where u1 and u2 are specific actions and o1 and o2 are specific observations. The correct prediction for this test given the data stream up to time k is the probability of its observations occurring (in order) given that its actions are taken (in order) (i.e., Pr{Ok = o1, Ok+1 = o2 | Ak = u1, Ak+1 = u2}). Each test is a kind of experiment that could be performed to tell us something about the system. If we knew the outcome of all possible tests, then we would know everything there is to know about the system. A PSR is a set of tests that is sufficient information to determine the prediction for all possible tests (a sufficient statistic). As an example of these points, consider the float/reset problem (Figure 2) consisting of a linear string of 5 states with a distinguished reset state on the far right. One action, f (float), causes the system to move uniformly at random to the right or left by one state, bounded at the two ends. The other action, r (reset), causes a jump to the reset state irrespective of the current state. The observation is always 0 unless the r action is taken when the system is already in the reset state, in which case the observation is 1. Thus, on an f action, the correct prediction is always 0, whereas on an r action, the correct prediction depends on how many fs there have been since the last r: for zero fs, it is 1; for one or two fs, it is 0.5; for three or four fs, it is 0.375; for five or six fs, it is 0.3125, and so on, decreasing after every second f, asymptotically bottoming out at 0.2. No k-order Markov method can model this system exactly, because no limited-length history is a sufficient statistic. Figure 2: Underlying dynamics of the float/reset problem for (a) the float action and (b) the reset action. The numbers on the arcs indicate transition probabilities. The observation is always 0 except on the reset action from the rightmost state, which produces an observation of 1. A POMDP approach can model it exactly by maintaining a belief-state representation over five or so states. A PSR, on the other hand, can exactly model the float/reset system using just two tests: r1 and f0r1. Starting from the rightmost state, the correct predictions for these two tests are always two successive probabilities in the sequence given above (1, 0.5, 0.5, 0.375, ...), which is always a sufficient statistic to predict the next pair in the sequence. Although this informational analysis indicates a solution is possible in principle, it would require a nonlinear updating process for the PSR.
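The float/reset numbers quoted above are easy to check: track the belief over the five underlying states and read off the prediction for the test r1 as the probability of currently being in the reset state. The sketch below is our own check, with the bounded random walk implemented so that a blocked move leaves the state unchanged, an assumption consistent with the probabilities quoted in the text.

    import numpy as np

    n = 5                      # states 0..3 plus the reset state, index 4 (far right)
    F = np.zeros((n, n))       # float: move left/right with prob 0.5, bounded at the ends
    for i in range(n):
        F[i, max(i - 1, 0)] += 0.5
        F[i, min(i + 1, n - 1)] += 0.5

    belief = np.zeros(n)
    belief[4] = 1.0            # immediately after a reset
    preds = []
    for k in range(8):
        preds.append(round(belief[4], 4))   # prediction for the test r1 after k floats
        belief = belief @ F
    print(preds)

The printed sequence starts 1, 0.5, 0.5, 0.375, 0.375, 0.3125, 0.3125 and keeps decreasing after every second float toward 0.2, matching the text.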
In this paper we restrict consideration to a linear special case of PSRs, for which we can guarantee that the number of tests needed does not exceed the number of states in the minimal POMDP representation (although we have not ruled out the possibility it can be considerably smaller). Of greater ultimate interest are the prospects for learning PSRs and their update functions, about which we can only speculate at this time. The difficulty of learning POMDP structures without good prior models is well known. To the extent that this difficulty is due to the indirect link between the POMDP states and the data, predictive representations may be able to do better. Jaeger (2000) introduced the idea of predictive representations as an alternative to belief states in hidden Markov models and provided a learning procedure for these models. We build on his work by treating the control case (with actions), which he did not significantly analyze. We have also been strongly influenced by the work of Rivest and Schapire (1994), who did consider tests including actions, but treated only the deterministic case, which is significantly different. They also explored construction and learning algorithms for discovering system structure.

1 Predictive State Representations

We consider dynamical systems that accept actions from a discrete set A and generate observations from a discrete set O. We consider only predicting the system, not controlling it, so we do not designate an explicit reward observation. We refer to such a system as an environment. We use the term history to denote a test forming an initial stream of experience and characterize an environment by a probability distribution over all possible histories, P : {O|A}* -> [0,1], where P(o1 ... ot | a1 ... at) is the probability of observations o1, ..., ot being generated, in that order, given that actions a1, ..., at are taken, in that order. The probability of a test t conditional on a history h is defined as P(t|h) = P(ht)/P(h). Given a set of q tests Q = {ti}, we define their (1 x q) prediction vector, p(h) = [P(t1|h), P(t2|h), ..., P(tq|h)], as a predictive state representation (PSR) if and only if it forms a sufficient statistic for the environment, i.e., if and only if

P(t|h) = f_t(p(h)),    (1)

for any test t and history h, and for some projection function f_t : [0,1]^q -> [0,1]. In this paper we focus on linear PSRs, for which the projection functions are linear, that is, for which there exists a (1 x q) projection vector m_t, for every test t, such that

P(t|h) = f_t(p(h)) = p(h) m_t^T,    (2)

for all histories h. Let p_i(h) denote the ith component of the prediction vector for some PSR. This can be updated recursively, given a new action-observation pair a,o, by

p_i(hao) = P(t_i | hao) = P(o t_i | ha) / P(o | ha) = f_{aot_i}(p(h)) / f_{ao}(p(h)) = p(h) m_{aot_i}^T / (p(h) m_{ao}^T),    (3)

where the last step is specific to linear PSRs. We can now state our main result:

Theorem 1 For any environment that can be represented by a finite POMDP model, there exists a linear PSR with number of tests no larger than the number of states in the minimal POMDP model.

2 Proof of Theorem 1: Constructing a PSR from a POMDP

We prove Theorem 1 by showing that for any POMDP model of the environment, we can construct in polynomial time a linear PSR for that POMDP of lesser or equal complexity that produces the same probability distribution over histories as the POMDP model. We proceed in three steps. First, we review POMDP models and how they assign probabilities to tests.
Next, we define an algorithm that takes an n-state POMDP model and produces a set of n or fewer tests, each of length less than or equal to n. Finally, we show that the set of tests constitutes a PSR for the POMDP, that is, that there are projection vectors that, together with the tests' predictions, produce the same probability distribution over histories as the POMDP. A POMDP (Lovejoy, 1991; Kaelbling et al., 1998) is defined by a sextuple (S, A, O, b0, T, O). Here, S is a set of n underlying (hidden) states, A is a discrete set of actions, and O is a discrete set of observations. The (1 x n) vector b0 is an initial state distribution. The set T consists of (n x n) transition matrices T^a, one for each action a, where T^a_ij is the probability of a transition from state i to j when action a is chosen. The set O consists of diagonal (n x n) observation matrices O^{a,o}, one for each pair of observation o and action a, where O^{a,o}_ii is the probability of observation o when action a is selected and state i is reached. The state representation in a POMDP (Figure 1(a)) is the belief state, the (1 x n) vector b(h) of the state-occupation probabilities given the history h. It can be computed recursively, given a new action a and observation o, by

b(hao) = b(h) T^a O^{a,o} / (b(h) T^a O^{a,o} e_n^T),

where e_n is the (1 x n) vector of all 1s. Finally, a POMDP defines a probability distribution over tests (and thus histories) by

P(o1 ... ot | h a1 ... at) = b(h) T^{a1} O^{a1,o1} ... T^{at} O^{at,ot} e_n^T.    (4)

(There are many equivalent formulations, and the conversion procedure described here can be easily modified to accommodate other POMDP definitions.) We now present our algorithm for constructing a PSR for a given POMDP. It uses a function u mapping tests to (1 x n) vectors, defined recursively by u(ε) = e_n and u(aot) = (T^a O^{a,o} u(t)^T)^T, where ε represents the null test. Conceptually, the components of u(t) are the probabilities of the test t when applied from each underlying state of the POMDP; we call u(t) the outcome vector for test t. We say a test t is linearly independent of a set of tests S if its outcome vector is linearly independent of the set of outcome vectors of the tests in S. Our algorithm, search, is invoked as Q <- search(ε, {}) and defined as

search(t, S):
  for each a in A, o in O:
    if aot is linearly independent of S then S <- search(aot, S U {aot})
  return S

The algorithm maintains a set of tests and searches for new tests that are linearly independent of those already found. It is a form of depth-first search. The algorithm halts when it checks all the one-step extensions of its tests and finds none that are linearly independent. Because the set of tests Q returned by search have linearly independent outcome vectors, the cardinality of Q is bounded by n, ensuring that the algorithm halts after a polynomial number of iterations. Because each test in Q is formed by a one-step extension to some other test in Q, no test is longer than n action-observation pairs. The check for linear independence can be performed in many ways, including Gaussian elimination, implying that search terminates in polynomial time. By construction, all one-step extensions to the set of tests Q returned by search are linearly dependent on those in Q. We now show that this is true for any test.

Lemma 1 The outcome vectors of the tests in Q can be linearly combined to produce the outcome vector for any test.

Proof: Let U be the (n x q) matrix formed by concatenating the outcome vectors for all tests in Q.
Since, for all combinations of a and o, the columns of T^a O^{a,o} U are linearly dependent on the columns of U, we can write T^a O^{a,o} U = U W^T for some (q x q) matrix of weights W. If t is a test that is linearly dependent on Q, then any one-step extension of t, aot, is linearly dependent on Q. This is because we can write the outcome vector for t as u(t) = (U w^T)^T for some (1 x q) weight vector w, and the outcome vector for aot as u(aot) = (T^a O^{a,o} u(t)^T)^T = (T^a O^{a,o} U w^T)^T = (U W^T w^T)^T. Thus, aot is linearly dependent on Q. Now, note that all one-step tests are linearly dependent on Q by the structure of the search algorithm. Using the previous paragraph as an inductive argument, this implies that all tests are linearly dependent on Q. □

Returning to the float/reset example POMDP, search begins by enumerating the 4 one-step extensions of the null test (f0, f1, r0, and r1). Of these, only f0 and r0 are linearly independent. Of the extensions of these, f0r0 is the only one that is linearly independent of the other two. The remaining two tests added to Q by search are f0f0r0 and f0f0f0r0. No extensions of the 5 tests in Q are linearly independent of the 5 tests in Q, so the procedure halts.

We now show that the set of tests Q constitutes a PSR for the POMDP by constructing projection vectors that, together with the tests' predictions, produce the same probability distribution over histories as the POMDP. For each combination of a and o, define a (q x q) matrix M_ao = (U^+ T^a O^{a,o} U)^T and a (1 x q) vector m_ao = (U^+ T^a O^{a,o} e_n^T)^T, where U is the matrix of outcome vectors defined in the previous section and U^+ is its pseudoinverse. (If U = A Σ B^T is the singular value decomposition of U, then B Σ^+ A^T is the pseudoinverse; the pseudoinverse of the diagonal matrix Σ replaces each non-zero element with its reciprocal.) The ith row of M_ao is m_{aot_i}. The probability distribution on histories implied by these projection vectors is

p(h) M_{a1 o1} ... M_{a(l-1) o(l-1)} m_{al ol}^T
= b(h) U U^+ T^{a1} O^{a1,o1} U ... U^+ T^{a(l-1)} O^{a(l-1),o(l-1)} U U^+ T^{al} O^{al,ol} e_n^T
= b(h) T^{a1} O^{a1,o1} ... T^{a(l-1)} O^{a(l-1),o(l-1)} T^{al} O^{al,ol} e_n^T,

i.e., it is the same as that of the POMDP, as in Equation 4. Here, the last step uses the fact that U U^+ v^T = v^T for any v^T linearly dependent on the columns of U. This holds by construction of U in the previous section. This completes the proof of Theorem 1. Completing the float/reset example, consider the M_f0 matrix found by the process defined in this section. It derives predictions for each test in Q after taking action f. Most of these are quite simple because the tests are so similar: the new prediction for r0 is exactly the old prediction for f0r0, for example. The only nontrivial test is f0f0f0r0. Its outcome can be computed from 0.250 p(r0|h) - 0.0625 p(f0r0|h) + 0.750 p(f0f0r0|h). This example illustrates that the projection vectors need not contain only positive entries. (A code sketch of this construction appears after the references at the end of this entry.)

3 Conclusion

We have introduced a predictive state representation for dynamical systems that is grounded in actions and observations and shown that, even in its linear form, it is at least as general and compact as POMDPs. In essence, we have established PSRs as a non-inferior alternative to POMDPs, and suggested that they might have important advantages, while leaving demonstration of those advantages to future work. We conclude by summarizing the potential advantages (to be explored in future work): Learnability. The k-order Markov model is similar to PSRs in that it is entirely based on actions and observations. Such models can be learned trivially from data by counting; it is an open question whether something similar can be done with a PSR.
Jaeger (2000) showed how to learn such a model in the uncontrolled setting, but the situation is more complex in the multiple-action case since outcomes are conditioned on behavior, violating some required independence assumptions. Compactness. We have shown that there exist linear PSRs no more complex than the minimal POMDP for an environment, but in some cases the minimal linear PSR seems to be much smaller. For example, a POMDP extension of factored MDPs explored by Singh and Cohn (1998) would be cross-products of separate POMDPs and have linear PSRs that increase linearly with the number and size of the component POMDPs, whereas their minimal POMDP representation would grow as the size of the state space, i.e., exponentially in the number of component POMDPs. This (apparent) advantage stems from the PSR's combinatorial or factored structure. As a vector of state variables, capable of taking on diverse values, a PSR may be inherently more powerful than the distribution over discrete states (the belief state) of a POMDP. We have already seen that general PSRs can be more compact than POMDPs; they are also capable of efficiently capturing environments in the diversity representation used by Rivest and Schapire (1994), which is known to provide an extremely compact representation for some environments. Generalization. There are reasons to think that state variables that are themselves predictions may be particularly useful in learning to make other predictions. With so many things to predict, we have in effect a set or sequence of learning problems, all due to the same environment. In many such cases the solutions to earlier problems have been shown to provide features that generalize particularly well to subsequent problems (e.g., Baxter, 2000; Thrun & Pratt, 1998). Powerful, extensible representations. PSRs that predict tests could be generalized to predict the outcomes of multi-step options (e.g., Sutton et al., 1999). In this case, particularly, they would constitute a powerful language for representing the state of complex environments. Acknowledgments: We thank Peter Dayan, Lawrence Saul, Fernando Pereira and Rob Schapire for many helpful discussions of these and related ideas. References Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41, 164-171. Baxter, J. (2000). A model of inductive bias learning. Journal of Artificial Intelligence Research, 12, 149-198. Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 183-188). San Jose, California: AAAI Press. Jaeger, H. (2000). Observable operator models for discrete stochastic time series. Neural Computation, 12, 1371-1398. Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134. Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-65. McCallum, A. K. (1995). Reinforcement learning with selective perception and hidden state.
Doctoral dissertation, Department of Computer Science, University of Rochester. Rivest, R. L., & Schapire, R. E. (1994). Diversity-based inference of finite automata. Journal of the ACM, 41, 555-589. Shatkay, H., & Kaelbling, L. P. (1997). Learning topological maps with weak local odometric information. Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97) (pp. 920-929). Singh, S., & Cohn, D. (1998). How to dynamically merge Markov decision processes. Advances in Neural Information Processing Systems 10 (pp. 1057-1063). Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211. Thrun, S., & Pratt, L. (Eds.). (1998). Learning to learn. Kluwer Academic Publishers.
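To make the construction in this entry concrete (as noted at the end of the proof section above), here is a generic sketch. It is our own code, not the authors': it builds a linear PSR from a small random POMDP by searching for tests with linearly independent outcome vectors (via a matrix-rank check), forms the projection matrices M_ao and vectors m_ao from the pseudoinverse of the outcome matrix U, and verifies that the PSR's recursive update reproduces the POMDP's history probabilities (Equation 4). The random POMDP, its size, and all variable names are illustrative assumptions.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    n, A, Obs = 4, ['a', 'b'], [0, 1]                       # a small random POMDP
    T = {a: rng.dirichlet(np.ones(n), size=n) for a in A}   # T[a][i, j] = P(j | i, a)
    Oz = {}
    for a in A:
        probs = rng.dirichlet(np.ones(len(Obs)), size=n)    # P(o | a, state reached)
        for k, o in enumerate(Obs):
            Oz[a, o] = np.diag(probs[:, k])
    b0 = rng.dirichlet(np.ones(n))

    def u(test):
        # outcome vector: probability of the test from each underlying state
        if not test:
            return np.ones(n)
        (a, o), rest = test[0], test[1:]
        return T[a] @ Oz[a, o] @ u(rest)

    # search for tests whose outcome vectors are linearly independent
    Q, U = [], np.zeros((n, 0))
    frontier = [()]
    while frontier:
        t = frontier.pop()
        for a, o in product(A, Obs):
            ext = ((a, o),) + t                             # one-step extension aot
            v = u(ext).reshape(-1, 1)
            if np.linalg.matrix_rank(np.hstack([U, v])) > U.shape[1]:
                Q.append(ext); U = np.hstack([U, v]); frontier.append(ext)

    Uplus = np.linalg.pinv(U)
    M = {(a, o): (Uplus @ T[a] @ Oz[a, o] @ U).T for a, o in product(A, Obs)}
    m = {(a, o): (Uplus @ T[a] @ Oz[a, o] @ np.ones(n)).reshape(1, -1) for a, o in product(A, Obs)}

    def pomdp_prob(history):                  # P(o1..ot | a1..at), Equation 4
        v = b0.copy()
        for a, o in history:
            v = v @ T[a] @ Oz[a, o]
        return v.sum()

    def psr_prob(history):                    # same probability via the PSR update
        p = (b0 @ U).reshape(1, -1)           # initial prediction vector
        prob = 1.0
        for a, o in history:
            num = (p @ m[a, o].T).item()      # P(o | h, a)
            prob *= num
            p = (p @ M[a, o].T) / num         # recursive update, Equation 3
        return prob

    h = [('a', 0), ('b', 1), ('a', 1)]
    print(round(pomdp_prob(h), 6), round(psr_prob(h), 6))   # the two numbers should agree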

5 0.97662109 107 nips-2001-Latent Dirichlet Allocation

Author: David M. Blei, Andrew Y. Ng, Michael I. Jordan

Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification. 1
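The generative process this abstract describes is compact enough to sketch directly. The sizes and hyperparameters below are made up for illustration, and the per-topic word distributions are simply drawn at random here rather than learned.

    import numpy as np

    rng = np.random.default_rng(0)
    n_topics, vocab_size, doc_len, n_docs = 3, 20, 50, 5
    alpha = np.full(n_topics, 0.5)                            # Dirichlet prior on topic proportions
    beta = rng.dirichlet(np.ones(vocab_size), size=n_topics)  # per-topic word distributions (illustrative)

    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)                          # document-specific topic mixture
        z = rng.choice(n_topics, size=doc_len, p=theta)       # topic assignment for each word slot
        words = [rng.choice(vocab_size, p=beta[zi]) for zi in z]
        docs.append(words)
    print(docs[0][:10])

Inference, i.e., recovering the topic proportions and assignments from the words alone, is the hard part the paper addresses with variational methods.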

6 0.96850747 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

7 0.94752747 144 nips-2001-Partially labeled classification with Markov random walks

8 0.92244291 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

9 0.89620656 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

10 0.87145376 30 nips-2001-Agglomerative Multivariate Information Bottleneck

11 0.85964996 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

12 0.83397543 24 nips-2001-Active Information Retrieval

13 0.82447225 183 nips-2001-The Infinite Hidden Markov Model

14 0.81989151 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

15 0.81889349 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

16 0.8183248 96 nips-2001-Information-Geometric Decomposition in Spike Analysis

17 0.81661344 68 nips-2001-Entropy and Inference, Revisited

18 0.80596852 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

19 0.79653138 51 nips-2001-Cobot: A Social Reinforcement Learning Agent

20 0.79635137 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning