nips nips2001 nips2001-3 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
Reference: text
sentIndex sentText sentNum sentScore
1 Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. [sent-9, score-0.366]
2 We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. [sent-10, score-0.527]
3 We illustrate our proposal using a hierarchical hidden Markov model. [sent-11, score-0.07]
4 1 Introduction The individual and joint computational roles of neuromodulators such as dopamine, serotonin, norepinephrine and acetylcholine are currently the focus of intensive study. [sent-12, score-0.233]
5 5, 7, 9–11, 16, 27 A rich understanding of the effects of neuromodulators on the dynamics of networks has come about through work in invertebrate systems. [sent-13, score-0.102]
6 ACh was one of the first neuromodulators to be attributed a specific role. [sent-16, score-0.074]
7 Hasselmo and colleagues,10, 11 in their seminal work, proposed that cholinergic (and, in their later work, also GABAergic12 ) modulation controls read-in to and readout from recurrent, attractor-like memories, such as area CA3 of the hippocampus. [sent-17, score-0.281]
8 Such memories fail in a characteristic manner if the recurrent connections are operational during storage, thus forcing new input patterns to be mapped to existing memories. [sent-18, score-0.159]
9 During read-in, high levels of ACh would suppress the recurrent synapses, but make them readily plastic, so that new memories would be stored without being pattern-completed. [sent-21, score-0.192]
10 Then, during read-out, low levels of ACh would boost the impact of the recurrent weights (and reduce their plasticity), allowing auto-association to occur. [sent-22, score-0.085]
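To make the proposed read-in/read-out switch concrete, the following toy is a minimal, hypothetical Python sketch (not Hasselmo's actual model, and not code from the paper): an auto-associator in which a scalar standing for the ACh level suppresses the recurrent input and gates Hebbian plasticity during storage, and does the reverse during recall. The network size, gain and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                       # number of units in the toy attractor network (assumed)
W = np.zeros((N, N))          # recurrent weights (CA3-like auto-association)

def step(pattern, W, ach, lr=0.05):
    """One storage/recall step. High ACh: recurrent input suppressed, plasticity
    boosted (read-in). Low ACh: recurrent input dominates, little plasticity (read-out)."""
    x = np.tanh(pattern + (1.0 - ach) * W @ pattern)   # ACh suppresses recurrent synapses
    W = W + ach * lr * np.outer(x, x)                  # ACh gates Hebbian plasticity
    np.fill_diagonal(W, 0.0)
    return x, W

# Store a novel (unfamiliar) pattern under high ACh ...
memory = np.sign(rng.standard_normal(N))
_, W = step(memory, W, ach=0.9)

# ... then pattern-complete a degraded cue under low ACh.
cue = memory.copy()
cue[: N // 4] = 0.0                                    # knock out a quarter of the cue
recalled, _ = step(cue, W, ach=0.1)
print("overlap with stored memory:", float(recalled @ memory) / N)
```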
11 The ACh signal to the hippocampus can be characterized as reporting the unfamiliarity of the input with which its release is associated. [sent-23, score-0.157]
12 This is analogous to its characterization as reporting the uncertainty associated with predictions in theories of attentional influences over learning in classical conditioning. [sent-24, score-0.2]
13 20 We have4 interpreted this in the statistical terms of a Kalman filter, arguing that the ACh signal reported this uncertainty, thus changing plasticity appropriately. [sent-26, score-0.177]
14 In this paper, we take the idea that ACh reports on uncertainty one step farther. [sent-28, score-0.155]
15 There is a wealth of analysis-by-synthesis unsupervised learning models of cortical processing. [sent-29, score-0.106]
16 1, 3, 8, 13, 17, 19, 23 In these, top-down connections instantiate a generative model of sensory input; and bottom-up connections instantiate a recognition model, which is the statistical inverse of the generative model, and maps inputs into categories established in the generative model. [sent-30, score-0.494]
17 These models, at least in principle, permit stimuli to be processed according both to bottom-up input and top-down expectations, the latter being formed based on temporal context or information from other modalities. [sent-31, score-0.101]
18 However, in the face of contextual uncertainty, top-down information is useless. [sent-33, score-0.125]
19 2 Note that this interpretation is broadly consistent with existent electrophysiology data, and documented effects on stimulus processing of drugs that either enhance (eg cholinesterase inhibitors) or suppress (eg scopolamine) the action of ACh. [sent-35, score-0.116]
20 In exact bottom-up, top-down, inference using a generative model, top-down contextual uncertainty does not play a simple role. [sent-37, score-0.637]
21 Rather, all possible contexts are treated simultaneously according to the individual posterior probabilities that they currently pertain. [sent-38, score-0.084]
22 Given the neurobiologically likely scenario in which one set of units has to be used to represent all possible contexts, this exact inferential solution is not possible. [sent-39, score-0.214]
23 Rather, we propose that a single context is represented in the activities of high level (presumably pre-frontal) cortical units, and uncertainty associated with this context is represented by ACh. [sent-40, score-0.382]
24 This cholinergic signal then controls the balance between bottom-up and top-down influences over inference. [sent-41, score-0.377]
25 In the next section, we describe the simple hierarchical generative model that we use to illustrate our proposal. [sent-42, score-0.161]
26 The ACh-based recognition model is introduced in section 3 and discussed in section 4. [sent-43, score-0.076]
27 2 Generative and Recognition Models Figure 1A shows a very simple case of a hierarchical generative model. [sent-44, score-0.159]
28 The generative model is a form of hidden Markov model (HMM), with a discrete hidden state, which will capture the idea of a persistent temporal context, and a two-dimensional, real-valued output. [sent-45, score-0.288]
29 The middle-layer state is stochastically determined from the context, and controls which of a set of two-dimensional Gaussians (centered at the corners of the unit square) is used to generate the output. [sent-47, score-0.158]
30 In this austere case, the middle-layer state is the model’s representation of the output. [sent-48, score-0.177]
31 A) Three-layer model with dynamics in the top (context) layer, a probabilistic mapping from the context to the middle-layer state, and a Gaussian output model with means at the corners of the unit square and a fixed standard deviation in each direction. [sent-50, score-0.176]
32 The key inference problem will be to determine the distribution over the middle-layer states that generated the data, and over the context itself. [sent-53, score-0.161]
33 The state in the top layer stays the same for some average number of timesteps, and then switches to one of the other states, chosen uniformly at random. [sent-55, score-0.139]
34 The state in the middle layer is more weakly determined by the state in the top layer, matching it with only a modest probability. [sent-57, score-0.207]
35 However, and this is why it is a paradigmatic case for our proposal, contextual information, in this case past experience, can help to determine the middle-layer state. [sent-63, score-0.125]
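The generative model just described can be written down in a few lines. The sketch below is a hypothetical Python rendering: the number of states, the mean dwell time of the context, the probability with which the middle-layer state copies the context, and the width of the output Gaussians are assumed values chosen for illustration, not the paper's settings. Later sketches in this section reuse these definitions.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 4            # number of context / middle-layer states (assumed)
DWELL = 30       # mean dwell time of the context, in timesteps (assumed)
P_COPY = 0.6     # probability that the middle-layer state copies the context (assumed)
SIGMA = 0.3      # standard deviation of the output Gaussians (assumed)
CORNERS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # unit-square corners

# Context transition matrix: stay put with prob 1 - 1/DWELL, otherwise switch
# to one of the other states uniformly at random.
A = np.full((K, K), (1.0 / DWELL) / (K - 1))
np.fill_diagonal(A, 1.0 - 1.0 / DWELL)

# Middle-layer state given the context: copy it with prob P_COPY, else uniform over the rest.
B = np.full((K, K), (1.0 - P_COPY) / (K - 1))
np.fill_diagonal(B, P_COPY)

def sample(T):
    """Sample a context sequence z, middle-layer states y and 2-D outputs x."""
    z = np.zeros(T, dtype=int)
    y = np.zeros(T, dtype=int)
    x = np.zeros((T, 2))
    for t in range(T):
        z[t] = rng.integers(K) if t == 0 else rng.choice(K, p=A[z[t - 1]])
        y[t] = rng.choice(K, p=B[z[t]])
        x[t] = CORNERS[y[t]] + SIGMA * rng.standard_normal(2)
    return z, y, x

z_true, y_true, x_obs = sample(400)
```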
36 We show how the putative effect of ACh in controlling the balance between bottom-up and top-down inference in this model can be used to build a good approximate inference model. [sent-64, score-0.56]
37 In order to evaluate our approximate model, we need to understand optimal inference in this case. [sent-65, score-0.536]
38 Figure 2A shows the standard HMM inference model, which calculates the exact posteriors over the context and the middle-layer state. [sent-66, score-0.341]
39 Figures 3A;D;E show various aspects of exact inference for a particular run. [sent-69, score-0.34]
40 The histograms in figure 3A show that the exact posterior captures quite well the actual states that generated the data. [sent-70, score-0.128]
41 The upper plot shows the posterior probabilities of the actual states in the sequence – these should be, and are, usually high; the lower histogram shows the posterior probabilities of the other possible states – these should be, and are, usually low. [sent-71, score-0.299]
42 Figure 3D shows the actual state sequence; figure 3E shows the states that are individually most likely at each time step (note that this is not the maximum likelihood state sequence, as found by the Viterbi algorithm, for instance). [sent-72, score-0.34]
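For comparison with the approximations discussed below, exact filtering in this model is just the standard HMM forward recursion over the context, with the middle-layer posterior read off at each step. This is a plausible sketch, reusing the assumed A, B, CORNERS and SIGMA from the sampling code above; it is not the paper's own implementation.

```python
def output_lik(x_t):
    """Likelihood of one 2-D output under each of the K corner Gaussians."""
    sq_dist = np.sum((CORNERS - x_t) ** 2, axis=1)
    return np.exp(-sq_dist / (2 * SIGMA ** 2)) / (2 * np.pi * SIGMA ** 2)

def exact_filter(x):
    """Forward recursion: returns p(z_t | x_1..t) and p(y_t | x_1..t)."""
    T = len(x)
    pz = np.zeros((T, K))                 # exact posterior over the context
    py = np.zeros((T, K))                 # exact posterior over the middle-layer state
    pred_z = np.full(K, 1.0 / K)          # p(z_t | x_1..t-1), initially uniform
    for t in range(T):
        lik_y = output_lik(x[t])
        lik_z = B @ lik_y                 # p(x_t | z_t), middle-layer state summed out
        pz[t] = pred_z * lik_z
        pz[t] /= pz[t].sum()
        prior_y = pred_z @ B              # top-down prior over the middle-layer state
        py[t] = prior_y * lik_y
        py[t] /= py[t].sum()
        pred_z = pz[t] @ A                # one-step prediction of the next context
    return pz, py

pz_exact, py_exact = exact_filter(x_obs)
```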
44 This is combined with the likelihood term from the data to give the true posterior. [sent-79, score-0.08]
45 B) The bottom-up recognition model uses only a generic prior over the state (which conveys no information), and so the likelihood term dominates. [sent-80, score-0.075]
46 C) A single estimated context state is used, in conjunction with its certainty, reported by cholinergic activity, to produce an approximate prior over the context (which is a mixture of a delta function and a uniform), and thus an approximate prior over the middle-layer state. [sent-82, score-0.404]
47 This is combined with the likelihood to give the approximate posterior, and a new cholinergic signal is calculated. [sent-83, score-0.303]
49 A) Histograms of the exact posterior distribution over the actual state (upper) and the other possible states (lower). [sent-87, score-0.385]
50 B;C) Comparison of the purely bottom-up approximation (B) and the ACh-based approximation (C) with the true posterior, across the full range of values. [sent-89, score-0.108]
51 E) Highest probability state from the exact posterior distribution. [sent-92, score-0.285]
52 Figure 2B shows a purely bottom-up model that uses only the likelihood terms to infer the distribution over the middle-layer state. [sent-94, score-0.21]
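A minimal sketch of this purely bottom-up recognition model, reusing output_lik and the observations from the sketches above: the posterior over the middle-layer state is just the normalised likelihood under a flat prior, with no use of temporal context.

```python
def bottom_up_filter(x):
    """Likelihood-only inference: p(y_t | x_t) under a uniform (uninformative) prior."""
    py = np.array([output_lik(x_t) for x_t in x])
    return py / py.sum(axis=1, keepdims=True)

py_bottom = bottom_up_filter(x_obs)
```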
53 Figure 3B shows the representational performance of this model, through a scatter-plot of the approximate posterior against the exact posterior. [sent-96, score-0.357]
54 If bottom-up inference were correct, then all the points would lie on the line of equality; the bow-shape shows that purely bottom-up inference is relatively poor. [sent-97, score-0.455]
55 Figure 4C shows this in a different way, indicating the difference between the average summed log probabilities of the actual states under the bottom up model and those under the true posterior. [sent-98, score-0.233]
56 The larger and more negative the difference, the worse the approximate inference. [sent-99, score-0.072]
57 Averaging over runs, the difference is log units (compared with a total log likelihood under the exact model of ). [sent-100, score-0.302]
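The comparison just described boils down to a difference of summed log posterior probabilities assigned to the actual middle-layer states. A sketch of that measure under the assumed setup above (the paper's actual numbers are not reproduced here):

```python
def summed_log_prob(posterior, actual):
    """Sum over time of the log posterior probability given to the actual state."""
    return float(np.log(posterior[np.arange(len(actual)), actual]).sum())

# More negative means worse approximate inference, as in the text.
gap_bottom_up = summed_log_prob(py_bottom, y_true) - summed_log_prob(py_exact, y_true)
print("log-probability lost by purely bottom-up inference:", gap_bottom_up)
```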
58 3 ACh Inference Model Figure 2C shows the ACh-based approximate inference model. [sent-101, score-0.567]
59 The information about the context comes in the form of two quantities: the single approximated contextual state, given the observations seen so far, and a measure of the uncertainty in that contextual state. [sent-102, score-0.488]
60 The idea is that this uncertainty is reported by ACh, and is used to control (indicated by the filled-in ellipse) the extent to which top-down, context-based information is used to influence inference about the middle-layer state. [sent-103, score-0.177]
61 As expected, ACh is generally high at times when the true state is changing, and decreases during the periods when it is constant. [sent-105, score-0.106]
62 This is just the putative inferential effect of ACh. [sent-107, score-0.085]
63 However, the ACh signal of figure 4A was calculated assuming knowledge of the true posterior, which is unreasonable. [sent-108, score-0.107]
64 The model of figure 2C includes the key approximation that the only information carried forward from past observations about the state of the context is the single estimated state and the cholinergic uncertainty signal. [sent-109, score-0.101]
65 The last two lines show the information that is propagated to the next time step; equation 6 shows the representational answer from the model, the distribution over the middle-layer state given the observations so far. [sent-112, score-0.172]
66 If the ACh level is high, then the stimulus-controlled likelihood term dominates in the conditioning process (equation 5); if it is low, then the temporal context and the likelihood terms balance. [sent-115, score-0.207]
67 One potentially dangerous aspect of this inference procedure is that it might get unreasonably committed to a single state, because it does not represent explicitly the probability accorded to the other possible values of the context. [sent-116, score-0.245]
68 A natural way to avoid this is to bound the ACh level from below by a constant, making approximate inference slightly more stimulus-bound than exact inference. [sent-117, score-0.43]
69 In practice, rather than use equation 9, we use equation 10, in which the ACh level is bounded below by this constant. [sent-119, score-0.326]
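Since equations 2-10 are not reproduced in this extract, the sketch below is only one plausible reading of the ACh-based recognition model: a single remembered context, a scalar ACh level standing for uncertainty about it, a top-down prior that mixes a delta function on the remembered context with a uniform distribution, and a floor on the ACh level. The update rules and the floor value are assumptions, not the paper's exact equations; the sketch reuses A, B, K and output_lik from above.

```python
def ach_filter(x, ach_floor=0.1):
    """Approximate filtering with a single context estimate and an ACh level
    that reports uncertainty about it (bounded below by ach_floor)."""
    T = len(x)
    py = np.zeros((T, K))
    ach_trace = np.zeros(T)
    z_hat, ach = 0, 1.0                       # start fully uncertain
    for t in range(T):
        # Approximate prior over the context: delta at z_hat mixed with a uniform,
        # with the mixing proportion set by the ACh level, pushed through the dynamics.
        prior_z = (1.0 - ach) * np.eye(K)[z_hat] + ach * np.full(K, 1.0 / K)
        prior_z = prior_z @ A
        # Top-down prior over the middle-layer state, combined with the likelihood.
        lik_y = output_lik(x[t])
        py[t] = (prior_z @ B) * lik_y
        py[t] /= py[t].sum()
        # Approximate posterior over the context, summarised by its mode and a
        # residual uncertainty that becomes the next ACh level.
        post_z = prior_z * (B @ lik_y)
        post_z /= post_z.sum()
        z_hat = int(np.argmax(post_z))
        ach = max(ach_floor, 1.0 - post_z[z_hat])   # bounded below, as in the text
        ach_trace[t] = ach
    return py, ach_trace

py_ach, ach_level = ach_filter(x_obs, ach_floor=0.1)
```

With the floor set to one, the prior over the context is always uniform and the filter reduces to the purely bottom-up model; with a small floor it tracks the context, and the ACh level in this sketch rises around context switches and decays while the context is stable, qualitatively like the behaviour described for figure 4B.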
70 A) ACh level from the exact posterior for one run. [sent-134, score-0.265]
71 B) ACh level in the approximate model in the same run. [sent-135, score-0.153]
72 C) Solid: the mean extra representational cost for the true state, over that in the exact posterior, using the ACh model, as a function of the minimum allowed ACh level. [sent-137, score-0.513]
73 Dashed: the same quantity for the pure bottom-up model (which is equivalent to the approximate model with the ACh level pinned at its maximum). [sent-138, score-0.105]
74 Figure 4B shows the approximate ACh level for the same case as figure 4A, using a small value for the minimum allowed level. [sent-140, score-0.151]
75 Although the detailed value of this signal is clearly different from that arising from an exact knowledge of the posterior probabilities (in figure 4A), the gross movements are quite similar. [sent-141, score-0.286]
76 Figure 3C shows that the ACh-based approximate posterior values are much closer to the true values than for the purely bottom-up model, particularly for values near the extremes, where most data lie. [sent-143, score-0.295]
77 Figure 3F shows that inference about the state sequence is noisy, but the pattern of true values is certainly visible. [sent-144, score-0.246]
78 Figure 4C shows the effect of changing the minimum allowed ACh level on the quality of inference about the true states. [sent-145, score-0.304]
79 This shows differences between approximate and exact log probabilities of the true states, averaged over cases. [sent-146, score-0.363]
80 If the minimum allowed level is set to its maximum, then inference is completely stimulus-bound, just like the purely bottom-up model; lower values of the bound appear to do well for this and other settings of the parameters of the problem. [sent-147, score-0.247]
81 An upper bound on the performance of approximate inference can be calculated in three steps: i) using the exact posterior to work out the single context estimate and its uncertainty, as in equation 2; ii) using these values to approximate the prior over the context; and iii) using this approximate distribution in equation 4 and the remaining equations. [sent-148, score-0.61]
82 The average resulting cost (ie the average resulting difference from the log probability under exact inference) is log units. [sent-149, score-0.195]
83 We used the example of a hierarchical HMM in which representational inference for a middle layer should correctly reflect such a balance, and showed that a simple model of the drive and effects of ACh leads to competent inference. [sent-152, score-0.456]
84 In particular, it uses a localist representation for the state, and so exact inference would be feasible. [sent-154, score-0.378]
85 In a more realistic case, distributed representations would be used at multiple levels in the hierarchy, and so only one single context could be entertained at once. [sent-155, score-0.09]
86 Then, it would also not be possible to represent the degree of uncertainty using the level of activities of the units representing the context, at least given a population-coded representation. [sent-156, score-0.192]
87 It would also be necessary to modify the steps in equations 4 and 5, since it would be hard to represent the joint uncertainty over representations at multiple levels in the hierarchy. [sent-157, score-0.144]
88 Nevertheless, our model shows the feasibility of using an ACh signal in helping propagate and use approximate information over time. [sent-158, score-0.205]
89 Since it is straightforward to administer cholinergic agonists and antagonists, there are many ways to test aspects of this proposal. [sent-159, score-0.222]
90 Preliminary simulation studies indicate that a hidden Markov model under controllable cholinergic modulation can capture several aspects of existent data on animal signal detection tasks. [sent-161, score-0.525]
91 [6] Everitt, BJ & Robbins, TW (1997) Central cholinergic systems and cognition. [sent-177, score-0.192]
92 [9] Hasselmo, ME (1995) Neuromodulation and cortical function: Modeling the physiological basis of behavior. [sent-183, score-0.106]
93 [10] Hasselmo, M (1999) Neuromodulation: acetylcholine and memory consolidation. [sent-185, score-0.128]
94 [12] Hasselmo, ME, Wyble, BP & Wallenstein, GV (1996) Encoding and retrieval of episodic memories: Role of cholinergic and GABAergic modulation in the hippocampus. [sent-189, score-0.23]
95 [15] Holland, PC & Gallagher, M (1999) Amygdala circuitry in attentional and representational processes. [sent-196, score-0.158]
96 Behavioral vigilance following infusions of 192 IgG-saporin into the basal forebrain: selectivity of the behavioral impairment and relation to cortical AChE-positive fiber density. [sent-206, score-0.148]
97 [23] Rao, RPN & Ballard, DH (1997) Dynamic model of visual recognition predicts neural response properties in the visual cortex. [sent-217, score-0.142]
98 [24] Ress, D, Backus, BT & Heeger, DJ (2000) Activity in primary visual cortex predicts performance in a visual detection task. [sent-219, score-0.094]
99 [25] Sarter, M, Bruno, JP (1997) Cognitive functions of cortical acetylcholine: Toward a unifying hypothesis. [sent-221, score-0.106]
100 [26] Schultz, W (1998) Predictive reward signal of dopamine neurons. [sent-223, score-0.123]
wordName wordTfidf (topN-words)
[('ach', 0.634), ('yw', 0.287), ('cholinergic', 0.192), ('inference', 0.177), ('va', 0.143), ('exact', 0.133), ('acetylcholine', 0.128), ('contextual', 0.125), ('uncertainty', 0.112), ('dayan', 0.111), ('representational', 0.109), ('cortical', 0.106), ('hasselmo', 0.104), ('hc', 0.097), ('generative', 0.09), ('neuromodulation', 0.085), ('posterior', 0.084), ('neuromodulators', 0.074), ('approximate', 0.072), ('layer', 0.071), ('purely', 0.07), ('signal', 0.069), ('state', 0.068), ('memories', 0.068), ('balance', 0.065), ('conditioning', 0.065), ('plasticity', 0.065), ('holland', 0.064), ('states', 0.058), ('kakade', 0.058), ('uw', 0.058), ('context', 0.058), ('gure', 0.056), ('dopamine', 0.054), ('animal', 0.054), ('recurrent', 0.053), ('controls', 0.051), ('existent', 0.049), ('inferential', 0.049), ('sarter', 0.049), ('hippocampus', 0.049), ('attentional', 0.049), ('qi', 0.048), ('level', 0.048), ('stimuli', 0.043), ('recognition', 0.043), ('reports', 0.043), ('trends', 0.043), ('arguing', 0.043), ('ber', 0.043), ('helmholtz', 0.043), ('montague', 0.043), ('ress', 0.043), ('actual', 0.042), ('likelihood', 0.042), ('behavioral', 0.042), ('uences', 0.042), ('corners', 0.039), ('ddd', 0.039), ('reporting', 0.039), ('schultz', 0.039), ('suppress', 0.039), ('connections', 0.038), ('true', 0.038), ('modulation', 0.038), ('hierarchical', 0.038), ('neuroscience', 0.036), ('heeger', 0.036), ('instantiate', 0.036), ('jm', 0.036), ('marginalization', 0.036), ('pavlovian', 0.036), ('putative', 0.036), ('bj', 0.034), ('crucially', 0.034), ('qc', 0.034), ('overlap', 0.033), ('visual', 0.033), ('model', 0.033), ('fh', 0.032), ('propagated', 0.032), ('yu', 0.032), ('hidden', 0.032), ('units', 0.032), ('hmm', 0.032), ('levels', 0.032), ('shows', 0.031), ('roles', 0.031), ('ic', 0.031), ('pc', 0.031), ('uenced', 0.031), ('log', 0.031), ('aspects', 0.03), ('qp', 0.03), ('xy', 0.03), ('gatsby', 0.029), ('effects', 0.028), ('detection', 0.028), ('eg', 0.028), ('histograms', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
2 0.10749594 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
Author: Aaron C. Courville, David S. Touretzky
Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1
3 0.10432909 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
4 0.094692074 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes linear time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
5 0.093911193 126 nips-2001-Motivated Reinforcement Learning
Author: Peter Dayan
Abstract: The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of sum future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful, experiments reviewed in Dickinson & Balleine, 12,13 into the neurobiology and psychology of motivation shows that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.
6 0.084706292 183 nips-2001-The Infinite Hidden Markov Model
7 0.081362538 43 nips-2001-Bayesian time series classification
8 0.071235433 172 nips-2001-Speech Recognition using SVMs
9 0.070475824 68 nips-2001-Entropy and Inference, Revisited
10 0.069599375 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
11 0.065336905 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
12 0.063467406 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
13 0.062141478 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
14 0.059840485 37 nips-2001-Associative memory in realistic neuronal networks
15 0.059206557 188 nips-2001-The Unified Propagation and Scaling Algorithm
16 0.058552708 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
17 0.057744693 2 nips-2001-3 state neurons for contextual processing
18 0.056955241 21 nips-2001-A Variational Approach to Learning Curves
19 0.056046981 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding
20 0.054294929 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections
topicId topicWeight
[(0, -0.188), (1, -0.13), (2, -0.004), (3, -0.038), (4, -0.136), (5, -0.043), (6, 0.022), (7, -0.01), (8, -0.021), (9, 0.011), (10, -0.029), (11, 0.111), (12, 0.013), (13, 0.001), (14, -0.043), (15, 0.007), (16, 0.042), (17, 0.013), (18, 0.104), (19, -0.038), (20, 0.046), (21, -0.009), (22, -0.069), (23, -0.008), (24, -0.023), (25, 0.04), (26, -0.063), (27, 0.077), (28, -0.013), (29, -0.017), (30, -0.028), (31, 0.038), (32, 0.017), (33, -0.056), (34, 0.043), (35, -0.159), (36, -0.106), (37, -0.033), (38, -0.012), (39, -0.009), (40, -0.015), (41, -0.012), (42, 0.007), (43, 0.025), (44, -0.01), (45, 0.061), (46, -0.049), (47, -0.178), (48, -0.08), (49, -0.161)]
simIndex simValue paperId paperTitle
same-paper 1 0.94663334 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
2 0.63015676 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
Author: Aaron C. Courville, David S. Touretzky
Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1
3 0.61471057 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes time cubic in the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes linear time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
4 0.56017661 183 nips-2001-The Infinite Hidden Markov Model
Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen
Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.
5 0.54944932 126 nips-2001-Motivated Reinforcement Learning
Author: Peter Dayan
Abstract: The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of sum future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful, experiments reviewed in Dickinson & Balleine, 12,13 into the neurobiology and psychology of motivation shows that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.
6 0.54625523 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
7 0.48531878 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
8 0.46482566 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
9 0.45671135 68 nips-2001-Entropy and Inference, Revisited
10 0.43806109 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement
11 0.42591491 43 nips-2001-Bayesian time series classification
12 0.41367006 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision
13 0.40617251 172 nips-2001-Speech Recognition using SVMs
14 0.40588999 54 nips-2001-Contextual Modulation of Target Saliency
15 0.40173039 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
16 0.39952716 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
17 0.384278 2 nips-2001-3 state neurons for contextual processing
18 0.37751222 148 nips-2001-Predictive Representations of State
19 0.3771567 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
20 0.35943869 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
topicId topicWeight
[(5, 0.271), (14, 0.033), (17, 0.022), (19, 0.044), (27, 0.105), (30, 0.079), (38, 0.026), (59, 0.031), (63, 0.019), (72, 0.059), (79, 0.075), (83, 0.031), (91, 0.128)]
simIndex simValue paperId paperTitle
same-paper 1 0.8159771 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
2 0.78162491 74 nips-2001-Face Recognition Using Kernel Methods
Author: Ming-Hsuan Yang
Abstract: Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recognition, and tracking. The representation in these subspace methods is based on second order statistics of the image set, and does not address higher order statistical dependencies such as the relationships among three or more pixels. Recently Higher Order Statistics and Independent Component Analysis (ICA) have been used as informative low dimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Component Analysis and Kernel Fisher Linear Discriminant for learning low dimensional representations for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods. While Eigenface and Fisherface methods aim to find projection directions based on the second order correlation of samples, Kernel Eigenface and Kernel Fisherface methods provide generalizations which take higher order correlations into account. We compare the performance of kernel methods with Eigenface, Fisherface and ICA-based methods for face recognition with variation in pose, scale, lighting and expression. Experimental results show that kernel methods provide better representations and achieve lower error rates for face recognition. 1 Motivation and Approach Subspace methods have been applied successfully in numerous visual recognition tasks such as face localization, face recognition, 3D object recognition, and tracking. In particular, Principal Component Analysis (PCA) [20] [13] ,and Fisher Linear Discriminant (FLD) methods [6] have been applied to face recognition with impressive results. While PCA aims to extract a subspace in which the variance is maximized (or the reconstruction error is minimized), some unwanted variations (due to lighting, facial expressions, viewing points, etc.) may be retained (See [8] for examples). It has been observed that in face recognition the variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to the changes in face identity [1]. Therefore, while the PCA projections are optimal in a correlation sense (or for reconstruction
3 0.60352284 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
4 0.6003406 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Author: Andrew D. Brown, Geoffrey E. Hinton
Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1
5 0.59516323 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
Author: Bram Bakker
Abstract: This paper presents reinforcement learning with a Long ShortTerm Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1
6 0.59445882 56 nips-2001-Convolution Kernels for Natural Language
7 0.59362781 183 nips-2001-The Infinite Hidden Markov Model
8 0.59296751 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
9 0.59278947 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
11 0.59164518 169 nips-2001-Small-World Phenomena and the Dynamics of Information
12 0.59122622 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
13 0.59075701 89 nips-2001-Grouping with Bias
14 0.59062713 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
15 0.59058195 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
16 0.59025353 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
17 0.58936381 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
18 0.589252 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
19 0.58605146 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
20 0.58554029 46 nips-2001-Categorization by Learning and Combining Object Parts