nips nips2001 nips2001-87 knowledge-graph by maker-knowledge-mining

87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway


Source: pdf

Author: Gal Chechik, Amir Globerson, M. J. Anderson, E. D. Young, Israel Nelken, Naftali Tishby

Abstract: The way groups of auditory neurons interact to code acoustic information is investigated using an information theoretic approach. We develop measures of redundancy among groups of neurons, and apply them to the study of collaborative coding efficiency in two processing stations in the auditory pathway: the inferior colliculus (IC) and the primary auditory cortex (AI). Under two schemes for the coding of the acoustic content, acoustic segments coding and stimulus identity coding, we show differences both in information content and group redundancies between IC and AI neurons. These results provide the first direct evidence for redundancy reduction along the ascending auditory pathway, as has been hypothesized on theoretical grounds [Barlow 1959, 2001]. The redundancy effects under the single-spikes coding scheme are significant only for groups larger than ten cells, and cannot be revealed with redundancy measures that use only pairs of cells. The results suggest that the auditory system transforms low-level representations, which contain redundancies due to the statistical structure of natural stimuli, into a representation in which cortical neurons extract rare and independent components of complex acoustic signals that are useful for auditory scene analysis.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We develop measures of redundancy among groups of neurons, and apply them to the study of collaborative coding efficiency in two processing stations in the auditory pathway: the inferior colliculus (IC) and the primary auditory cortex (AI). [sent-7, score-2.142]

2 Under two schemes for the coding of the acoustic content, acoustic segments coding and stimulus identity coding, we show differences both in information content and group redundancies between IC and AI neurons. [sent-8, score-1.018]

3 These results provide the first direct evidence for redundancy reduction along the ascending auditory pathway, as has been hypothesized on theoretical grounds [Barlow 1959, 2001]. [sent-9, score-1.106]

4 The redundancy effects under the single-spikes coding scheme are significant only for groups larger than ten cells, and cannot be revealed with the redundancy measures that use only pairs of cells. [sent-10, score-1.796]

5 1 Introduction. How do groups of sensory neurons interact to code information, and how do these interactions change along the ascending sensory pathways? [sent-12, score-1.052]

6 According to a common view, sensory systems are composed of a series of processing stations, representing more and more complex aspects of sensory inputs. [sent-13, score-0.311]

7 The changes in representations of stimuli along the sensory pathway reflect the information processing performed by the system. [sent-14, score-0.469]

8 Several computational principles that govern these changes have been suggested, such as information maximization and redundancy reduction [2, 3, 11]. [sent-15, score-0.646]

9 In order to investigate such changes in practice, it is necessary to develop methods to quantify information content and redundancies among groups of neurons, and trace these measures along the sensory pathway. [sent-16, score-0.926]

10 Interactions and high-order correlations between neurons have mostly been investigated within single brain areas at the level of pairs of cells (but also for larger groups of cells [9]), showing both synergistic and redundant interactions [8, 10, 21, 6, 7, 13]. [sent-17, score-1.216]

11 The current study develops information theoretic redundancy measures for larger groups of neurons, focusing on the case of stimulus-conditioned independence. [sent-18, score-1.313]

12 We then compare these measures in electrophysiological recordings from two auditory stations: the auditory midbrain and the primary auditory cortex. [sent-19, score-1.12]

13 2 Redundancy measures for groups of neurons. To investigate high-order correlations and interactions within groups of neurons, we start by defining information measures for groups of cells and then develop information redundancy measures for such groups. [sent-20, score-2.635]

14 The properties of these measures are then further discussed for the specific case of stimulus-conditioned independence. [sent-21, score-0.151]

15 Formally, the level of independence of two variables X and Y is commonly quantified by their mutual information (MI) [17, 5], $I(X; Y) = D_{KL}\left[ P(X,Y) \,\|\, P(X)P(Y) \right] = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$ (1). [sent-22, score-0.134]

16 For larger groups of cells, an important generalized measure quantifies the information that several variables provide about each other. [sent-24, score-0.378]

17 This multi-information measure [18] is defined by $I(X_1; \ldots; X_N) = D_{KL}\left[ P(X_1, \ldots, X_N) \,\|\, \prod_{i=1}^{N} P(X_i) \right]$ (2). As in the mutual information case, the multi-information measures how close the joint distribution is to the factorization by its marginals. [sent-25, score-0.453]

18 It thus vanishes when variables are independent and is otherwise positive. [sent-26, score-0.072]
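
To make these definitions concrete, here is a minimal numerical sketch of the multi-information of equation 2 (which reduces to the ordinary mutual information of equation 1 when N = 2). This is our own illustration rather than the authors' code; the function name and the choice of base-2 logarithms (bits) are assumptions.

```python
import numpy as np

def multi_information(joint):
    """Multi-information I(X1;...;XN) in bits (equation 2): the KL
    divergence from the joint distribution to the product of its
    marginals. `joint` is an N-dimensional array of probabilities
    summing to 1; for N = 2 this is the ordinary mutual information."""
    joint = np.asarray(joint, dtype=float)
    prod = np.ones_like(joint)
    for axis in range(joint.ndim):
        others = tuple(a for a in range(joint.ndim) if a != axis)
        shape = [1] * joint.ndim
        shape[axis] = joint.shape[axis]
        prod = prod * joint.sum(axis=others).reshape(shape)
    mask = joint > 0  # 0 log 0 = 0 by convention
    return float(np.sum(joint[mask] * np.log2(joint[mask] / prod[mask])))

# Independent fair bits give 0; perfectly correlated bits give 1 bit.
print(multi_information(np.full((2, 2), 0.25)))               # 0.0
print(multi_information(np.array([[0.5, 0.0], [0.0, 0.5]])))  # 1.0
```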

19 We now turn to developing measures of group redundancy. [sent-27, score-0.232]

20 Consider first the simple case of a pair of neurons $(X_1, X_2)$ conveying information about the stimulus S. [sent-28, score-0.445]

21 In this case, the redundancy-synergy index ([4, 7]) is defined by $RS_{pairs} = I(X_1, X_2; S) - I(X_1; S) - I(X_2; S)$ (3). Intuitively, $RS_{pairs}$ measures the amount of information about the stimulus S gained by observing the joint distribution of both $X_1$ and $X_2$, as compared with observing the two cells independently. [sent-29, score-0.548]

22 In the extreme case where $X_1 = X_2$, the two cells are completely redundant and provide the same information about the stimulus, yielding $RS_{pairs} = I(X_1, X_2; S) - I(X_1; S) - I(X_2; S) = -I(X_1; S)$, which is always non-positive. [sent-30, score-0.277]

23 On the other hand, positive $RS_{pairs}$ values testify to a synergistic interaction between $X_1$ and $X_2$ ([8, 7, 4]). [sent-31, score-0.057]
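
A short sketch of the pairwise index of equation 3 follows; it is our own illustration under the assumption of discrete responses and a known stimulus prior, not the paper's code.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits from a 2-D joint distribution P(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])))

def rs_pairs(p_resp_given_s, p_s):
    """RS_pairs = I(X1,X2;S) - I(X1;S) - I(X2;S)  (equation 3).

    p_resp_given_s: shape (n_stimuli, n_x1, n_x2), the joint response
    distribution of the pair under each stimulus; p_s: stimulus prior."""
    joint = p_resp_given_s * p_s[:, None, None]                # P(s, x1, x2)
    i_pair = mutual_information(joint.reshape(len(p_s), -1))   # I(X1,X2;S)
    i_x1 = mutual_information(joint.sum(axis=2))               # I(X1;S)
    i_x2 = mutual_information(joint.sum(axis=1))               # I(X2;S)
    return i_pair - i_x1 - i_x2

# Fully redundant pair (X1 = X2 = S, two equiprobable stimuli):
p_s = np.array([0.5, 0.5])
p_resp = np.zeros((2, 2, 2))
p_resp[0, 0, 0] = p_resp[1, 1, 1] = 1.0
print(rs_pairs(p_resp, p_s))  # -1.0 = -I(X1;S), as stated in the text
```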

24 For larger groups of neurons, several different measures of redundancy-synergy may be considered, encompassing different levels of interaction. [sent-32, score-0.398]

25 For example, one can quantify the residual information obtained from a group of N neurons compared to all its subgroups of N - 1 cells. [sent-33, score-0.353]

26 As with inclusion-exclusion calculations, this measure takes the form of a telescopic sum: $RS_{N|N-1} = I(X^N; S) - \sum_{\{X^{N-1}\}} I(X^{N-1}; S) + \sum_{\{X^{N-2}\}} I(X^{N-2}; S) - \ldots$, where $X^k$ denotes a subgroup of k cells and each sum runs over all such subgroups. [sent-34, score-0.066]

27 Unfortunately, this measure involves $2^N$ information terms, making its calculation infeasible even for moderate values of N.¹ [sent-38, score-0.064]
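
The sketch below spells out one plausible reading of the telescopic sum for small N; enumerating every nonempty subgroup is exactly what produces the $2^N$ information terms mentioned above. The sign convention and the function names are our own assumptions.

```python
import numpy as np
from itertools import combinations

def info_about_s(p_sx, subset):
    """I(X_subset; S) in bits, from the full joint P(s, x1, ..., xN)."""
    n_cells = p_sx.ndim - 1
    drop = tuple(1 + i for i in range(n_cells) if i not in subset)
    joint = p_sx.sum(axis=drop).reshape(p_sx.shape[0], -1)  # P(s, x_subset)
    ps = joint.sum(axis=1, keepdims=True)
    px = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (ps * px)[mask])))

def rs_n_given_n_minus_1(p_sx):
    """Telescopic (inclusion-exclusion) RS index: an alternating sum over
    all 2^N - 1 nonempty subgroups, which is what makes it infeasible
    for large N. p_sx is the full joint P(s, x1, ..., xN)."""
    n_cells = p_sx.ndim - 1
    total = 0.0
    for k in range(n_cells, 0, -1):
        sign = (-1.0) ** (n_cells - k)
        for subset in combinations(range(n_cells), k):
            total += sign * info_about_s(p_sx, subset)
    return total
```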

28 A different RS measure quantifies the information embodied in the joint distribution of N neurons compared to that provided by N single independent neurons, and is defined by $RS_{N|1} = I(X_1, \ldots, X_N; S) - \sum_{i=1}^{N} I(X_i; S)$ (4). [sent-39, score-0.489]

29 Interestingly, this synergy-redundancy measure may be rewritten as the difference between two multi-information terms, $RS_{N|1} = I(X_1; \ldots; X_N \mid S) - I(X_1; \ldots; X_N)$ (5). [sent-44, score-0.032]

30 We conclude that the index $RS_{N|1}$ can be separated into two terms: one that is always non-negative and measures the coding synergy, and a second that is always non-positive and quantifies the redundancy. [sent-64, score-0.404]
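
Equation 5 is straightforward to compute directly, as in the sketch below: the synergy term is the average over stimuli of the multi-information of P(x|s), and the redundancy term is the multi-information of the stimulus-averaged responses. Again this is our own illustration, not the authors' implementation; the multi_information helper is the one sketched earlier.

```python
import numpy as np

def multi_information(joint):
    """I(X1;...;XN) in bits (equation 2); same helper as sketched above."""
    prod = np.ones_like(joint)
    for axis in range(joint.ndim):
        others = tuple(a for a in range(joint.ndim) if a != axis)
        shape = [1] * joint.ndim
        shape[axis] = joint.shape[axis]
        prod = prod * joint.sum(axis=others).reshape(shape)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / prod[mask])))

def rs_n_given_1(p_x_given_s, p_s):
    """RS_{N|1} = I(X1;...;XN | S) - I(X1;...;XN)  (equation 5).

    p_x_given_s: shape (n_stimuli, n_1, ..., n_N), the joint response
    distribution of the N cells under each stimulus; p_s: stimulus prior."""
    p_s = np.asarray(p_s, dtype=float)
    # Synergy term: stimulus-conditioned multi-information (non-negative).
    synergy = sum(p_s[s] * multi_information(p_x_given_s[s])
                  for s in range(len(p_s)))
    # Redundancy term: multi-information of P(x) = sum_s P(s) P(x|s);
    # it enters equation 5 with a minus sign (non-positive contribution).
    p_x = np.tensordot(p_s, p_x_given_s, axes=1)
    return synergy - multi_information(p_x)
```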

31 These two terms correspond to two types of interactions between neurons. The first are within-stimulus correlations (sometimes termed noise correlations) that emerge from functional connections between neurons and contribute to synergy. [sent-65, score-0.417]

32 The second are between-stimulus correlations (or across-stimulus correlations) that reflect the fact that the cells have similar responses per stimulus, and contribute to redundancy. [sent-66, score-0.593]

33 Being interested in the latter type of correlations, we limit the discussion to the redundancy term $-I(X_1; \ldots; X_N)$. [sent-67, score-0.552]

34 Formulating $RS_{N|1}$ as in equation 5 proves highly useful when neural activities are independent given the stimulus, $P(X|S) = \prod_{i=1}^{N} P(X_i|S)$. [sent-70, score-0.256]

35 In this case, the first (synergy) term vanishes, thus limiting neural interactions to the redundant regime. (Footnote 1: Our results below suggest that some redundancy effects become significant only for groups larger than 10-15 cells.) [sent-71, score-1.144]

36 (Footnote 2) When comparing redundancy in different processing stations, one must consider the effects of the baseline information conveyed by single neurons. [sent-72, score-0.67]

37 We thus use the normalized redundancy (compare with [15]). [sent-73, score-0.552]

38 More importantly, under the independence assumption we only have to estimate the marginal distributions $P(X_i|S=s)$ for each stimulus s, instead of the full joint distribution $P(X|S=s)$. [sent-79, score-0.196]

39 It thus requires estimating an exponentially smaller number of parameters, which, in our case of small sample sizes, provides more accurate information estimates. [sent-80, score-0.032]
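
The sketch below shows where the savings come from: only the per-cell marginals $P(X_i|S=s)$ are estimated (their number grows linearly with N), and the joint is reassembled as their product. Names and array shapes are our own assumptions; note that the final sum over response patterns is still exponential in N, so this form is practical for binary responses and groups of up to a few tens of cells.

```python
import numpy as np
from functools import reduce

def info_under_independence(p_xi_given_s, p_s):
    """I(X1,...,XN; S) in bits, assuming P(x|s) = prod_i P(x_i|s).

    p_xi_given_s: shape (n_cells, n_stimuli, n_bins), the per-cell
    marginals -- n_cells * n_stimuli * n_bins parameters instead of the
    n_stimuli * n_bins**n_cells of the full joint distribution."""
    n_cells, n_s, n_bins = p_xi_given_s.shape
    # Rebuild P(x|s) = prod_i P(x_i|s) one cell at a time; the result
    # enumerates all n_bins**n_cells response patterns per stimulus.
    p_x_given_s = reduce(
        lambda acc, i: (acc[:, :, None]
                        * p_xi_given_s[i][:, None, :]).reshape(n_s, -1),
        range(n_cells),
        np.ones((n_s, 1)),
    )
    p_sx = np.asarray(p_s, dtype=float)[:, None] * p_x_given_s  # P(s, x)
    p_x = p_sx.sum(axis=0, keepdims=True)                       # P(x)
    mask = p_sx > 0
    ratio = p_x_given_s[mask] / np.broadcast_to(p_x, p_sx.shape)[mask]
    return float(np.sum(p_sx[mask] * np.log2(ratio)))
```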

40 This approximation makes it possible to investigate redundancy among considerably larger groups of neurons than the 2-3 neuron groups considered previously in the literature. [sent-81, score-1.348]

41 It is a good approximation whenever neuronal activity is mostly determined by the presented stimulus and to a lesser extent by interactions with nearby neurons. [sent-83, score-0.287]

42 A possible example is the high-input regime of cortical neurons receiving thousands of inputs, where a single input has only a limited influence on the activity of the target cell. [sent-84, score-0.409]

43 One should note, however, that stimulus-conditioned independence is implicitly assumed in any analysis of non-simultaneously recorded data. [sent-88, score-0.077]

44 To summarize, the stimulus-conditioned independence assumption limits interactions to the redundant regime, but makes it possible to compare the extent of redundancy among large groups of cells in different brain areas. [sent-89, score-1.222]

45 Neural activity was recorded non-simultaneously from a total of 6 different animals responding to a set of complex natural and modified stimuli. [sent-91, score-0.117]

46 Because cortical auditory neurons respond differently to simple and complex stimuli [12, 1], we refrain from using artificial, over-simplified acoustic stimuli, and instead use a set of stimuli based on bird vocalizations, which contain complex 'real-life' acoustic features. [sent-92, score-1.336]

47 Figure 1: A representative stimulus containing a short bird vocalization recorded in a natural environment. (Two panels; x-axes: time in milliseconds, 20-100.) [sent-96, score-0.292]

48 The set of stimuli consisted of similar natural and modified recordings. [sent-97, score-0.1]

49 4 Experimental Results. In practice, in order to estimate the information conveyed by neural activity from limited data, one must assume a decoding procedure, such as focusing on a simple statistic of the spike trains that encompasses some of their informative properties. [sent-101, score-0.272]

50 In this paper we consider two extreme cases: coding short acoustic segments with single spikes and coding the stimulus identity with spike counts in a long window. [sent-102, score-0.686]
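
As a concrete illustration of these two readouts, the sketch below reduces a list of spike times to either statistic. The preprocessing is our own hypothetical example, not the authors' code, and the 10 ms segment length is an arbitrary placeholder rather than the paper's value.

```python
import numpy as np

def acoustic_segment_code(spike_times_ms, stim_dur_ms, seg_ms=10.0):
    """Single-spike code: one binary symbol (spike / no spike) per short
    acoustic segment. seg_ms is an illustrative value, not the paper's."""
    edges = np.arange(0.0, stim_dur_ms + seg_ms, seg_ms)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return (counts > 0).astype(int)

def stimulus_identity_code(spike_times_ms, stim_dur_ms):
    """Identity code: total spike count in one long window."""
    t = np.asarray(spike_times_ms, dtype=float)
    return int(np.count_nonzero((t >= 0.0) & (t < stim_dur_ms)))

spikes = [12.3, 15.1, 48.0, 90.7]            # spike times in ms
print(acoustic_segment_code(spikes, 100.0))  # [0 1 0 0 1 0 0 0 0 1]
print(stimulus_identity_code(spikes, 100.0)) # 4
```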

51 In addition, we estimated information and redundancy obtained with two other statistics. [sent-103, score-0.584]

52 First, the latency of the first spike after stimulus onset; and second, a statistic that generalizes the count statistics to a general renewal process [19]. [sent-104, score-0.28]

53 These calculations yielded higher information content on average, but redundancies similar to those presented below. [sent-105, score-0.272]

54 (Figure: x-axis 'number of cells', range 5-20.) Discussion. We have developed information theoretic measures of redundancy among groups of neurons and applied them to investigate the collaborative coding efficiency in the auditory modality. [sent-120, score-2.023]

55 Under two different coding paradigms, we show differences in both information content and group redundancies between IC and cortical auditory neurons. [sent-121, score-0.813]

56 Single IC neurons carry more information about the presented stimulus, but are also more redundant. [sent-122, score-0.287]

57 On the other hand, auditory cortical neurons carry less information but are more independent, thus allowing information to be summed almost linearly when considering groups of a few tens of neurons. [sent-123, score-0.914]

58 The results provide the first direct evidence for redundancy reduction along the ascending auditory pathway, as hypothesized by Barlow [2, 3]. [sent-124, score-1.106]

59 The redundancy effects under the single-spikes coding paradigm are significant only for groups larger than ten cells, and cannot be revealed with the standard redundancy measures that use only pairs of cells. [sent-125, score-1.796]

60 Our results suggest that transformations leading to redundancy reduction are not limited to low-level sensory processing (aimed at reducing redundancy in input statistics) but are applied even at cortical sensory stations. [sent-126, score-1.589]

61 We suggest that an essential experimental prerequisite for revealing these effects is the use of complex acoustic stimuli whose processing occurs at high-level processing stations. [sent-127, score-0.415]

62 The above findings are in agreement with the view that along the ascending sensory pathways the number of neurons increases, their firing rates decrease, and neurons become tuned to more complex and independent features. [sent-128, score-0.882]

63 Together, these suggest that the neural representation is mapped into a representation with higher effective dimensionality. [sent-129, score-0.065]

64 Interestingly, recent advances in kernel-based learning methods [20] have shown that nonlinear mapping into higher-dimensional and over-complete representations may be useful for learning complex classifications. [sent-130, score-0.041]

65 Responses of neurons in cat primary auditory cortex to bird chirps: Effects of temporal and spectral context. [sent-136, score-0.695]

66 Coding of visual information by precisely correlated spikes in the lateral geniculate nucleus. [sent-174, score-0.06]

67 Synergy and redundancy among brain cells of behaving monkeys. [sent-178, score-0.809]

68 How independent are the messages carried by adjacent inferior temporal cortical neurons? [sent-192, score-0.186]

69 Nonlinear feedforward networks with stochastic outputs: infomax implies redundancy reduction. [sent-224, score-0.552]

70 Specialization of the auditory system for the analysis of natural sounds. [sent-230, score-0.298]

71 Unbiased measures of transmitted information and channel capacity from multivariate neuronal data. [sent-254, score-0.183]

72 Independent neurons representing a finite set of stimuli: dependence of the mutual information on the number of units sampled. [sent-266, score-0.32]

73 Decoding visual information from a population of retinal ganglion cells. [sent-299, score-0.128]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('redundancy', 0.552), ('auditory', 0.298), ('neurons', 0.255), ('groups', 0.211), ('cells', 0.173), ('stimulus', 0.158), ('coding', 0.153), ('measures', 0.151), ('redundancies', 0.142), ('sensory', 0.135), ('pathway', 0.131), ('acoustic', 0.124), ('ascending', 0.123), ('xl', 0.119), ('stations', 0.113), ('stimuli', 0.1), ('interactions', 0.092), ('cortical', 0.086), ('rsn', 0.085), ('rspairs', 0.085), ('xiis', 0.085), ('synergy', 0.079), ('colliculus', 0.074), ('redundant', 0.072), ('correlations', 0.07), ('bird', 0.067), ('quantifies', 0.067), ('content', 0.064), ('inferior', 0.064), ('reduction', 0.062), ('gawne', 0.057), ('synergistic', 0.057), ('xis', 0.057), ('xnis', 0.057), ('ic', 0.054), ('retinal', 0.054), ('milliseconds', 0.049), ('plenum', 0.049), ('renewal', 0.049), ('effects', 0.049), ('israel', 0.046), ('brain', 0.046), ('pathways', 0.045), ('dkl', 0.045), ('investigate', 0.045), ('xn', 0.044), ('develop', 0.043), ('theoretic', 0.043), ('ganglion', 0.042), ('interdisciplinary', 0.042), ('barlow', 0.042), ('complex', 0.041), ('primary', 0.041), ('spike', 0.04), ('hebrew', 0.039), ('jerusalem', 0.039), ('collaborative', 0.039), ('recorded', 0.039), ('among', 0.038), ('group', 0.038), ('independence', 0.038), ('multi', 0.037), ('conveyed', 0.037), ('along', 0.037), ('activity', 0.037), ('suggest', 0.036), ('larger', 0.036), ('vanishes', 0.036), ('independent', 0.036), ('ie', 0.035), ('recordings', 0.034), ('reflect', 0.034), ('hypothesized', 0.034), ('calculations', 0.034), ('reveal', 0.034), ('cortex', 0.034), ('defined', 0.034), ('ll', 0.033), ('focusing', 0.033), ('efficiency', 0.033), ('revealed', 0.033), ('factorization', 0.033), ('statistic', 0.033), ('xi', 0.033), ('mutual', 0.033), ('information', 0.032), ('interact', 0.032), ('measure', 0.032), ('neuroscience', 0.031), ('significant', 0.031), ('regime', 0.031), ('decoding', 0.031), ('level', 0.031), ('segments', 0.03), ('neural', 0.029), ('interestingly', 0.028), ('spikes', 0.028), ('quantify', 0.028), ('representative', 0.028), ('ten', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway


2 0.24905477 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

Author: B. D. Wright, Kamal Sen, William Bialek, A. J. Doupe

Abstract: In nature, animals encounter high dimensional sensory stimuli that have complex statistical and dynamical structure. Attempts to study the neural coding of these natural signals face challenges both in the selection of the signal ensemble and in the analysis of the resulting neural responses. For zebra finches, naturalistic stimuli can be defined as sounds that they encounter in a colony of conspecific birds. We assembled an ensemble of these sounds by recording groups of 10-40 zebra finches, and then analyzed the response of single neurons in the songbird central auditory area (field L) to continuous playback of long segments from this ensemble. Following methods developed in the fly visual system, we measured the information that spike trains provide about the acoustic stimulus without any assumptions about which features of the stimulus are relevant. Preliminary results indicate that large amounts of information are carried by spike timing, with roughly half of the information accessible only at time resolutions better than 10 ms; additional information is still being revealed as time resolution is improved to 2 ms. Information can be decomposed into that carried by the locking of individual spikes to the stimulus (or modulations of spike rate) vs. that carried by timing in spike patterns. Initial results show that in field L, temporal patterns give at least % extra information. Thus, single central auditory neurons can provide an informative representation of naturalistic sounds, in which spike timing may play a significant role.   

3 0.15643939 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons

Author: Shih-Chii Liu, Jörg Kramer, Giacomo Indiveri, Tobi Delbrück, Rodney J. Douglas

Abstract: We describe a programmable multi-chip VLSI neuronal system that can be used for exploring spike-based information processing models. The system consists of a silicon retina, a PIC microcontroller, and a transceiver chip whose integrate-and-fire neurons are connected in a soft winner-take-all architecture. The circuit on this multi-neuron chip approximates a cortical microcircuit. The neurons can be configured for different computational properties by the virtual connections of a selected set of pixels on the silicon retina. The virtual wiring between the different chips is effected by an event-driven communication protocol that uses asynchronous digital pulses, similar to spikes in a neuronal system. We used the multi-chip spike-based system to synthesize orientation-tuned neurons using both a feedforward model and a feedback model. The performance of our analog hardware spiking model matched the experimental observations and digital simulations of continuous-valued neurons. The multi-chip VLSI system has advantages over computer neuronal models in that it is real-time, and the computational time does not scale with the size of the neuronal network.

4 0.1547246 37 nips-2001-Associative memory in realistic neuronal networks

Author: Peter E. Latham

Abstract: Almost two decades ago , Hopfield [1] showed that networks of highly reduced model neurons can exhibit multiple attracting fixed points, thus providing a substrate for associative memory. It is still not clear, however, whether realistic neuronal networks can support multiple attractors. The main difficulty is that neuronal networks in vivo exhibit a stable background state at low firing rate, typically a few Hz. Embedding attractor is easy; doing so without destabilizing the background is not. Previous work [2, 3] focused on the sparse coding limit, in which a vanishingly small number of neurons are involved in any memory. Here we investigate the case in which the number of neurons involved in a memory scales with the number of neurons in the network. In contrast to the sparse coding limit, we find that multiple attractors can co-exist robustly with a stable background state. Mean field theory is used to understand how the behavior of the network scales with its parameters, and simulations with analog neurons are presented. One of the most important features of the nervous system is its ability to perform associative memory. It is generally believed that associative memory is implemented using attractor networks - experimental studies point in that direction [4- 7], and there are virtually no competing theoretical models. Perhaps surprisingly, however, it is still an open theoretical question whether attractors can exist in realistic neuronal networks. The

5 0.12878208 78 nips-2001-Fragment Completion in Humans and Machines

Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu

Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.

6 0.12462659 72 nips-2001-Exact differential equation population dynamics for integrate-and-fire neurons

7 0.12344386 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

8 0.1109118 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

9 0.10954561 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections

10 0.10766589 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

11 0.10621262 57 nips-2001-Correlation Codes in Neuronal Populations

12 0.10568658 73 nips-2001-Eye movements and the maturation of cortical orientation selectivity

13 0.10328522 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes

14 0.10176852 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

15 0.093409576 96 nips-2001-Information-Geometric Decomposition in Spike Analysis

16 0.092438661 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

17 0.091342166 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex

18 0.081561476 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

19 0.077970944 2 nips-2001-3 state neurons for contextual processing

20 0.075145461 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.168), (1, -0.291), (2, -0.167), (3, -0.007), (4, 0.08), (5, 0.022), (6, 0.016), (7, -0.016), (8, -0.006), (9, 0.022), (10, -0.005), (11, 0.156), (12, -0.156), (13, 0.038), (14, 0.07), (15, -0.143), (16, 0.063), (17, 0.091), (18, -0.068), (19, 0.017), (20, -0.153), (21, -0.058), (22, 0.003), (23, -0.096), (24, -0.022), (25, 0.048), (26, -0.009), (27, 0.094), (28, 0.025), (29, 0.03), (30, -0.038), (31, -0.038), (32, -0.048), (33, 0.122), (34, -0.01), (35, 0.178), (36, 0.012), (37, -0.068), (38, -0.058), (39, 0.04), (40, 0.003), (41, 0.061), (42, 0.092), (43, -0.081), (44, -0.036), (45, 0.07), (46, -0.034), (47, 0.074), (48, 0.113), (49, 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97955883 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway


2 0.77252948 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds


3 0.64294219 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

Author: H. Colonius, A. Diederich

Abstract: Multisensory response enhancement (MRE) is the augmentation of the response of a neuron to sensory input of one modality by simultaneous input from another modality. The maximum likelihood (ML) model presented here modifies the Bayesian model for MRE (Anastasio et al.) by incorporating a decision strategy to maximize the number of correct decisions. Thus the ML model can also deal with the important tasks of stimulus discrimination and identification in the presence of incongruent visual and auditory cues. It accounts for the inverse effectiveness observed in neurophysiological recording data, and it predicts a functional relation between uni- and bimodal levels of discriminability that is testable both in neurophysiological and behavioral experiments. 1

4 0.59428084 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

Author: Stuart N. Wrigley, Guy J. Brown

Abstract: A model of auditory grouping is described in which auditory attention plays a key role. The model is based upon an oscillatory correlation framework, in which neural oscillators representing a single perceptual stream are synchronised, and are desynchronised from oscillators representing other streams. The model suggests a mechanism by which attention can be directed to the high or low tones in a repeating sequence of tones with alternating frequencies. In addition, it simulates the perceptual segregation of a mistuned harmonic from a complex tone. 1

5 0.57238984 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons


6 0.56558394 57 nips-2001-Correlation Codes in Neuronal Populations

7 0.56179613 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision

8 0.51730412 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance

9 0.47171724 37 nips-2001-Associative memory in realistic neuronal networks

10 0.44473699 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

11 0.43929484 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

12 0.43663865 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model

13 0.42518532 78 nips-2001-Fragment Completion in Humans and Machines

14 0.42492405 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

15 0.41742137 72 nips-2001-Exact differential equation population dynamics for integrate-and-fire neurons

16 0.38056907 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections

17 0.36691707 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

18 0.36280921 73 nips-2001-Eye movements and the maturation of cortical orientation selectivity

19 0.34328881 96 nips-2001-Information-Geometric Decomposition in Spike Analysis

20 0.31486902 30 nips-2001-Agglomerative Multivariate Information Bottleneck


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.016), (17, 0.026), (19, 0.016), (27, 0.07), (30, 0.051), (38, 0.031), (72, 0.016), (79, 0.033), (91, 0.637)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99323905 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway


2 0.99099362 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images

Author: James M. Coughlan, Alan L. Yuille

Abstract: We describe the g-factor, which relates probability distributions on image features to distributions on the images themselves. The g-factor depends only on our choice of features and lattice quantization and is independent of the training image data. We illustrate the importance of the g-factor by analyzing how the parameters of Markov Random Field (i.e. Gibbs or log-linear) probability models of images are learned from data by maximum likelihood estimation. In particular, we study homogeneous MRF models which learn image distributions in terms of clique potentials corresponding to feature histogram statistics (cf. Minimax Entropy Learning (MEL) by Zhu, Wu and Mumford 1997 [11]). We first use our analysis of the g-factor to determine when the clique potentials decouple for different features. Second, we show that clique potentials can be computed analytically by approximating the g-factor. Third, we demonstrate a connection between this approximation and the Generalized Iterative Scaling algorithm (GIS), due to Darroch and Ratcliff 1972 [2], for calculating potentials. This connection enables us to use GIS to improve our multinomial approximation, using Bethe-Kikuchi [8] approximations to simplify the GIS procedure. We support our analysis by computer simulations.

3 0.98621619 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber

Abstract: We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance. Consider the following scenario: While driving, you approach an intersection at which the traffic light has already turned yellow, signaling that it is about to turn red. You also notice that a car is approaching you rapidly from behind, with no indication of slowing. Should you stop or speed through the intersection? The decision is difficult due to the presence of two conflicting signals. Such response conflict can be produced in a psychological laboratory as well. For example, Stroop (1935) asked individuals to name the color of ink on which a word is printed. When the words are color names incongruous with the ink color— e.g., “blue” printed in red—reaction times are slower and error rates are higher. We are interested in the control mechanisms underlying performance of high-conflict tasks. Conflict requires individuals to monitor and adjust their behavior, possibly responding more slowly if errors are too frequent. In this paper, we model a speeded discrimination paradigm in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). The stimuli are letters of the alphabet, A–Z, presented in rapid succession. In a choice task, individuals are asked to press one response key if the letter is an X or another response key for any letter other than X (as a shorthand, we will refer to non-X stimuli as Y). In a go/no-go task, individuals are asked to press a response key when X is presented and to make no response otherwise. We address both tasks because they elicit slightly different decision-making behavior. In both tasks, Jones and Braver (2001) manipulated the relative frequency of the X and Y stimuli; the ratio of presentation frequency was either 17:83, 50:50, or 83:17. Response conflict arises when the two stimulus classes are unbalanced in frequency, resulting in more errors and slower reaction times. For example, when X’s are frequent but Y is presented, individuals are predisposed toward producing the X response, and this predisposition must be overcome by the perceptual evidence from the Y. Jones and Braver (2001) also performed an fMRI study of this task and found that anterior cingulate cortex (ACC) becomes activated in situations involving response conflict. Specifically, when one stimulus occurs infrequently relative to the other, event-related fMRI response in the ACC is greater for the low frequency stimulus. Jones and Braver also extended a neural network model of Botvinick, Braver, Barch, Carter, and Cohen (2001) to account for human performance in the two discrimination tasks. 
The heart of the model is a mechanism that monitors conflict (the posited role of the ACC) and adjusts response biases accordingly. In this paper, we develop a parsimonious alternative account of the role of the ACC and of how control processes modulate behavior when response conflict arises. 1 A RATIONAL ANALYSIS Our account is based on a rational analysis of human cognition, which views cognitive processes as being optimized with respect to certain task-related goals, and being adaptive to the structure of the environment (Anderson, 1990). We make three assumptions of rationality: (1) perceptual inference is optimal but is subject to rate limitations on information transmission, (2) response class prior probabilities are accurately estimated, and (3) the goal of individuals is to minimize a cost that depends both on error rate and reaction time. The heart of our account is an existing probabilistic model that explains a variety of facilitation effects that arise from long-term repetition priming (Colagrosso, in preparation; Mozer, Colagrosso, & Huber, 2000), and more broadly, that addresses changes in the nature of information transmission in neocortex due to experience. We give a brief overview of this model; the details are not essential for the present work. The model posits that neocortex can be characterized by a collection of information-processing pathways, and any act of cognition involves coordination among pathways. To model a simple discrimination task, we might suppose a perceptual pathway to map the visual input to a semantic representation, and a response pathway to map the semantic representation to a response. The choice and go/no-go tasks described earlier share a perceptual pathway, but require different response pathways. The model is framed in terms of probability theory: pathway inputs and outputs are random variables, and microinference in a pathway is carried out by Bayesian belief revision. To elaborate, consider a pathway whose input at time t is a discrete random variable, denoted X, which can assume values 1, ..., n_X corresponding to alternative input states. Similarly, the output of the pathway at time t is a discrete random variable, denoted Y, which can assume values 1, ..., n_Y. For example, the input to the perceptual pathway in the discrimination task is one of 26 visual patterns corresponding to the letters of the alphabet, and the output is one of 26 letter identities. (This model is highly abstract: the visual patterns are enumerated, but the actual pixel patterns are not explicitly represented in the model. Nonetheless, the similarity structure among inputs can be captured, but we skip a discussion of this issue because it is irrelevant for the current work.) To present a particular input alternative, i, to the model for T time steps, we clamp X = i for t = 1, ..., T. The model computes a probability distribution over the output given the input sequence, i.e., P(Y at time t given X at times 1 through t).

4 0.98523855 148 nips-2001-Predictive Representations of State

Author: Michael L. Littman, Richard S. Sutton

Abstract: We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization of the predictive approach with the state representations used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model. In predicting or controlling a sequence of observations, the concepts of state and state estimation inevitably arise. There have been two dominant approaches. The generative-model approach, typified by research on partially observable Markov decision processes (POMDPs), hypothesizes a structure for generating observations and estimates its state and state dynamics. The history-based approach, typified by k-order Markov methods, uses simple functions of past observations as state, that is, as the immediate basis for prediction and control. (The data flow in these two approaches is diagrammed in Figure 1.) Of the two, the generative-model approach is more general. The model's internal state gives it temporally unlimited memory (the ability to remember an event that happened arbitrarily long ago), whereas a history-based approach can only remember as far back as its history extends. The bane of generative-model approaches is that they are often strongly dependent on a good model of the system's dynamics. Most uses of POMDPs, for example, assume a perfect dynamics model and attempt only to estimate state. There are algorithms for simultaneously estimating state and dynamics (e.g., Chrisman, 1992), analogous to the Baum-Welch algorithm for the uncontrolled case (Baum et al., 1970), but these are only effective at tuning parameters that are already approximately correct (e.g., Shatkay & Kaelbling, 1997). (Figure 1: Data flow in a) POMDP and other recursive updating of state representation, and b) history-based state representation.) In practice, history-based approaches are often much more effective. Here, the state representation is a relatively simple record of the stream of past actions and observations. It might record the occurrence of a specific subsequence or that one event has occurred more recently than another. Such representations are far more closely linked to the data than are POMDP representations. One way of saying this is that POMDP learning algorithms encounter many local minima and saddle points because all their states are equipotential. History-based systems immediately break symmetry, and their direct learning procedure makes them comparably simple. McCallum (1995) has shown in a number of examples that sophisticated history-based methods can be effective in large problems, and are often more practical than POMDP methods even in small ones.
The predictive state representation (PSR) approach, which we develop in this paper, is like the generative-model approach in that it updates the state representation recursively, as in Figure 1(a), rather than directly computing it from data. We show that this enables it to attain generality and compactness at least equal to that of the generative-model approach. However, the PSR approach is also like the history-based approach in that its representations are grounded in data. Whereas a history-based representation looks to the past and records what did happen, a PSR looks to the future and represents what will happen. In particular, a PSR is a vector of predictions for a specially selected set of action-observation sequences, called tests (after Rivest & Schapire, 1994). For example, consider the test u_1 o_1 u_2 o_2, where u_1 and u_2 are specific actions and o_1 and o_2 are specific observations. The correct prediction for this test given the data stream up to time k is the probability of its observations occurring (in order) given that its actions are taken (in order), i.e., Pr{O_k = o_1, O_{k+1} = o_2 | A_k = u_1, A_{k+1} = u_2}. Each test is a kind of experiment that could be performed to tell us something about the system. If we knew the outcome of all possible tests, then we would know everything there is to know about the system. A PSR is a set of tests that is sufficient information to determine the prediction for all possible tests (a sufficient statistic). As an example of these points, consider the float/reset problem (Figure 2) consisting of a linear string of 5 states with a distinguished reset state on the far right. One action, f (float), causes the system to move uniformly at random to the right or left by one state, bounded at the two ends. The other action, r (reset), causes a jump to the reset state irrespective of the current state. The observation is always 0 unless the r action is taken when the system is already in the reset state, in which case the observation is 1. Thus, on an f action, the correct prediction is always 0, whereas on an r action, the correct prediction depends on how many fs there have been since the last r: for zero fs, it is 1; for one or two fs, it is 0.5; for three or four fs, it is 0.375; for five or six fs, it is 0.3125, and so on decreasing after every second f, asymptotically bottoming out at 0.2. No k-order Markov method can model this system exactly, because no limited-length history is a sufficient statistic. (Figure 2: Underlying dynamics of the float/reset problem for a) the float action and b) the reset action. The numbers on the arcs indicate transition probabilities. The observation is always 0 except on the reset action from the rightmost state, which produces an observation of 1.) A POMDP approach can model it exactly by maintaining a belief-state representation over five or so states. A PSR, on the other hand, can exactly model the float/reset system using just two tests: r1 and f0r1. Starting from the rightmost state, the correct predictions for these two tests are always two successive probabilities in the sequence given above (1, 0.5, 0.5, 0.375,...), which is always a sufficient statistic to predict the next pair in the sequence. Although this informational analysis indicates a solution is possible in principle, it would require a nonlinear updating process for the PSR.
In this paper we restrict consideration to a linear special case of PSRs, for which we can guarantee that the number of tests needed does not exceed the number of states in the minimal POMDP representation (although we have not ruled out the possibility it can be considerably smaller). Of greater ultimate interest are the prospects for learning PSRs and their update functions, about which we can only speculate at this time. The difficulty of learning POMDP structures without good prior models is well known. To the extent that this difficulty is due to the indirect link between the POMDP states and the data, predictive representations may be able to do better. Jaeger (2000) introduced the idea of predictive representations as an alternative to belief states in hidden Markov models and provided a learning procedure for these models. We build on his work by treating the control case (with actions), which he did not significantly analyze. We have also been strongly influenced by the work of Rivest and Schapire (1994), who did consider tests including actions, but treated only the deterministic case, which is significantly different. They also explored construction and learning algorithms for discovering system structure. 1 Predictive State Representations We consider dynamical systems that accept actions from a discrete set A and generate observations from a discrete set O. We consider only predicting the system, not controlling it, so we do not designate an explicit reward observation. We refer to such a system as an environment. We use the term history to denote a test forming an initial stream of experience and characterize an environment by a probability distribution over all possible histories, P : {O|A}* -> [0, 1], where P(o_1 ··· o_t | a_1 ··· a_t) is the probability of observations o_1, ..., o_t being generated, in that order, given that actions a_1, ..., a_t are taken, in that order. The probability of a test t conditional on a history h is defined as P(t|h) = P(ht)/P(h). Given a set of q tests Q = {t_i}, we define their (1 x q) prediction vector, p(h) = [P(t_1|h), P(t_2|h), ..., P(t_q|h)], as a predictive state representation (PSR) if and only if it forms a sufficient statistic for the environment, i.e., if and only if P(t|h) = f_t(p(h)) (1) for any test t and history h, and for some projection function f_t : [0, 1]^q -> [0, 1]. In this paper we focus on linear PSRs, for which the projection functions are linear, that is, for which there exists a (1 x q) projection vector m_t, for every test t, such that P(t|h) = f_t(p(h)) = p(h) m_t^T (2) for all histories h. Let p_i(h) denote the ith component of the prediction vector for some PSR. This can be updated recursively, given a new action-observation pair a, o, by p_i(hao) = P(t_i|hao) = P(o t_i|ha) / P(o|ha) = f_{aot_i}(p(h)) / f_{ao}(p(h)) = p(h) m_{aot_i}^T / p(h) m_{ao}^T (3), where the last step is specific to linear PSRs. We can now state our main result: Theorem 1 For any environment that can be represented by a finite POMDP model, there exists a linear PSR with number of tests no larger than the number of states in the minimal POMDP model. 2 Proof of Theorem 1: Constructing a PSR from a POMDP We prove Theorem 1 by showing that for any POMDP model of the environment, we can construct in polynomial time a linear PSR for that POMDP of lesser or equal complexity that produces the same probability distribution over histories as the POMDP model. We proceed in three steps. First, we review POMDP models and how they assign probabilities to tests.
Next, we define an algorithm that takes an n-state POMDP model and produces a set of n or fewer tests, each of length less than or equal to n. Finally, we show that the set of tests constitutes a PSR for the POMDP, that is, that there are projection vectors that, together with the tests' predictions, produce the same probability distribution over histories as the POMDP. A POMDP (Lovejoy, 1991; Kaelbling et al., 1998) is defined by a sextuple (S, A, O, b_0, T, O). Here, S is a set of n underlying (hidden) states, A is a discrete set of actions, and O is a discrete set of observations. The (1 x n) vector b_0 is an initial state distribution. The set T consists of (n x n) transition matrices T^a, one for each action a, where T^a_{ij} is the probability of a transition from state i to j when action a is chosen. The set O consists of diagonal (n x n) observation matrices O^{a,o}, one for each pair of observation o and action a, where O^{a,o}_{ii} is the probability of observation o when action a is selected and state i is reached.¹ The state representation in a POMDP (Figure 1(a)) is the belief state, the (1 x n) vector of the state-occupation probabilities given the history h. It can be computed recursively given a new action a and observation o by b(hao) = b(h) T^a O^{a,o} / (b(h) T^a O^{a,o} e_n^T), where e_n is the (1 x n)-vector of all 1s. Finally, a POMDP defines a probability distribution over tests (and thus histories) by P(o_1 ... o_t | h a_1 ... a_t) = b(h) T^{a_1} O^{a_1,o_1} ··· T^{a_t} O^{a_t,o_t} e_n^T (4). (¹There are many equivalent formulations and the conversion procedure described here can be easily modified to accommodate other POMDP definitions.) We now present our algorithm for constructing a PSR for a given POMDP. It uses a function u mapping tests to (1 x n) vectors defined recursively by u(ε) = e_n and u(aot) = (T^a O^{a,o} u(t)^T)^T, where ε represents the null test. Conceptually, the components of u(t) are the probabilities of the test t when applied from each underlying state of the POMDP; we call u(t) the outcome vector for test t. We say a test t is linearly independent of a set of tests S if its outcome vector is linearly independent of the set of outcome vectors of the tests in S. Our algorithm, search, is defined as: Q <- search(ε, {}); search(t, S): for each a in A, o in O, if aot is linearly independent of S then S <- search(aot, S ∪ {aot}); return S. The algorithm maintains a set of tests and searches for new tests that are linearly independent of those already found. It is a form of depth-first search. The algorithm halts when it checks all the one-step extensions of its tests and finds none that are linearly independent. Because the set of tests Q returned by search has linearly independent outcome vectors, the cardinality of Q is bounded by n, ensuring that the algorithm halts after a polynomial number of iterations. Because each test in Q is formed by a one-step extension to some other test in Q, no test is longer than n action-observation pairs. The check for linear independence can be performed in many ways, including Gaussian elimination, implying that search terminates in polynomial time. By construction, all one-step extensions to the set of tests Q returned by search are linearly dependent on those in Q. We now show that this is true for any test. Lemma 1 The outcome vectors of the tests in Q can be linearly combined to produce the outcome vector for any test. Proof: Let U be the (n x q) matrix formed by concatenating the outcome vectors for all tests in Q.
By construction, all one-step extensions to the set of tests Q returned by search are linearly dependent on those in Q. We now show that this is true for any test.

Lemma 1 The outcome vectors of the tests in Q can be linearly combined to produce the outcome vector for any test.

Proof: Let U be the (n x q) matrix formed by concatenating the outcome vectors for all tests in Q. Since, for all combinations of a and o, the columns of T^a O^{a,o} U are linearly dependent on the columns of U, we can write T^a O^{a,o} U = U W^T for some (q x q) matrix of weights W. If t is a test that is linearly dependent on Q, then any one-step extension of t, aot, is also linearly dependent on Q. This is because we can write the outcome vector for t as u(t) = (U w^T)^T for some (1 x q) weight vector w, and the outcome vector for aot as u(aot) = (T^a O^{a,o} u(t)^T)^T = (T^a O^{a,o} U w^T)^T = (U W^T w^T)^T. Thus, aot is linearly dependent on Q. Now, note that all one-step tests are linearly dependent on Q by the structure of the search algorithm. Using the previous paragraph as an inductive argument, this implies that all tests are linearly dependent on Q. []

Returning to the float/reset example POMDP, search begins by enumerating the four one-step extensions of the null test (f0, f1, r0, and r1). Of these, only f0 and r0 are linearly independent. Of the extensions of these, f0r0 is the only one that is linearly independent of the other two. The remaining two tests added to Q by search are f0f0r0 and f0f0f0r0. No extensions of the 5 tests in Q are linearly independent of the 5 tests in Q, so the procedure halts.

We now show that the set of tests Q constitutes a PSR for the POMDP by constructing projection vectors that, together with the tests' predictions, produce the same probability distribution over histories as the POMDP. For each combination of a and o, define a (q x q) matrix M_{ao} = (U^+ T^a O^{a,o} U)^T and a (1 x q) vector m_{ao} = (U^+ T^a O^{a,o} e_n^T)^T, where U is the matrix of outcome vectors defined in the previous section and U^+ is its pseudoinverse.^2 The ith row of M_{ao} is m_{aot_i}. The probability distribution on histories implied by these projection vectors is

p(h) M_{a_1 o_1}^T M_{a_2 o_2}^T ... M_{a_{t-1} o_{t-1}}^T m_{a_t o_t}^T
    = b(h) U U^+ T^{a_1} O^{a_1,o_1} U U^+ T^{a_2} O^{a_2,o_2} U ... U^+ T^{a_{t-1}} O^{a_{t-1},o_{t-1}} U U^+ T^{a_t} O^{a_t,o_t} e_n^T
    = b(h) T^{a_1} O^{a_1,o_1} ... T^{a_t} O^{a_t,o_t} e_n^T,

i.e., it is the same as that of the POMDP, as in Equation 4. Here, the last step uses the fact that U U^+ v^T = v^T for any v^T linearly dependent on the columns of U, which holds by construction of U in the previous section. This completes the proof of Theorem 1.

Completing the float/reset example, consider the M_{f0} matrix found by the process defined in this section. It derives the prediction for each test in Q after taking action f and observing 0. Most of these are quite simple because the tests are so similar: the new prediction for r0 is exactly the old prediction for f0r0, for example. The only nontrivial test is f0f0f0r0. Its new prediction is 0.250 P(r0|h) - 0.0625 P(f0r0|h) + 0.750 P(f0f0r0|h). This example illustrates that the projection vectors need not contain only positive entries.
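As a sanity check on this construction, the following sketch (reusing search from the sketch above) builds the M_{ao} and m_{ao} projections via the pseudoinverse for a small randomly generated POMDP, with made-up parameters rather than the float/reset domain, and verifies that the resulting linear PSR assigns the same probability to a history as the POMDP does via Equation 4.

import numpy as np

rng = np.random.default_rng(0)

# A small random POMDP; all parameters here are made up for the demo.
n, actions, observations = 4, [0, 1], [0, 1]
T = {a: rng.dirichlet(np.ones(n), size=n) for a in actions}   # row-stochastic
prob1 = {a: rng.uniform(size=n) for a in actions}             # P(obs=1 | a, arrival state)
O = {}
for a in actions:
    O[(a, 1)] = np.diag(prob1[a])
    O[(a, 0)] = np.diag(1.0 - prob1[a])
b0 = rng.dirichlet(np.ones(n))
e = np.ones(n)

# Construct the PSR as in the proof: core tests from search, then projections.
Q, U = search(T, O, n, actions, observations)   # U is (n x q)
U_pinv = np.linalg.pinv(U)
M = {(a, o): (U_pinv @ T[a] @ O[(a, o)] @ U).T          # M_ao = (U+ T^a O^{a,o} U)^T
     for a in actions for o in observations}
m = {(a, o): U_pinv @ T[a] @ O[(a, o)] @ e              # m_ao = (U+ T^a O^{a,o} e_n^T)^T
     for a in actions for o in observations}
p0 = b0 @ U                                             # p_i(null history) = b_0 u(t_i)^T

def pomdp_prob(hist):
    """b_0 T^{a_1} O^{a_1,o_1} ... T^{a_t} O^{a_t,o_t} e_n^T, as in Equation 4."""
    v = b0
    for a, o in hist:
        v = v @ T[a] @ O[(a, o)]
    return v @ e

def psr_prob(hist):
    """p(h) M_{a_1 o_1}^T ... M_{a_{t-1} o_{t-1}}^T m_{a_t o_t}^T."""
    p = p0
    for a, o in hist[:-1]:
        p = p @ M[(a, o)].T
    a, o = hist[-1]
    return p @ m[(a, o)]

hist = [(0, 1), (1, 0), (0, 0), (1, 1)]
assert np.isclose(pomdp_prob(hist), psr_prob(hist))     # the two models agree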
3 Conclusion

We have introduced a predictive state representation for dynamical systems that is grounded in actions and observations, and shown that, even in its linear form, it is at least as general and compact as POMDPs. In essence, we have established PSRs as a non-inferior alternative to POMDPs, and suggested that they might have important advantages, while leaving demonstration of those advantages to future work. We conclude by summarizing the potential advantages (to be explored in future work):

Learnability. The k-order Markov model is similar to PSRs in that it is entirely based on actions and observations. Such models can be learned trivially from data by counting; it is an open question whether something similar can be done with a PSR. Jaeger (2000) showed how to learn such a model in the uncontrolled setting, but the situation is more complex in the multiple-action case, since outcomes are conditioned on behavior, violating some required independence assumptions.

Compactness. We have shown that there exist linear PSRs no more complex than the minimal POMDP for an environment, but in some cases the minimal linear PSR seems to be much smaller. For example, a POMDP extension of factored MDPs explored by Singh and Cohn (1998) would be a cross-product of separate POMDPs and would have linear PSRs whose size increases linearly with the number and size of the component POMDPs, whereas its minimal POMDP representation would grow as the size of the state space, i.e., exponentially in the number of component POMDPs. This (apparent) advantage stems from the PSR's combinatorial or factored structure. As a vector of state variables, capable of taking on diverse values, a PSR may be inherently more powerful than the distribution over discrete states (the belief state) of a POMDP. We have already seen that general PSRs can be more compact than POMDPs; they are also capable of efficiently capturing environments in the diversity representation used by Rivest and Schapire (1994), which is known to provide an extremely compact representation for some environments.

^2 If U = A S B^T is the singular value decomposition of U, then B S^+ A^T is its pseudoinverse, where S^+ replaces each non-zero element of the diagonal matrix S with its reciprocal.

Generalization. There are reasons to think that state variables that are themselves predictions may be particularly useful in learning to make other predictions. With so many things to predict, we have in effect a set or sequence of learning problems, all due to the same environment. In many such cases the solutions to earlier problems have been shown to provide features that generalize particularly well to subsequent problems (e.g., Baxter, 2000; Thrun & Pratt, 1998).

Powerful, extensible representations. PSRs that predict tests could be generalized to predict the outcomes of multi-step options (e.g., Sutton et al., 1999). In this case, particularly, they would constitute a powerful language for representing the state of complex environments.

Acknowledgments: We thank Peter Dayan, Lawrence Saul, Fernando Pereira and Rob Schapire for many helpful discussions of these and related ideas.

References

Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41, 164-171.

Baxter, J. (2000). A model of inductive bias learning. Journal of Artificial Intelligence Research, 12, 149-198.

Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 183-188). San Jose, California: AAAI Press.

Jaeger, H. (2000). Observable operator models for discrete stochastic time series. Neural Computation, 12, 1371-1398.

Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.

Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-65.

McCallum, A. K. (1995). Reinforcement learning with selective perception and hidden state.
Doctoral dissertation, Department of Computer Science, University of Rochester.

Rivest, R. L., & Schapire, R. E. (1994). Diversity-based inference of finite automata. Journal of the ACM, 41, 555-589.

Shatkay, H., & Kaelbling, L. P. (1997). Learning topological maps with weak local odometric information. Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97) (pp. 920-929).

Singh, S., & Cohn, D. (1998). How to dynamically merge Markov decision processes. Advances in Neural Information Processing Systems 10 (pp. 1057-1063).

Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211.

Thrun, S., & Pratt, L. (Eds.). (1998). Learning to learn. Kluwer Academic Publishers.

5 0.97619808 107 nips-2001-Latent Dirichlet Allocation

Author: David M. Blei, Andrew Y. Ng, Michael I. Jordan

Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification. 1

6 0.96571386 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

7 0.94307673 144 nips-2001-Partially labeled classification with Markov random walks

8 0.92038524 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

9 0.8909592 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

10 0.85719472 30 nips-2001-Agglomerative Multivariate Information Bottleneck

11 0.85469359 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

12 0.83166873 24 nips-2001-Active Information Retrieval

13 0.81629813 96 nips-2001-Information-Geometric Decomposition in Spike Analysis

14 0.81488252 183 nips-2001-The Infinite Hidden Markov Model

15 0.81123352 68 nips-2001-Entropy and Inference, Revisited

16 0.81086278 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement

17 0.80481285 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

18 0.80073899 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

19 0.7994647 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning

20 0.79278952 51 nips-2001-Cobot: A Social Reinforcement Learning Agent