nips nips2009 nips2009-183 knowledge-graph by maker-knowledge-mining

183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons


Source: pdf

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Optimal context separation of spiking haptic signals by second-order somatosensory neurons Romain Brasselet CNRS - UPMC Univ Paris 6, UMR 7102 F 75005, Paris, France romain. [sent-1, score-0.536]

2 Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). [sent-10, score-1.135]

3 The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. [sent-11, score-0.752]

4 The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. [sent-12, score-0.914]

5 hundreds of spike trains) with 1 ms temporal precision. [sent-16, score-0.552]

6 It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. [sent-17, score-1.098]

7 This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. [sent-18, score-0.71]

8 1 Introduction. During haptic exploration tasks, forces are applied to the skin of the hand, and in particular to the fingertips, which constitute the most sensitive parts of the hand and are prominently involved in object manipulation/recognition tasks. [sent-19, score-0.191]

9 Due to the visco-elastic properties of the skin, forces applied to the fingertips generate complex non-linear deformation dynamics, which makes it difficult to predict how these forces can be transduced into percepts by the somatosensory system. [sent-20, score-0.223]

10 They send direct projections to the spinal cord and to the cuneate nucleus (CN), which constitutes an important synaptic relay of the ascending somatosensory pathway. [sent-22, score-0.626]

11 The CN projects to several areas of the central nervous system (CNS), including the cerebellum and the thalamic ventrolateral posterior nucleus, which in turn projects to the primary somatosensory cortex. [sent-23, score-0.207]

12 Even under the most favorable conditions, discrimination based on firing rates takes on average 15 to 20 ms longer than discrimination based on first spike latency [9, 10]. [sent-27, score-1.183]

13 Estimates of how early the sequence in which afferents are recruited conveys information needed for the discrimination of contact parameters indicate that, among mechanoreceptors, the FA-I population provides the fastest reliable discrimination of both surface curvature and force direction. [sent-28, score-0.981]

14 Reliable discrimination can take place after as few as five FA-I afferents are recruited, which can occur a few milliseconds after the first impulse in the population response [10]. [sent-29, score-0.596]

15 In particular, the high information content in the timing of the first spikes in ensembles of central neurons has been emphasized in several sensory modalities, including the auditory [3, 16], visual [4, 6], and somatosensory [17] systems. [sent-31, score-0.498]

16 If relative spike timing is fundamental for rapid encoding and transfer of tactile events in manipulation, then how do neurons read out information carried by a temporal code? [sent-32, score-0.829]

17 Various decoding schemes have been proposed to discriminate between different spatiotemporal sequences of incoming spike patterns [8, 13, 1, 7]. [sent-33, score-0.384]

18 Here, we investigate an encoding/decoding mechanism accounting for the relative spike timing of signals propagating from primary tactile afferents to 2nd order neurons in the CN (Fig. [sent-34, score-0.972]

19 The population coding properties of a model CN network are studied by employing as peripheral signals the responses of real mechanoreceptors obtained via microneurography recordings in humans. [sent-36, score-0.716]

20 We focus on the first spike of each mechanoreceptor, according to the hypothesis that the variability in the first-spike latency domain with respect to stimulus feature (e. [sent-37, score-0.446]

21 Thus, each tactile stimulus consists of a single volley of spikes (black and gray waves in Fig. [sent-40, score-0.454]

22 1) forming a spatiotemporal response pattern defined by the first-spike latencies across the afferent population (Fig. [sent-41, score-0.257]

23 1 Methods: Human microneurography data. In order to investigate fast encoding/decoding mechanisms of haptic signals, we concentrate on the responses of FA-I mechanoreceptors only [9]. [sent-44, score-0.504]

24 In total, we consider the responses of 42 FA-I mechanoreceptors to 81 distinct stimuli. [sent-46, score-0.26]

25 The propagation velocity distribution across the set of primary projections onto 2nd order CN neurons is considered by fitting experimental observations [11, 21] (see Fig. [sent-47, score-0.18]

26 2 Cuneate nucleus model and synaptic plasticity rule. Single unit discharges at the CN level are modeled according to the spike-response model (SRM) [5] (see Supporting Material Sec. [sent-52, score-0.363]

27 Fig. 2A shows a sample firing pattern that illustrates the spike timing reliability property [14] of the model CN neuron. [sent-58, score-0.404]
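
For readers who want to experiment, the SRM dynamics can be sketched in a few lines. The following is a minimal, self-contained SRM0-style simulation; the exponential kernels, time constants, and threshold are illustrative placeholders (the actual model parameters are given in the paper's Supporting Material), and the function name is ours:

```python
import numpy as np

def simulate_srm0(input_spikes, weights, dt=0.1, T=100.0,
                  tau_m=5.0, tau_ref=4.0, eta0=5.0, theta=1.0):
    """Minimal SRM0 neuron: u(t) = eta(t - t_last) + sum_j w_j sum_f eps(t - t_j^f).

    input_spikes: list (one entry per afferent) of NumPy arrays of spike times (ms).
    weights: array of synaptic weights, one per afferent.
    Kernels are illustrative: eps(s) = exp(-s / tau_m), eta(s) = -eta0 * exp(-s / tau_ref).
    Returns the output spike times (ms).
    """
    times = np.arange(0.0, T, dt)
    out_spikes = []
    t_last = -np.inf
    for t in times:
        # Post-spike refractory kernel (reset / after-hyperpolarization)
        u = -eta0 * np.exp(-(t - t_last) / tau_ref) if np.isfinite(t_last) else 0.0
        # Sum of postsynaptic potentials evoked by past input spikes
        for w, spikes in zip(weights, input_spikes):
            past = spikes[spikes <= t]
            u += w * np.exp(-(t - past) / tau_m).sum()
        if u >= theta:
            out_spikes.append(t)
            t_last = t
    return np.array(out_spikes)
```

A model CN unit would be driven by 42 such input trains, one per mechanoreceptor afferent, each typically contributing its first spike only.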

28 In order to test the hypothesis of a purely feed-forward information transfer at the CN level, no collateral projections between CN neurons are considered in the current version of the model. [sent-66, score-0.174]

29 This learning rule optimizes the information transmission property of a single SRM neuron, accounts for coincidence detection across multiple afferents and provides a biologically-plausible principle that generalizes the Bienenstock-Cooper-Munro (BCM) rule [2] for spiking neurons. [sent-69, score-0.398]

30 In order to focus on the first spike latencies of the mechanoreceptor signals, we adapt the learning rule developed by Toyoizumi et al. [sent-70, score-0.446]
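
The specifics of that rule are beyond this summary. Purely as a structural illustration of spike-timing-dependent plasticity at CN synapses, a generic pair-based STDP update can be sketched as follows; this is explicitly not the Toyoizumi et al. rule, whose information-maximizing form is more involved, and all parameter values are hypothetical:

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Generic pair-based STDP (illustrative only, NOT the rule used in the paper):
    potentiate when the presynaptic spike precedes the postsynaptic one, else depress."""
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau_plus)    # pre before post: potentiation
    else:
        w -= a_minus * np.exp(dt / tau_minus)   # post before pre: depression
    return float(np.clip(w, w_min, w_max))      # keep the weight bounded
```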

31 3 Metrical information transfer measure. An information-theoretical approach is employed to assess the efficiency of the haptic discrimination process. [sent-78, score-0.458]

32 Yet, none of these techniques allows the information transmission to be assessed by taking into full account the metrics of the spike response space. [sent-82, score-0.393]

33 Furthermore, a decoding scheme accounting for precise temporal discrimination while maintaining the combinatorial properties of the output space within suitable boundaries – even in the presence of hundreds of CN spike trains – is needed. [sent-83, score-0.717]

34 Responses are shown as a raster plot of spike times during 25 trials (center), and as the corresponding PSTH (top). [sent-85, score-0.292]

35 The optimal discrimination condition is met after about 110 ms, when the distributions of intra- and inter-stimulus distances (right plot) stop overlapping. [sent-88, score-0.454]

36 The following definition of entropy is taken: $H^*(R) = -\frac{1}{|R|} \sum_{r \in R} \log \Big( \frac{1}{|R|} \sum_{r' \in R} \langle r | r' \rangle \Big)$ (1), where R is the set of responses elicited by all the stimuli, |R| is the cardinality of R, and ⟨r|r′⟩ is a similarity measure between any two responses r and r′. [sent-94, score-0.4]

37 The similarity measure ⟨r|r′⟩ depends on Victor-Purpura (VP) spike train metrics [23] (see below). [sent-95, score-0.292]

38 The conditional entropy is then taken as: $H^*(R|S) = \sum_{s \in S} p(s) H^*(R|s) = -\sum_{s \in S} p(s) \sum_{r \in R_s} \frac{1}{|R_s|} \log \Big( \frac{1}{|R_s|} \sum_{r' \in R_s} \langle r | r' \rangle \Big)$ (2), where $R_s$ is the set of responses elicited by the stimulus s. [sent-104, score-0.398]

39 Finally, the metrical information measure is given by: $I^*(R;S) = H^*(R) - H^*(R|S)$ (3). The similarity measure ⟨r|r′⟩ is defined as a function of the VP distance D_VP(r, r′) between two population responses r and r′. [sent-105, score-0.391]
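
Given a precomputed similarity matrix, Eqs. (1)-(3) translate almost directly into code. The sketch below is one plausible reading, not the authors' implementation: it assumes equiprobable stimuli, base-2 logarithms (bits), and that the similarity matrix (e.g., a hard threshold on the VP distance, see below) is supplied by the caller:

```python
import numpy as np

def metrical_entropy(sim):
    """H*(R), Eq. (1). sim is the |R| x |R| similarity matrix <r|r'>,
    with self-similarity sim[i, i] = 1 so the log argument stays positive."""
    n = sim.shape[0]
    mean_sim = sim.sum(axis=1) / n          # average similarity of r to all responses
    return -np.mean(np.log2(mean_sim))

def metrical_conditional_entropy(sim, labels):
    """H*(R|S), Eq. (2). labels[i] is the stimulus that elicited response i;
    stimuli are assumed equiprobable, p(s) = 1/|S|."""
    stimuli = np.unique(labels)
    H = 0.0
    for s in stimuli:
        idx = np.flatnonzero(labels == s)
        sub = sim[np.ix_(idx, idx)]         # similarities within R_s
        mean_sim = sub.sum(axis=1) / len(idx)
        H -= np.mean(np.log2(mean_sim)) / len(stimuli)
    return H

def metrical_information(sim, labels):
    """I*(R;S), Eq. (3)."""
    return metrical_entropy(sim) - metrical_conditional_entropy(sim, labels)
```

With a hard-threshold similarity that separates all stimuli perfectly, H*(R|S) = 0 and I*(R;S) = log2(81) ≈ 6.3 bits for 81 equiprobable stimuli, which is the natural information ceiling in this task.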

40 The distance D_VP(r, r′) depends on the VP cost parameter C_VP [23], which determines the time scale of the analysis by regulating the influence of spike timing vs. [sent-106, score-0.464]

41 spike count when calculating the distance between r and r′. [sent-107, score-0.352]
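
For reference, the single-unit VP distance can be computed with the standard dynamic program: shifting a spike by δt costs C_VP·δt, while deleting or inserting a spike costs 1. The sketch below follows the textbook recursion of [23]; summing per-afferent distances over labeled lines is one plausible population extension and may differ from the exact population metric used here:

```python
import numpy as np

def vp_distance(t1, t2, cost):
    """Victor-Purpura distance between two spike trains (arrays of spike times),
    computed by dynamic programming; cost is C_VP, the price per unit time shift."""
    n, m = len(t1), len(t2)
    G = np.zeros((n + 1, m + 1))
    G[:, 0] = np.arange(n + 1)          # delete all spikes of t1
    G[0, :] = np.arange(m + 1)          # insert all spikes of t2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = cost * abs(t1[i - 1] - t2[j - 1])
            G[i, j] = min(G[i - 1, j] + 1,          # delete a spike
                          G[i, j - 1] + 1,          # insert a spike
                          G[i - 1, j - 1] + shift)  # move a spike in time
    return G[n, m]

def population_vp_distance(r1, r2, cost):
    """One possible population convention: sum per-afferent (labeled-line) distances."""
    return sum(vp_distance(a, b, cost) for a, b in zip(r1, r2))
```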

42 In the case of spike train neurotransmission, the relationship between intra- and inter-stimulus distance distributions tends to evolve over time, as the input spike wave across multiple afferents flows in. [sent-117, score-0.788]

43 The critical parameter Dcritic can then be taken as the distance at which the maximum intra-stimulus distance becomes smaller than the minimum inter-stimulus distance (dashed line in Fig. [sent-121, score-0.225]

44 the time at which the two distributions stop overlapping) indicates when the perfect discrimination condition is reached (i. [sent-125, score-0.423]

45 To summarize, perfect discrimination calls upon the following rule: • if all intra-stimulus distances are smaller than the critical distance Dcritic , then all the responses elicited by any stimulus are considered identical. [sent-128, score-0.884]

46 • if all inter-stimulus distances are greater than Dcritic , then two responses elicited by two different stimuli are always discriminated. [sent-130, score-0.342]
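
In code, this rule reduces to comparing the largest intra-stimulus distance with the smallest inter-stimulus distance over a matrix of pairwise distances. A minimal sketch (function and variable names are ours; D and labels are assumed to be NumPy arrays):

```python
import numpy as np

def find_dcritic(D, labels):
    """D[i, j]: VP distance between responses i and j; labels[i]: eliciting stimulus.
    Returns a valid critical distance if the intra- and inter-stimulus distance
    distributions no longer overlap, else None."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra_max = D[same & off_diag].max()      # largest distance within a stimulus
    inter_min = D[~same].min()                # smallest distance across stimuli
    if intra_max < inter_min:
        return 0.5 * (intra_max + inter_min)  # any value in the gap works
    return None                               # perfect discrimination not yet met
```

The returned threshold can also serve as a hard similarity kernel for the metrical entropy above, e.g. sim = (D <= dcritic).astype(float); whether the paper uses exactly this kernel is our assumption.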

47 We define the optimum VP cost C_VP* as the one that leads to the earliest perfect discrimination (in the example of Fig. [sent-133, score-0.39]

48 1 Results: Decoding of spiking haptic signals upstream from the cuneate nucleus. First, we validate the information-theoretical analysis described above to decode a limited set of microneurography data upstream from the CN network [18]. [sent-137, score-0.772]

49 Each of the 5 stimuli is presented 100 times, and the VP distances D_VP are computed across the population of 42 mechanoreceptor afferents. [sent-139, score-0.357]

50 Fig. 3A shows that the critical distance Dcritic = 8 can be set 72 ms after the stimulus onset. [sent-141, score-0.45]

51 Fig. 3B, which ensures that the perfect discrimination condition is met within 30 ms of the first mechanoreceptor discharge. [sent-143, score-0.802]

52 Fig. 3C displays two samples of distance matrices indicating how the input spike waves across the 42 mechanoreceptor afferents are clustered by the decoding system over time. [sent-145, score-0.701]

53 Before the occurrence of the perfect discrimination condition (left matrix) different stimuli can have relatively small distances (e. [sent-146, score-0.565]

54 After 72 ms (right matrix), all the initially overlapping contexts are pulled apart, which removes all interference across inputs and leads to 100% accuracy in the discrimination process. [sent-149, score-0.573]

55 2 Optimal haptic context separation downstream from the cuneate nucleus. Second, the entire set of microneurography recordings (81 stimuli) is employed to analyze the information transmission properties of a network of 50 CN neurons in the presence of synaptic plasticity. [sent-151, score-1.007]

56 The perfect discrimination condition is met 72 ms after the stimulus onset and 30 ms after the arrival of the first spike. [sent-157, score-1.116]

57 To compute I*(R;S), the VP distances D_VP(r, r′) between any two CN population responses r, r′ are considered. [sent-165, score-0.294]

58 Again, the distance Dcritic is used to identify the perfect discrimination condition, and the VP cost parameter C_VP* = 0. [sent-166, score-0.45]

59 Fig. 4A shows that the CN population achieves optimal context separation within 35 ms of the arrival of the first afferent spikes. [sent-169, score-0.493]

60 4A, corresponds to the situation in which a readout system downstream from the CN would need a complete separation of haptic percepts (e. [sent-171, score-0.242]

61 to the extent of very rapid, though less precise, reactions) can further speed up the discrimination process. [sent-176, score-0.31]

62 Fig. 4B indicates that setting Dcritic to a suboptimal value would lead to a partial discrimination condition in which 80% of the maximum I*(R;S) (with non-zero H*(R|S)) can be achieved within 15 ms of the arrival of the first pre-synaptic spike. [sent-178, score-0.622]

63 Fig. 4C-D illustrate the distributions of intra- and inter-stimulus distances 100 ms after stimulus onset, before and after learning. [sent-180, score-0.453]

64 The average uncertainty on the timing of a single spike can be expressed by $\Delta t = D_{VP}^{\max} / (C_{VP}\, n)$. [sent-186, score-0.404]
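
As a quick sanity check of this expression (with hypothetical numbers, not values from the paper): shifting each of n spikes by Δt contributes $C_{VP}\,\Delta t$ to the VP distance, so $\Delta t = D_{VP}^{\max}/(C_{VP}\, n)$; e.g., $D_{VP}^{\max} = 4$, $C_{VP} = 0.5\ \mathrm{ms}^{-1}$ and $n = 8$ spikes give $\Delta t = 4/(0.5 \times 8) = 1$ ms.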

65 This shows that the plasticity rule helped to reduce the jitter on CN spikes, thus reducing the metrical conditional entropy compared to the pre-learning condition. [sent-190, score-0.296]

66 After learning, the synaptic ... [sent-193, score-0.614]

67 Figure 4: Information I*(R;S) and conditional entropy H*(R|S) over time. Panels A-B plot information (bits) vs. time (ms), with marks near 35 ms and 15 ms after the first input spike time; panels C-D plot counts vs. D_VP; panel E plots log10(# synapses) vs. synaptic weight. [sent-194, score-0.698]

68 The 81 tactile stimuli are presented 100 times each. [sent-196, score-0.277]

69 (A) Optimal discrimination is reached 35 ms after the first afferent spike. [sent-197, score-0.618]

70 (B) If the perfect discrimination constraint is relaxed by reducing the critical distance, then the system can perform partial discrimination –i. [sent-198, score-0.745]

71 e. 80% of maximum I*(R;S) and non-zero H*(R|S) – already within 15 ms of the first spike time. [sent-199, score-0.523]

72 (C-D) Distributions of intra- and inter-stimulus distances (computed 100 ms after stimulus onset) before and after training, respectively. [sent-200, score-0.42]

73 In this example, a network of 10000 cuneate neurons has been trained. [sent-202, score-0.273]

74 3 How does the size of the cuneate nucleus network influence discrimination? [sent-212, score-0.3]

75 both very rapid and reliable) discrimination of the 81 microneurography spike trains. [sent-216, score-0.771]

76 As shown in Fig. 5, the perfect discrimination condition cannot be met with a population of fewer than 50 CN neurons. [sent-218, score-0.562]

77 This result corroborates the hypothesis that a spatiotemporal population code is a necessary condition for performing effective context separation of complex spiking signals [3, 6]. [sent-219, score-0.34]

78 By increasing the number of neurons, the discrimination becomes faster and saturates at 72 ms (which corresponds to the time at which the first spike from the slowest volley of pulses arrives at the CN). [sent-220, score-0.865]

79 It is also shown that the number of spikes emitted on average by CN cells under the optimal discrimination condition decreases from 2. [sent-221, score-0.402]

80 3 with the size of the CN population, supporting the idea that one spike per neuron is enough to convey a significant amount of information. [sent-223, score-0.418]

81 Figure 5: Time necessary to perfectly discriminate the entire set of 81 stimuli as a function of the size of the CN population. [sent-226, score-0.17]

82 The numbers of spikes emitted on average by each CN neuron when optimal discrimination occurs are also indicated in the diagram. [sent-228, score-0.43]

83 4 Discussion. This study focuses on how a population of 2nd order somatosensory neurons in the cuneate nucleus (CN) can encode incoming spike trains – obtained via microneurography recordings in humans – by separating them in an abstract metrical space. [sent-229, score-1.262]

84 The main contribution is the prediction concerning a significant role of the CN in conveying optimal contextual accounts of peripheral tactile inputs to downstream structures of the CNS. [sent-230, score-0.346]

85 It is shown that an encoding/decoding mechanism based on relative spike timing can account for rapid and reliable transmission of tactile information at the level of the CN. [sent-231, score-0.75]

86 In addition, it is emphasized that the variability of the CN conditioned responses to tactile stimuli constitutes a fundamental measure when examining neurotransmission at this stage of the ascending somatosensory pathway. [sent-232, score-0.631]

87 More generally, the number of responses elicited by a stimulus is a critical issue when information has to be transferred through multiple synaptic relays. [sent-233, score-0.45]

88 If a single stimulus can possibly elicit millions of different responses on a neural layer, how can this plethora of data be effectively decoded by downstream networks? [sent-234, score-0.294]

89 Thus, neural information processing requires encoding mechanisms capable of producing as few responses as possible to a given stimulus while keeping these responses different between stimuli. [sent-235, score-0.346]

90 A corollary contribution of this work consists in putting forth a novel definition of entropy, H*(R), to assess neurotransmission in the presence of large spike train spaces and with high temporal precision. [sent-236, score-0.424]

91 Rather, the evaluation of the clustering process is embedded in the entropy measure and, when the condition of optimal discrimination is reached, the existence of well-defined clusters is ensured. [sent-243, score-0.427]

92 Rapid neural coding in the retina with relative spike latencies. [sent-281, score-0.341]

93 Pattern recognition computation using action potential timing for stimulus representation. [sent-290, score-0.226]

94 First spikes in ensembles of human tactile afferents code complex spatial fingertip events. [sent-296, score-0.453]

95 Coding and use of tactile signals from the fingertips in object manipulation tasks. [sent-303, score-0.273]

96 Cortical and subcortical contributions to activity-dependent plasticity in primate somatosensory cortex. [sent-313, score-0.199]

97 Encoding stimulus information by spike numbers and mean response time in primary auditory cortex. [sent-346, score-0.516]

98 The role of spike timing in the coding of stimulus location in rat somatosensory cortex. [sent-356, score-0.708]

99 Information about complex fingertip parameters in individual human tactile afferent neurons. [sent-364, score-0.287]

100 Generalized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission. [sent-377, score-0.224]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cn', 0.393), ('discrimination', 0.31), ('spike', 0.292), ('ms', 0.231), ('dcritic', 0.224), ('tactile', 0.21), ('cuneate', 0.16), ('vp', 0.152), ('afferents', 0.144), ('mechanoreceptors', 0.144), ('somatosensory', 0.141), ('nucleus', 0.14), ('microneurography', 0.128), ('responses', 0.116), ('haptic', 0.116), ('stimulus', 0.114), ('neurons', 0.113), ('timing', 0.112), ('mechanoreceptor', 0.112), ('metrical', 0.112), ('population', 0.103), ('neurosci', 0.097), ('cv', 0.094), ('dv', 0.093), ('synaptic', 0.091), ('elicited', 0.084), ('entropy', 0.084), ('fingertip', 0.08), ('perfect', 0.08), ('afferent', 0.077), ('distances', 0.075), ('peripheral', 0.072), ('spiking', 0.069), ('stimuli', 0.067), ('downstream', 0.064), ('johansson', 0.064), ('neurotransmission', 0.064), ('srm', 0.064), ('signals', 0.063), ('transmission', 0.062), ('neuron', 0.061), ('distance', 0.06), ('spikes', 0.059), ('plasticity', 0.058), ('shannon', 0.055), ('decoding', 0.054), ('coding', 0.049), ('rs', 0.048), ('force', 0.048), ('arrival', 0.048), ('fingertips', 0.048), ('skin', 0.048), ('univ', 0.048), ('upstream', 0.048), ('critical', 0.045), ('rule', 0.042), ('conduction', 0.042), ('nerve', 0.042), ('rapid', 0.041), ('recordings', 0.041), ('ensembles', 0.04), ('latency', 0.04), ('response', 0.039), ('forth', 0.039), ('coincidence', 0.039), ('stdp', 0.039), ('waves', 0.039), ('primary', 0.038), ('spatiotemporal', 0.038), ('met', 0.036), ('critic', 0.036), ('pathway', 0.036), ('paris', 0.035), ('separation', 0.034), ('supporting', 0.034), ('reliable', 0.033), ('condition', 0.033), ('onset', 0.033), ('ascending', 0.033), ('auditory', 0.033), ('contact', 0.033), ('trains', 0.032), ('transfer', 0.032), ('discharges', 0.032), ('dvp', 0.032), ('interferences', 0.032), ('psth', 0.032), ('relay', 0.032), ('ulnar', 0.032), ('umea', 0.032), ('upmc', 0.032), ('volley', 0.032), ('convey', 0.031), ('temporal', 0.029), ('synapses', 0.029), ('projections', 0.029), ('neurobiol', 0.028), ('cerebellum', 0.028), ('percepts', 0.028), ('forces', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

2 0.28653556 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

3 0.20928992 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint second-order statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing. 1

4 0.19194488 163 nips-2009-Neurometric function analysis of population codes

Author: Philipp Berens, Sebastian Gerwinn, Alexander Ecker, Matthias Bethge

Abstract: The relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, Jensen-Shannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlations structures on coding accuracy in long versus short decoding time windows. That is, for long time window we use the common Gaussian noise approximation. To address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding. 1

5 0.17290959 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

Author: Lei Shi, Thomas L. Griffiths

Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect. 1

6 0.15258661 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

7 0.13995744 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

8 0.1355672 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

9 0.13010165 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

10 0.12199868 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

11 0.12011826 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

12 0.11605497 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

13 0.10522022 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

14 0.094958022 164 nips-2009-No evidence for active sparsification in the visual cortex

15 0.083100289 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

16 0.08214698 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems

17 0.066572368 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

18 0.058362987 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

19 0.056617133 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

20 0.056557965 166 nips-2009-Noisy Generalized Binary Search


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.148), (1, -0.217), (2, 0.306), (3, 0.178), (4, 0.076), (5, -0.088), (6, -0.146), (7, 0.048), (8, 0.042), (9, 0.049), (10, -0.023), (11, -0.025), (12, 0.102), (13, -0.057), (14, -0.009), (15, -0.042), (16, 0.034), (17, -0.019), (18, 0.073), (19, 0.006), (20, 0.026), (21, 0.002), (22, -0.03), (23, 0.037), (24, -0.061), (25, -0.08), (26, 0.059), (27, 0.01), (28, 0.059), (29, 0.079), (30, 0.099), (31, 0.095), (32, -0.036), (33, 0.063), (34, 0.028), (35, 0.071), (36, 0.004), (37, 0.011), (38, -0.041), (39, 0.022), (40, 0.002), (41, -0.029), (42, 0.018), (43, -0.039), (44, 0.043), (45, -0.039), (46, -0.131), (47, -0.088), (48, 0.037), (49, 0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97594386 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

2 0.78361589 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint second-order statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing. 1

3 0.78025526 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

4 0.71018159 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1

5 0.70208853 163 nips-2009-Neurometric function analysis of population codes

Author: Philipp Berens, Sebastian Gerwinn, Alexander Ecker, Matthias Bethge

Abstract: The relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, Jensen-Shannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlations structures on coding accuracy in long versus short decoding time windows. That is, for long time window we use the common Gaussian noise approximation. To address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding. 1

6 0.63888341 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

7 0.55728066 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

8 0.53839296 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

9 0.53211117 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

10 0.52967566 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

11 0.49017903 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

12 0.48612437 164 nips-2009-No evidence for active sparsification in the visual cortex

13 0.44308284 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

14 0.43614087 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

15 0.36087847 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

16 0.29086649 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems

17 0.28499505 13 nips-2009-A Neural Implementation of the Kalman Filter

18 0.28299236 43 nips-2009-Bayesian estimation of orientation preference maps

19 0.28262013 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

20 0.28118533 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.018), (24, 0.039), (25, 0.054), (31, 0.394), (35, 0.033), (36, 0.042), (39, 0.047), (58, 0.058), (61, 0.011), (68, 0.011), (71, 0.038), (81, 0.05), (86, 0.065), (91, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82239413 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

2 0.7414571 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

Author: Bryan Conroy, Ben Singer, James Haxby, Peter J. Ramadge

Abstract: The inter-subject alignment of functional MRI (fMRI) data is important for improving the statistical power of fMRI group analyses. In contrast to existing anatomically-based methods, we propose a novel multi-subject algorithm that derives a functional correspondence by aligning spatial patterns of functional connectivity across a set of subjects. We test our method on fMRI data collected during a movie viewing experiment. By cross-validating the results of our algorithm, we show that the correspondence successfully generalizes to a secondary movie dataset not used to derive the alignment. 1

3 0.61626935 256 nips-2009-Which graphical models are difficult to learn?

Author: Andrea Montanari, Jose A. Pereira

Abstract: We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it). 1 Introduction and main results Given a graph G = (V = [p], E), and a positive parameter θ > 0 the ferromagnetic Ising model on G is the pairwise Markov random field µG,θ (x) = 1 ZG,θ eθxi xj (1) (i,j)∈E over binary variables x = (x1 , x2 , . . . , xp ). Apart from being one of the most studied models in statistical mechanics, the Ising model is a prototypical undirected graphical model, with applications in computer vision, clustering and spatial statistics. Its obvious generalization to edge-dependent parameters θij , (i, j) ∈ E is of interest as well, and will be introduced in Section 1.2.2. (Let us stress that we follow the statistical mechanics convention of calling (1) an Ising model for any graph G.) In this paper we study the following structural learning problem: Given n i.i.d. samples x(1) , x(2) ,. . . , x(n) with distribution µG,θ ( · ), reconstruct the graph G. For the sake of simplicity, we assume that the parameter θ is known, and that G has no double edges (it is a ‘simple’ graph). The graph learning problem is solvable with unbounded sample complexity, and computational resources [1]. The question we address is: for which classes of graphs and values of the parameter θ is the problem solvable under appropriate complexity constraints? More precisely, given an algorithm Alg, a graph G, a value θ of the model parameter, and a small δ > 0, the sample complexity is defined as nAlg (G, θ) ≡ inf n ∈ N : Pn,G,θ {Alg(x(1) , . . . , x(n) ) = G} ≥ 1 − δ , (2) where Pn,G,θ denotes probability with respect to n i.i.d. samples with distribution µG,θ . Further, we let χAlg (G, θ) denote the number of operations of the algorithm Alg, when run on nAlg (G, θ) samples.1 1 For the algorithms analyzed in this paper, the behavior of nAlg and χAlg does not change significantly if we require only ‘approximate’ reconstruction (e.g. in graph distance). 1 The general problem is therefore to characterize the functions nAlg (G, θ) and χAlg (G, θ), in particular for an optimal choice of the algorithm. General bounds on nAlg (G, θ) have been given in [2, 3], under the assumption of unbounded computational resources. A general charactrization of how well low complexity algorithms can perform is therefore lacking. Although we cannot prove such a general characterization, in this paper we estimate nAlg and χAlg for a number of graph models, as a function of θ, and unveil a fascinating universal pattern: when the model (1) develops long range correlations, low-complexity algorithms fail. Under the Ising model, the variables {xi }i∈V become strongly correlated for θ large. For a large class of graphs with degree bounded by ∆, this phenomenon corresponds to a phase transition beyond some critical value of θ uniformly bounded in p, with typically θcrit ≤ const./∆. In the examples discussed below, the failure of low-complexity algorithms appears to be related to this phase transition (although it does not coincide with it). 
1.1 A toy example: the thresholding algorithm In order to illustrate the interplay between graph structure, sample complexity and interaction strength θ, it is instructive to consider a warmup example. The thresholding algorithm reconstructs G by thresholding the empirical correlations Cij ≡ 1 n n (ℓ) (ℓ) xi xj for i, j ∈ V . ℓ=1 (3) T HRESHOLDING( samples {x(ℓ) }, threshold τ ) 1: Compute the empirical correlations {Cij }(i,j)∈V ×V ; 2: For each (i, j) ∈ V × V 3: If Cij ≥ τ , set (i, j) ∈ E; We will denote this algorithm by Thr(τ ). Notice that its complexity is dominated by the computation of the empirical correlations, i.e. χThr(τ ) = O(p2 n). The sample complexity nThr(τ ) can be bounded for specific classes of graphs as follows (the proofs are straightforward and omitted from this paper). Theorem 1.1. If G has maximum degree ∆ > 1 and if θ < atanh(1/(2∆)) then there exists τ = τ (θ) such that 2p 8 log nThr(τ ) (G, θ) ≤ . (4) 1 δ (tanh θ − 2∆ )2 Further, the choice τ (θ) = (tanh θ + (1/2∆))/2 achieves this bound. Theorem 1.2. There exists a numerical constant K such that the following is true. If ∆ > 3 and θ > K/∆, there are graphs of bounded degree ∆ such that for any τ , nThr(τ ) = ∞, i.e. the thresholding algorithm always fails with high probability. These results confirm the idea that the failure of low-complexity algorithms is related to long-range correlations in the underlying graphical model. If the graph G is a tree, then correlations between far apart variables xi , xj decay exponentially with the distance between vertices i, j. The same happens on bounded-degree graphs if θ ≤ const./∆. However, for θ > const./∆, there exists families of bounded degree graphs with long-range correlations. 1.2 More sophisticated algorithms In this section we characterize χAlg (G, θ) and nAlg (G, θ) for more advanced algorithms. We again obtain very distinct behaviors of these algorithms depending on long range correlations. Due to space limitations, we focus on two type of algorithms and only outline the proof of our most challenging result, namely Theorem 1.6. In the following we denote by ∂i the neighborhood of a node i ∈ G (i ∈ ∂i), and assume the degree / to be bounded: |∂i| ≤ ∆. 1.2.1 Local Independence Test A recurring approach to structural learning consists in exploiting the conditional independence structure encoded by the graph [1, 4, 5, 6]. 2 Let us consider, to be definite, the approach of [4], specializing it to the model (1). Fix a vertex r, whose neighborhood we want to reconstruct, and consider the conditional distribution of xr given its neighbors2 : µG,θ (xr |x∂r ). Any change of xi , i ∈ ∂r, produces a change in this distribution which is bounded away from 0. Let U be a candidate neighborhood, and assume U ⊆ ∂r. Then changing the value of xj , j ∈ U will produce a noticeable change in the marginal of Xr , even if we condition on the remaining values in U and in any W , |W | ≤ ∆. On the other hand, if U ∂r, then it is possible to find W (with |W | ≤ ∆) and a node i ∈ U such that, changing its value after fixing all other values in U ∪ W will produce no noticeable change in the conditional marginal. (Just choose i ∈ U \∂r and W = ∂r\U ). This procedure allows us to distinguish subsets of ∂r from other sets of vertices, thus motivating the following algorithm. 
L OCAL I NDEPENDENCE T EST( samples {x(ℓ) }, thresholds (ǫ, γ) ) 1: Select a node r ∈ V ; 2: Set as its neighborhood the largest candidate neighbor U of size at most ∆ for which the score function S CORE(U ) > ǫ/2; 3: Repeat for all nodes r ∈ V ; The score function S CORE( · ) depends on ({x(ℓ) }, ∆, γ) and is defined as follows, min max W,j xi ,xW ,xU ,xj |Pn,G,θ {Xi = xi |X W = xW , X U = xU }− Pn,G,θ {Xi = xi |X W = xW , X U \j = xU \j , Xj = xj }| . (5) In the minimum, |W | ≤ ∆ and j ∈ U . In the maximum, the values must be such that Pn,G,θ {X W = xW , X U = xU } > γ/2, Pn,G,θ {X W = xW , X U \j = xU \j , Xj = xj } > γ/2 Pn,G,θ is the empirical distribution calculated from the samples {x(ℓ) }. We denote this algorithm by Ind(ǫ, γ). The search over candidate neighbors U , the search for minima and maxima in the computation of the S CORE(U ) and the computation of Pn,G,θ all contribute for χInd (G, θ). Both theorems that follow are consequences of the analysis of [4]. Theorem 1.3. Let G be a graph of bounded degree ∆ ≥ 1. For every θ there exists (ǫ, γ), and a numerical constant K, such that 2p 100∆ , χInd(ǫ,γ) (G, θ) ≤ K (2p)2∆+1 log p . nInd(ǫ,γ) (G, θ) ≤ 2 4 log ǫ γ δ More specifically, one can take ǫ = 1 4 sinh(2θ), γ = e−4∆θ 2−2∆ . This first result implies in particular that G can be reconstructed with polynomial complexity for any bounded ∆. However, the degree of such polynomial is pretty high and non-uniform in ∆. This makes the above approach impractical. A way out was proposed in [4]. The idea is to identify a set of ‘potential neighbors’ of vertex r via thresholding: B(r) = {i ∈ V : Cri > κ/2} , (6) For each node r ∈ V , we evaluate S CORE(U ) by restricting the minimum in Eq. (5) over W ⊆ B(r), and search only over U ⊆ B(r). We call this algorithm IndD(ǫ, γ, κ). The basic intuition here is that Cri decreases rapidly with the graph distance between vertices r and i. As mentioned above, this is true at small θ. Theorem 1.4. Let G be a graph of bounded degree ∆ ≥ 1. Assume that θ < K/∆ for some small enough constant K. Then there exists ǫ, γ, κ such that nIndD(ǫ,γ,κ) (G, θ) ≤ 8(κ2 + 8∆ ) log 4p , δ χIndD(ǫ,γ,κ) (G, θ) ≤ K ′ p∆∆ More specifically, we can take κ = tanh θ, ǫ = 1 4 log(4/κ) α + K ′ ∆p2 log p . sinh(2θ) and γ = e−4∆θ 2−2∆ . 2 If a is a vector and R is a set of indices then we denote by aR the vector formed by the components of a with index in R. 3 1.2.2 Regularized Pseudo-Likelihoods A different approach to the learning problem consists in maximizing an appropriate empirical likelihood function [7, 8, 9, 10, 13]. To control the fluctuations caused by the limited number of samples, and select sparse graphs a regularization term is often added [7, 8, 9, 10, 11, 12, 13]. As a specific low complexity implementation of this idea, we consider the ℓ1 -regularized pseudolikelihood method of [7]. For each node r, the following likelihood function is considered L(θ; {x(ℓ) }) = − 1 n n (ℓ) ℓ=1 log Pn,G,θ (x(ℓ) |x\r ) r (7) where x\r = xV \r = {xi : i ∈ V \ r} is the vector of all variables except xr and Pn,G,θ is defined from the following extension of (1), µG,θ (x) = 1 ZG,θ eθij xi xj (8) i,j∈V / where θ = {θij }i,j∈V is a vector of real parameters. Model (1) corresponds to θij = 0, ∀(i, j) ∈ E and θij = θ, ∀(i, j) ∈ E. 
The function L(θ; {x(ℓ) }) depends only on θr,· = {θrj , j ∈ ∂r} and is used to estimate the neighborhood of each node by the following algorithm, Rlr(λ), R EGULARIZED L OGISTIC R EGRESSION( samples {x(ℓ) }, regularization (λ)) 1: Select a node r ∈ V ; 2: Calculate ˆr,· = arg min {L(θr,· ; {x(ℓ) }) + λ||θr,· ||1 }; θ θ r,· ∈Rp−1 3: ˆ If θrj > 0, set (r, j) ∈ E; Our first result shows that Rlr(λ) indeed reconstructs G if θ is sufficiently small. Theorem 1.5. There exists numerical constants K1 , K2 , K3 , such that the following is true. Let G be a graph with degree bounded by ∆ ≥ 3. If θ ≤ K1 /∆, then there exist λ such that nRlr(λ) (G, θ) ≤ K2 θ−2 ∆ log 8p2 . δ (9) Further, the above holds with λ = K3 θ ∆−1/2 . This theorem is proved by noting that for θ ≤ K1 /∆ correlations decay exponentially, which makes all conditions in Theorem 1 of [7] (denoted there by A1 and A2) hold, and then computing the probability of success as a function of n, while strenghtening the error bounds of [7]. In order to prove a converse to the above result, we need to make some assumptions on λ. Given θ > 0, we say that λ is ‘reasonable for that value of θ if the following conditions old: (i) Rlr(λ) is successful with probability larger than 1/2 on any star graph (a graph composed by a vertex r connected to ∆ neighbors, plus isolated vertices); (ii) λ ≤ δ(n) for some sequence δ(n) ↓ 0. Theorem 1.6. There exists a numerical constant K such that the following happens. If ∆ > 3, θ > K/∆, then there exists graphs G of degree bounded by ∆ such that for all reasonable λ, nRlr(λ) (G) = ∞, i.e. regularized logistic regression fails with high probability. The graphs for which regularized logistic regression fails are not contrived examples. Indeed we will prove that the claim in the last theorem holds with high probability when G is a uniformly random graph of regular degree ∆. The proof Theorem 1.6 is based on showing that an appropriate incoherence condition is necessary for Rlr to successfully reconstruct G. The analogous result was proven in [14] for model selection using the Lasso. In this paper we show that such a condition is also necessary when the underlying model is an Ising model. Notice that, given the graph G, checking the incoherence condition is NP-hard for general (non-ferromagnetic) Ising model, and requires significant computational effort 4 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 20 15 λ0 10 5 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.2 1 0.8 0.6 Psucc 0.4 0.2 1 θ 0 0 0.2 0.4 0.6 θ θcrit 0.8 1 Figure 1: Learning random subgraphs of a 7 × 7 (p = 49) two-dimensional grid from n = 4500 Ising models samples, using regularized logistic regression. Left: success probability as a function of the model parameter θ and of the regularization parameter λ0 (darker corresponds to highest probability). Right: the same data plotted for several choices of λ versus θ. The vertical line corresponds to the model critical temperature. The thick line is an envelope of the curves obtained for different λ, and should correspond to optimal regularization. even in the ferromagnetic case. Hence the incoherence condition does not provide, by itself, a clear picture of which graph structure are difficult to learn. We will instead show how to evaluate it on specific graph families. Under the restriction λ → 0 the solutions given by Rlr converge to θ∗ with n [7]. Thus, for large n we can expand L around θ∗ to second order in (θ − θ∗ ). 
Under the restriction λ → 0, the solutions given by Rlr converge to θ* with n [7]. Thus, for large n we can expand L around θ* to second order in (θ − θ*). When we add the regularization term to L, we obtain a quadratic model analogous to the Lasso, plus an error term due to the quadratic approximation. It is thus not surprising that, when λ → 0, the incoherence condition introduced for the Lasso in [14] is also relevant for the Ising model.

[Figure 1 — left panel: success probability over the (θ, λ₀) plane; right panel: P_succ versus θ for several λ, with θ_crit marked.]
Figure 1: Learning random subgraphs of a 7 × 7 (p = 49) two-dimensional grid from n = 4500 Ising model samples, using regularized logistic regression. Left: success probability as a function of the model parameter θ and of the regularization parameter λ₀ (darker corresponds to higher probability). Right: the same data plotted for several choices of λ versus θ. The vertical line corresponds to the model critical temperature. The thick line is an envelope of the curves obtained for different λ, and should correspond to optimal regularization.

2 Numerical experiments

In order to explore the practical relevance of the above results, we carried out extensive numerical simulations using the regularized logistic regression algorithm Rlr(λ). Among other learning algorithms, Rlr(λ) strikes a good balance of complexity and performance.

Samples from the Ising model (1) were generated using Gibbs sampling (a.k.a. Glauber dynamics). Mixing time can be very large for θ ≥ θ_crit, and was estimated using the time required for the overall bias to change sign (this is quite a conservative estimate at low temperature). Generating the samples {x^(ℓ)} was indeed the bulk of our computational effort and took about 50 days of CPU time on Pentium Dual Core processors (we show here only part of these data). Notice that Rlr(λ) had been tested in [7] only on tree graphs G, or in the weakly coupled regime θ < θ_crit. In these cases sampling from the Ising model is easy, but structural learning is also intrinsically easier.

Figure 1 reports the success probability of Rlr(λ) when applied to random subgraphs of a 7 × 7 two-dimensional grid. Each such graph was obtained by removing each edge independently with probability ρ = 0.3. Success probability was estimated by applying Rlr(λ) to each vertex of 8 graphs (thus averaging over 392 runs of Rlr(λ)), using n = 4500 samples. We scaled the regularization parameter as λ = 2λ₀ θ (log p / n)^{1/2} (this choice is motivated by the algorithm analysis and is empirically the most satisfactory), and searched over λ₀.

The data clearly illustrate the phenomenon discussed. Despite the large number of samples, n ≫ log p, when θ crosses a threshold the algorithm starts performing poorly irrespective of λ. Intriguingly, this threshold is not far from the critical point of the Ising model on a randomly diluted grid, θ_crit(ρ = 0.3) ≈ 0.7 [15, 16].

[Figure 2 — left panel: P_succ versus n for θ = 0.10, …, 0.65; right panel: P_succ versus θ, with θ_thr marked.]
Figure 2: Learning uniformly random graphs of degree 4 from Ising model samples, using Rlr. Left: success probability as a function of the number of samples n for several values of θ. Right: the same data plotted for several choices of λ versus θ, as in Fig. 1, right panel.

Figure 2 presents similar data when G is a uniformly random graph of degree ∆ = 4, over p = 50 vertices. The evolution of the success probability with n clearly shows a dichotomy. When θ is below a threshold, a small number of samples is sufficient to reconstruct G with high probability. Above the threshold, even n = 10⁴ samples are too few. In this case we can predict the threshold analytically, cf. Lemma 3.3 below, and get θ_thr(∆ = 4) ≈ 0.4203, which compares favorably with the data.
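For completeness, the sampler itself is simple even though mixing is slow near and above θ_crit: each spin is resampled from its exact conditional P{X_i = +1 | x_rest} = 1/(1 + e^{−2θ Σ_{j∈∂i} x_j}). The sketch below is our own illustration; the adjacency-list input, function name, and fixed sweep count are assumptions, and no mixing diagnostics are included.

import numpy as np

def gibbs_sample_ising(adj, theta, n_sweeps, rng):
    # One sample from model (1) via Glauber dynamics.
    # adj: list of neighbor lists; theta: coupling; n_sweeps: full sweeps.
    p = len(adj)
    x = rng.choice(np.array([-1, 1]), size=p)
    for _ in range(n_sweeps):
        for i in rng.permutation(p):
            h = theta * sum(x[j] for j in adj[i])     # local field at spin i
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * h))   # P(x_i = +1 | rest)
            x[i] = 1 if rng.random() < p_plus else -1
    return x

# Example: rng = np.random.default_rng(0); x = gibbs_sample_ising(adj, 0.5, 200, rng)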
3 Proofs

In order to prove Theorem 1.6, we need a few auxiliary results. It is convenient to introduce some notation. If M is a matrix and R, P are index sets, then M_{R P} denotes the submatrix with row indices in R and column indices in P. As above, we let r be the vertex whose neighborhood we are trying to reconstruct, and define S = ∂r, S^c = V \ (∂r ∪ r).

Since the cost function L(θ; {x^(ℓ)}) + λ||θ||₁ depends on θ only through its components θ_{r,·} = {θ_{rj}}, we will hereafter neglect all the other parameters and write θ as a shorthand for θ_{r,·}.

Let ẑ* be a subgradient of ||θ||₁ evaluated at the true parameter values, θ* = {θ_{rj} : θ_{rj} = 0 ∀j ∉ ∂r, θ_{rj} = θ ∀j ∈ ∂r}. Let θ̂ⁿ be the parameter estimate returned by Rlr(λ) when the number of samples is n. Note that, since we assumed θ* ≥ 0, ẑ*_S = 𝟙_S (the all-ones vector). Define Qⁿ(θ; {x^(ℓ)}) to be the Hessian of L(θ; {x^(ℓ)}), and Q(θ) = lim_{n→∞} Qⁿ(θ; {x^(ℓ)}). By the law of large numbers, Q(θ) is the Hessian of E_{G,θ} log P_{G,θ}(X_r | X_{\r}), where E_{G,θ} is the expectation with respect to (8) and X is a random variable distributed according to (8). We will denote the maximum and minimum eigenvalues of a symmetric matrix M by σ_max(M) and σ_min(M), respectively.

We will omit arguments whenever clear from the context. Any quantity evaluated at the true parameter values will be represented with a *, e.g. Q* = Q(θ*). Quantities under a ˆ depend on n. Throughout this section, G is a graph of maximum degree ∆.

3.1 Proof of Theorem 1.6

Our first auxiliary result establishes that, if λ is small, then ||Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S||_∞ > 1 is a sufficient condition for the failure of Rlr(λ).

Lemma 3.1. Assume [Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S]_i ≥ 1 + ǫ for some ǫ > 0 and some row i ∈ V, σ_min(Q*_{S S}) ≥ C_min > 0, and λ < C_min³ ǫ / (2⁹ ∆⁴). Then the success probability of Rlr(λ) is upper bounded as

P_succ ≤ 4∆² e^{−n δ_A²} + 2∆ e^{−n λ² δ_B²} ,   (10)

where δ_A = (C_min/100∆²) ǫ and δ_B = (C_min/8∆) ǫ.

The next lemma implies that, for λ to be 'reasonable' (in the sense introduced in Section 1.2.2), nλ² must be unbounded.

Lemma 3.2. There exists M = M(K, θ) > 0 for θ > 0 such that the following is true: if G is the graph with only one edge, between nodes r and i, and nλ² ≤ K, then

P_succ ≤ e^{−M(K,θ) p} + e^{−n (1 − tanh θ)² / 32} .   (11)

Finally, our key result shows that the condition ||Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S||_∞ ≤ 1 is violated with high probability for large random graphs. The proof of this result relies on a local weak convergence result for ferromagnetic Ising models on random graphs proved in [17].

Lemma 3.3. Let G be a uniformly random regular graph of degree ∆ > 3, and let ǫ > 0 be sufficiently small. Then there exists θ_thr(∆, ǫ) such that, for θ > θ_thr(∆, ǫ), ||Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S||_∞ ≥ 1 + ǫ with probability converging to 1 as p → ∞.

Furthermore, for large ∆, θ_thr(∆, 0+) = θ̃ ∆^{−1} (1 + o(1)). The constant θ̃ is given by θ̃ = (tanh h̄)/h̄, where h̄ is the unique positive solution of h̄ tanh h̄ = (1 − tanh² h̄)². Finally, there exists C_min > 0, depending only on ∆ and θ, such that σ_min(Q*_{S S}) ≥ C_min with probability converging to 1 as p → ∞.

The proofs of Lemmas 3.1 and 3.3 are sketched in the next subsection. Lemma 3.2 is more straightforward, and we omit its proof for space reasons.
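The quantity appearing in Lemmas 3.1 and 3.3 can also be estimated numerically from samples. The sketch below is our own illustration, not code from the paper: it builds the empirical Hessian Qⁿ(θ*) of the pseudo-likelihood — for ±1 spins the node conditional is logistic in w = 2θ, and the resulting constant rescaling of the Hessian cancels in the product — and returns ||Q_{S^c S} (Q_{S S})^{−1} 𝟙_S||_∞, which Lemma 3.1 ties to the failure of Rlr(λ) when it exceeds 1.

import numpy as np

def incoherence_statistic(samples, r, S, theta_star):
    # Empirical version of || Q*_{S^c S} (Q*_{S S})^{-1} 1_S ||_inf.
    # samples: (n, p) array of +/-1 spins; S: neighbor indices of r, given in
    # the (p-1)-dimensional indexing of x_\r; theta_star: true theta_{r,.}.
    X = np.delete(samples, r, axis=1)
    m = 1.0 / (1.0 + np.exp(-2.0 * (X @ theta_star)))   # P(x_r = +1 | x_\r)
    eta = m * (1.0 - m)                                  # per-sample curvature
    Q = (X * eta[:, None]).T @ X / X.shape[0]            # Hessian, up to a constant
    Sc = [j for j in range(X.shape[1]) if j not in S]
    v = Q[np.ix_(Sc, S)] @ np.linalg.solve(Q[np.ix_(S, S)], np.ones(len(S)))
    return np.max(np.abs(v))                             # > 1 signals failure

On uniformly random ∆-regular graphs, this is the statistic that crosses 1 at θ_thr(∆, ǫ) according to Lemma 3.3.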
Proof. (Theorem 1.6) Fix ∆ > 3, θ > K/∆ (where K is a large enough constant independent of ∆), and ǫ, C_min > 0, both small enough. By Lemma 3.3, for any p large enough we can choose a ∆-regular graph G_p = (V = [p], E_p) and a vertex r ∈ V such that |[Q*_{S^c S} (Q*_{S S})^{−1} 𝟙_S]_i| > 1 + ǫ for some i ∈ V \ r.

By Theorem 1 in [4] we can assume, without loss of generality, n > K′ ∆ log p for some small constant K′. Further, by Lemma 3.2, nλ² ≥ F(p) for some F(p) ↑ ∞ as p → ∞, and the condition of Lemma 3.1 on λ is satisfied since, by the 'reasonable' assumption, λ → 0 with n. Using these results in Eq. (10) of Lemma 3.1, we get the following upper bound on the success probability:

P_succ(G_p) ≤ 4∆² p^{−δ_A² K′ ∆} + 2∆ e^{−F(p) δ_B²} .   (12)

In particular, P_succ(G_p) → 0 as p → ∞.

3.2 Proofs of auxiliary lemmas

Proof. (Lemma 3.1) We will show that, under the assumptions of the lemma and if θ̂ = (θ̂_S, θ̂_{S^c}) = (θ̂_S, 0), then the probability that the i-th component of any subgradient of L(θ; {x^(ℓ)}) + λ||θ||₁ vanishes for any θ̂_S > 0 (component-wise) is upper bounded as in Eq. (10). To simplify notation, we will omit {x^(ℓ)} in all expressions derived from L.

Let ẑ be a subgradient of ||θ||₁ at θ̂ and assume ∇L(θ̂) + λẑ = 0. An application of the mean value theorem yields

∇²L(θ*) [θ̂ − θ*] = Wⁿ − λẑ + Rⁿ ,   (13)

where Wⁿ = −∇L(θ*) and [Rⁿ]_j = [∇²L(θ̄^(j)) − ∇²L(θ*)]_j^T (θ̂ − θ*), with θ̄^(j) a point on the line from θ̂ to θ*. Notice that by definition ∇²L(θ*) = Qⁿ* = Qⁿ(θ*). To simplify notation we will omit the * in all Qⁿ*; all Qⁿ in this proof are thus evaluated at θ*.

Breaking this expression into its S and S^c components, and since θ̂_{S^c} = θ*_{S^c} = 0, we can eliminate θ̂_S − θ*_S from the two expressions obtained and write

[Wⁿ_{S^c} − Rⁿ_{S^c}] − Qⁿ_{S^c S} (Qⁿ_{S S})^{−1} [Wⁿ_S − Rⁿ_S] + λ Qⁿ_{S^c S} (Qⁿ_{S S})^{−1} ẑ_S = λ ẑ_{S^c} .

Now notice that Qⁿ_{S^c S} (Qⁿ_{S S})^{−1} = T₁ + T₂ + T₃ + T₄, where

T₁ = Q*_{S^c S} [(Qⁿ_{S S})^{−1} − (Q*_{S S})^{−1}] ,   T₂ = [Qⁿ_{S^c S} − Q*_{S^c S}] (Q*_{S S})^{−1} ,
T₃ = [Qⁿ_{S^c S} − Q*_{S^c S}] [(Qⁿ_{S S})^{−1} − (Q*_{S S})^{−1}] ,   T₄ = Q*_{S^c S} (Q*_{S S})^{−1} .   (14)

We will assume that the samples {x^(ℓ)} are such that the following event holds:

E ≡ { ||Qⁿ_{S S} − Q*_{S S}||_∞ < ξ_A , ||Qⁿ_{S^c S} − Q*_{S^c S}||_∞ < ξ_B , ||Wⁿ_S / λ||_∞ < ξ_C } ,   (15)

where ξ_A ≡ C_min² ǫ/(16∆), ξ_B ≡ C_min ǫ/(8√∆) and ξ_C ≡ C_min ǫ/(8∆). Since E_{G,θ}(Qⁿ) = Q* and E_{G,θ}(Wⁿ) = 0, and noticing that both Qⁿ and Wⁿ are sums of bounded i.i.d. random variables, a simple application of the Azuma–Hoeffding inequality upper bounds the probability that E fails as in (10).

From E it follows that σ_min(Qⁿ_{S S}) > σ_min(Q*_{S S}) − C_min/2 > C_min/2. We can therefore lower bound the absolute value of the i-th component of ẑ_{S^c} by

|[Q*_{S^c S} (Q*_{S S})^{−1} 𝟙_S]_i| − ||T_{1,i}||_∞ − ||T_{2,i}||_∞ − ||T_{3,i}||_∞ − |Wⁿ_i/λ| − |Rⁿ_i/λ| − (∆/C_min) ||Wⁿ_S/λ||_∞ − (∆/C_min) ||Rⁿ_S/λ||_∞ ,

where the subscript i denotes the i-th row of a matrix. The proof is completed by showing that the event E and the assumptions of the theorem imply that each of the last seven terms in this expression is smaller than ǫ/8. Since |[Q*_{S^c S} (Q*_{S S})^{−1}]_i^T ẑ*_S| ≥ 1 + ǫ by assumption, this implies |ẑ_i| ≥ 1 + ǫ/8 > 1, which cannot be, since any subgradient of the 1-norm has components of magnitude at most 1.

The last condition on E immediately bounds all terms involving W by ǫ/8. Some straightforward manipulations imply (see Lemma 7 of [7])

||T_{1,i}||_∞ ≤ (∆/C_min²) ||Qⁿ_{S S} − Q*_{S S}||_∞ ,   ||T_{2,i}||_∞ ≤ (√∆/C_min) ||[Qⁿ_{S^c S} − Q*_{S^c S}]_i||_∞ ,
||T_{3,i}||_∞ ≤ (2∆/C_min²) ||Qⁿ_{S S} − Q*_{S S}||_∞ ||[Qⁿ_{S^c S} − Q*_{S^c S}]_i||_∞ ,

and thus all will be bounded by ǫ/8 when E holds. The upper bound on Rⁿ follows along similar lines via a mean value theorem, and is deferred to a longer version of this paper.

Proof. (Lemma 3.3) Let us state explicitly the local weak convergence result mentioned in Sec. 3.1. For t ∈ ℕ, let T(t) = (V_T, E_T) be the regular rooted tree of t generations, and define the associated Ising measure as

µ⁺_{T,θ}(x) = (1/Z_{T,θ}) ∏_{(i,j)∈E_T} e^{θ x_i x_j} ∏_{i∈∂T(t)} e^{h* x_i} .   (16)

Here ∂T(t) is the set of leaves of T(t), and h* is the unique positive solution of h = (∆ − 1) atanh{tanh θ tanh h}.
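The boundary field h* is easy to evaluate numerically. The following is a minimal sketch of ours (the function name and starting point are illustrative choices): when (∆ − 1) tanh θ > 1 the iteration converges monotonically from any positive start to the unique positive solution, and otherwise it decays to the trivial fixed point h = 0.

import numpy as np

def h_star(theta, Delta, tol=1e-12, max_iter=100000):
    # Fixed-point iteration for h = (Delta - 1) * atanh(tanh(theta) * tanh(h)),
    # the equation defining h* in Eq. (16).
    h = 1.0
    for _ in range(max_iter):
        h_next = (Delta - 1) * np.arctanh(np.tanh(theta) * np.tanh(h))
        if abs(h_next - h) < tol:
            return h_next
        h = h_next
    return h

# Example (illustrative): h_star(0.5, 4) gives the boundary field in Eq. (16).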
It can be proved using [17], together with uniform continuity with respect to the 'external field', that non-trivial local expectations with respect to µ_{G,θ}(x) converge to local expectations with respect to µ⁺_{T,θ}(x) as p → ∞.

More precisely, let B_r(t) denote a ball of radius t around node r ∈ G (the node whose neighborhood we are trying to reconstruct). For any fixed t, the probability that B_r(t) is not isomorphic to T(t) goes to 0 as p → ∞. Let g(x_{B_r(t)}) be any function of the variables in B_r(t) such that g(x_{B_r(t)}) = g(−x_{B_r(t)}). Then, almost surely over graph sequences G_p of uniformly random regular graphs with p nodes (expectations here are taken with respect to the measures (1) and (16)),

lim_{p→∞} E_{G,θ}{g(X_{B_r(t)})} = E_{T(t),θ,+}{g(X_{T(t)})} .   (17)

The proof consists in considering [Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S]_i for t = dist(r, i) finite. We then write (Q*_{S S})_{lk} = E{g_{l,k}(X_{B_r(t)})} and (Q*_{S^c S})_{il} = E{g_{i,l}(X_{B_r(t)})} for some functions g_{·,·}(X_{B_r(t)}), and apply the weak convergence result (17) to these expectations. We have thus reduced the calculation of [Q*_{S^c S} (Q*_{S S})^{−1} ẑ*_S]_i to the calculation of expectations with respect to the tree measure (16). The latter can be implemented explicitly through a recursive procedure, with simplifications arising thanks to the tree symmetry and by taking t ≫ 1. The actual calculations consist in a (very) long exercise in calculus, and we omit them from this outline. The lower bound on σ_min(Q*_{S S}) is proved by a similar calculation.

Acknowledgments

This work was partially supported by a Terman fellowship, the NSF CAREER award CCF-0743978, the NSF grant DMS-0806211, and by a Portuguese Doctoral FCT fellowship.

References

[1] P. Abbeel, D. Koller and A. Ng, "Learning factor graphs in polynomial time and sample complexity", Journal of Machine Learning Research, 2006, Vol. 7, 1743–1788.
[2] M. Wainwright, "Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting", arXiv:math/0702301v2 [math.ST], 2007.
[3] N. Santhanam and M. Wainwright, "Information-theoretic limits of selecting binary graphical models in high dimensions", arXiv:0905.2639v1 [cs.IT], 2009.
[4] G. Bresler, E. Mossel and A. Sly, "Reconstruction of Markov Random Fields from Samples: Some Observations and Algorithms", Proceedings of the 11th international workshop APPROX 2008 and 12th international workshop RANDOM 2008, 2008, 343–356.
[5] I. Csiszár and Z. Talata, "Consistent estimation of the basic neighborhood structure of Markov random fields", The Annals of Statistics, 2006, Vol. 34, No. 1, 123–145.
[6] N. Friedman, I. Nachman and D. Peer, "Learning Bayesian network structure from massive datasets: The sparse candidate algorithm", In UAI, 1999.
[7] P. Ravikumar, M. Wainwright and J. Lafferty, "High-Dimensional Ising Model Selection Using l1-Regularized Logistic Regression", arXiv:0804.4202v1 [math.ST], 2008.
[8] M. Wainwright, P. Ravikumar and J. Lafferty, "Inferring graphical model structure using l1-regularized pseudolikelihood", In NIPS, 2006.
[9] H. Höfling and R. Tibshirani, "Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods", Journal of Machine Learning Research, 2009, Vol. 10, 883–906.
[10] O. Banerjee, L. El Ghaoui and A. d'Aspremont, "Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data", Journal of Machine Learning Research, March 2008, Vol. 9, 485–516.
[11] M. Yuan and Y. Lin, "Model Selection and Estimation in Regression with Grouped Variables", J. Royal
Statist. Soc. B, 2006, Vol. 68, 49–67.
[12] N. Meinshausen and P. Bühlmann, "High dimensional graphs and variable selection with the lasso", Annals of Statistics, 2006, Vol. 34, No. 3.
[13] R. Tibshirani, "Regression shrinkage and selection via the lasso", Journal of the Royal Statistical Society, Series B, 1994, Vol. 58, 267–288.
[14] P. Zhao and B. Yu, "On model selection consistency of Lasso", Journal of Machine Learning Research, 2006, Vol. 7, 2541–2563.
[15] D. Zobin, "Critical behavior of the bond-dilute two-dimensional Ising model", Phys. Rev., 1978, Vol. 18, No. 5, 2387–2390.
[16] M. Fisher, "Critical Temperatures of Anisotropic Ising Lattices. II. General Upper Bounds", Phys. Rev., Oct. 1967, Vol. 162, No. 2, 480–485.
[17] A. Dembo and A. Montanari, "Ising Models on Locally Tree-Like Graphs", Ann. Appl. Prob. (2008), to appear, arXiv:0804.4726v2 [math.PR]

4 0.59149206 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

Author: Chong Wang, David M. Blei

Abstract: The nested Chinese restaurant process (nCRP) is a powerful nonparametric Bayesian model for learning tree-based hierarchies from data. Since its posterior distribution is intractable, current inference methods have all relied on MCMC sampling. In this paper, we develop an alternative inference technique based on variational methods. To employ variational methods, we derive a tree-based stick-breaking construction of the nCRP mixture model, and a novel variational algorithm that efficiently explores a posterior over a large set of combinatorial structures. We demonstrate the use of this approach for modeling text and handwritten digits, where we show we can adapt the nCRP to continuous data as well.

5 0.49843317 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

Author: Rob Fergus, Yair Weiss, Antonio Torralba

Abstract: With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. “Clean labels” can be manually obtained on a small fraction, “noisy labels” may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.

6 0.36413625 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

7 0.34197694 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

8 0.34141165 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

9 0.33844233 52 nips-2009-Code-specific policy gradient rules for spiking neurons

10 0.33522883 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

11 0.33487543 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

12 0.3287209 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

13 0.32809114 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

14 0.3260476 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

15 0.32406017 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

16 0.32263076 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

17 0.32147211 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

18 0.32097548 112 nips-2009-Human Rademacher Complexity

19 0.32006377 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

20 0.31980491 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models