nips nips2009 nips2009-247 knowledge-graph by maker-knowledge-mining

247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models


Source: pdf

Author: Jonathan W. Pillow

Abstract: Recent work on the statistical modeling of neural responses has focused on modulated renewal processes in which the spike rate is a function of the stimulus and recent spiking history. Typically, these models incorporate spike-history dependencies via either: (A) a conditionally-Poisson process with rate dependent on a linear projection of the spike train history (e.g., generalized linear model); or (B) a modulated non-Poisson renewal process (e.g., inhomogeneous gamma process). Here we show that the two approaches can be combined, resulting in a conditional renewal (CR) model for neural spike trains. This model captures both real-time and rescaled-time history effects, and can be fit by maximum likelihood using a simple application of the time-rescaling theorem [1]. We show that for any modulated renewal process model, the log-likelihood is concave in the linear filter parameters only under certain restrictive conditions on the renewal density (ruling out many popular choices, e.g. gamma with shape κ ≠ 1), suggesting that real-time history effects are easier to estimate than non-Poisson renewal properties. Moreover, we show that goodness-of-fit tests based on the time-rescaling theorem [1] quantify relative-time effects, but do not reliably assess accuracy in spike prediction or stimulus-response modeling. We illustrate the CR model with applications to both real and simulated neural data. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Recent work on the statistical modeling of neural responses has focused on modulated renewal processes in which the spike rate is a function of the stimulus and recent spiking history. [sent-4, score-1.348]

2 Typically, these models incorporate spike-history dependencies via either: (A) a conditionally-Poisson process with rate dependent on a linear projection of the spike train history (e. [sent-5, score-0.435]

3 , generalized linear model); or (B) a modulated non-Poisson renewal process (e. [sent-7, score-0.858]

4 Here we show that the two approaches can be combined, resulting in a conditional renewal (CR) model for neural spike trains. [sent-10, score-1.124]

5 We show that for any modulated renewal process model, the log-likelihood is concave in the linear filter parameters only under certain restrictive conditions on the renewal density (ruling out many popular choices, e. [sent-12, score-1.781]

6 gamma with shape κ ≠ 1), suggesting that real-time history effects are easier to estimate than non-Poisson renewal properties. [sent-14, score-0.894]

7 Moreover, we show that goodness-of-fit tests based on the time-rescaling theorem [1] quantify relative-time effects, but do not reliably assess accuracy in spike prediction or stimulus-response modeling. [sent-15, score-0.295]

8 1 Introduction: A central problem in computational neuroscience is to develop functional models that can accurately describe the relationship between external variables and neural spike trains. [sent-17, score-0.342]

9 All attempts to measure information transmission in the nervous system are fundamentally attempts to quantify this relationship, which can be expressed by the conditional probability P ({ti }|X), where {ti } is a set of spike times generated in response to an external stimulus X. [sent-18, score-0.461]

10 Recent work on the neural coding problem has focused on extensions of the Linear-Nonlinear-Poisson (LNP) “cascade” encoding model, which describes the neural encoding process using a linear receptive field, a point nonlinearity, and an inhomogeneous Poisson spiking process [2, 3]. [sent-19, score-0.444]

11 Such dependencies, moreover, have been shown to be essential for extracting complete stimulus information from spike trains in a variety of brain areas [4, 5, 6, 7, 8, 9, 10, 11]. [sent-24, score-0.381]

12 One approach is to model spiking as a non-Poisson inhomogeneous renewal process (e. [sent-26, score-1.048]

13 Under this approach, spike ... [sent-29, score-1.19]

14 Figure 1: The conditional renewal (CR) model and time-rescaling transform (panels show the stimulus filter, nonlinearity, post-spike filter, rate in Hz, renewal density p(ISI), rescaled time (unitless), and real time in s). [sent-32, score-2.378]

15 (A) Stimuli are convolved with a filter k then passed through a nonlinearity f , whose output is the rate λ(t) for an inhomogeneous spiking process with renewal density q. [sent-33, score-1.181]

16 The post-spike filter h provides recurrent additive input to f for every spike emitted. [sent-34, score-0.265]

17 Top: the intensity λ(t) (here independent of spike history) in response to a one-second stimulus. [sent-36, score-0.43]

18 Bottom left: interspike intervals (left, intervals between red dots) are drawn i.i.d. [sent-37, score-0.36]

19 in rescaled time from renewal density q, here set to gamma with shape κ = 20. [sent-40, score-1.104]

20 Alternatively, Λ(t) maps the true spike times (bottom) to samples from a homogeneous renewal process in rescaled time (left edge). [sent-42, score-1.304]

21 times are Markovian, depending on the most recent spike time via a (non-exponential) renewal density, which may be rescaled in proportion to the instantaneous spike rate. [sent-43, score-1.495]

22 A second approach is to use a conditionally Poisson process in which the intensity (or spike rate) is a function of the recent spiking history [4, 16, 17, 18, 19, 20]. [sent-44, score-0.673]

23 The output of such a model is a conditionally Poisson process, but not Poisson, since the spike rate itself depends on the spike history. [sent-45, score-0.631]

24 We begin by reviewing inhomogeneous renewal models and generalized linear model point process models for neural spike trains. [sent-47, score-1.235]

25 1 Point process neural encoding models. Definitions and Terminology: Let {ti} be a sequence of spike times on the interval (0, T], with 0 < t0 < t1 < ... ≤ T. [sent-49, score-0.402]

26 This function rescales the original spike times into spikes from a (homogeneous) renewal process, that is, a process in which the intervals are i.i.d. [sent-57, score-1.275]

27 Let {ui} denote the inter-spike intervals (ISIs) of the rescaled process, which are given by the integral of the intensity between successive spikes, i.e., u_i = Λ_{t_{i-1}}(t_i) = ∫_{t_{i-1}}^{t_i} λ(t) dt (eq. 2). [sent-61, score-0.516]

28 Intuitively, this transformation stretches time in proportion to the spike rate λ(t), so that when the rate λ(t) is high, ISIs are lengthened, and when λ(t) is low, ISIs are compressed. [sent-64, score-0.322]
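
The rescaled intervals of eq. 2 are easy to compute from a binned estimate of the intensity. The sketch below is my own illustration (the function name, the interpolated cumulative intensity, and the toy constant-rate example are not from the paper):

```python
import numpy as np

def rescaled_intervals(spike_times, lam, dt):
    """Compute u_i = integral of lambda(t) from t_{i-1} to t_i (eq. 2).

    spike_times : sorted spike times in seconds
    lam         : intensity lambda(t) in each time bin (Hz)
    dt          : bin width in seconds
    """
    grid = np.arange(len(lam) + 1) * dt                  # bin edges
    Lam = np.concatenate([[0.0], np.cumsum(lam) * dt])   # cumulative intensity at the edges
    Lam_at_spikes = np.interp(spike_times, grid, Lam)    # Lambda evaluated at the spike times
    # the first interval is measured from t = 0, where Lambda(0) = 0
    return np.diff(np.concatenate([[0.0], Lam_at_spikes]))

# toy check: a constant 20 Hz intensity with a spike every 0.5 s gives u_i close to 10
dt = 0.001
lam = np.full(10_000, 20.0)
spikes = np.arange(0.5, 10.0, 0.5)
u = rescaled_intervals(spikes, lam, dt)
```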

29 2 Let q(u) denote the renewal density, the probability density function from which the rescaled-time intervals {ui } are drawn. [sent-67, score-1.0]

30 A Poisson process arises if q is exponential, q(u) = e−u ; for any other density, the probability of spiking depends on the most recent spike time. [sent-68, score-0.408]

31 1B), we can draw independent intervals ui from renewal density q(u), then apply the inverse time-rescaling transform to obtain ISIs in real time: (t_i − t_{i-1}) = Λ_{t_{i-1}}^{-1}(u_i) (eq. 3), where Λ^{-1}(t) is the inverse of the time-rescaling transform (eq. 2). [sent-71, score-1.079]
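
A minimal sketch of this sampling scheme, under my own assumptions (a unit-mean gamma renewal density and a linear-interpolation inverse of Λ); the paper does not prescribe this particular implementation:

```python
import numpy as np

def sample_renewal_spikes(lam, dt, shape_k=10.0, seed=None):
    """Sample spikes from a modulated renewal process via inverse time-rescaling (eq. 3).

    ISIs are drawn i.i.d. in rescaled time from a unit-mean gamma renewal density
    (shape = shape_k, scale = 1/shape_k) and mapped back through the inverse of Lambda.
    """
    rng = np.random.default_rng(seed)
    grid = np.arange(len(lam) + 1) * dt
    Lam = np.concatenate([[0.0], np.cumsum(lam) * dt])   # cumulative intensity on the bin edges
    spikes, u_total = [], 0.0
    while True:
        u_total += rng.gamma(shape_k, 1.0 / shape_k)      # next rescaled-time interval
        if u_total >= Lam[-1]:                            # ran past the end of the trial
            break
        spikes.append(np.interp(u_total, Lam, grid))      # invert Lambda by interpolation
    return np.array(spikes)

# usage: a 2 s intensity ramping from 5 Hz to 50 Hz, strongly regular spiking (kappa = 20)
dt = 0.001
lam = np.linspace(5.0, 50.0, 2000)
spike_times = sample_renewal_spikes(lam, dt, shape_k=20.0, seed=0)
```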

32 The intensity in this case can be written λ(t) = f(x_t · k + y_t · h) (eq. 4), where x_t is a vector representing the stimulus at time t, k is a stimulus filter, y_t is a vector representing the spike history at t, and h is a spike-history filter. [sent-73, score-0.692]
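
In discrete time, eq. 4 amounts to dotting the recent stimulus and spike history with the two filters and passing the result through f. The code below is only illustrative; the filter orientation (lag-0 coefficient first), the exponential nonlinearity, and the example filters are my assumptions:

```python
import numpy as np

def cr_intensity(stim, spikes, k, h, f=np.exp):
    """Evaluate lambda_t = f(x_t . k + y_t . h) (eq. 4) in discrete time.

    stim   : stimulus value in each bin
    spikes : 0/1 spike indicator in each bin
    k, h   : stimulus filter and spike-history filter, lag-0 coefficient first
    """
    T = len(stim)
    lam = np.zeros(T)
    for t in range(T):
        x_t = stim[max(0, t - len(k) + 1): t + 1][::-1]   # current and recent stimulus, newest first
        y_t = spikes[max(0, t - len(h)): t][::-1]         # spikes strictly before bin t, newest first
        drive = np.dot(k[:len(x_t)], x_t) + np.dot(h[:len(y_t)], y_t)
        lam[t] = f(drive)
    return lam

# hypothetical filters and a white-noise stimulus, purely for illustration
rng = np.random.default_rng(1)
stim = rng.standard_normal(1000)
spk = np.zeros(1000)
k = np.exp(-np.arange(20) / 5.0)
h = -2.0 * np.exp(-np.arange(10) / 3.0)   # refractory-like suppression after a spike
lam = cr_intensity(stim, spk, k, h)
```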

33 2 The conditional renewal model: We refer to the most general version of this model, in which λ(t) is allowed to depend on both the stimulus and spike train history, and q(u) is an arbitrary (finite-mean) density on R+, as a conditional renewal (CR) model (see fig. 1). [sent-76, score-2.126]

34 The output of this model forms an inhomogeneous renewal process conditioned on the process history. [sent-78, score-0.989]

35 Specific (restricted) cases of the CR model include the generalized linear model (GLM) [17], and the modulated renewal model with λ = f (x · k) and q a right-skewed, non-exponential renewal density [13, 15]. [sent-80, score-1.761]

36 The conditional probability distribution over spike times {ti } given the external variables X can be derived using the time-rescaling transformation. [sent-82, score-0.354]

37 3) provides the conditional probability over spike times: P({t_i}|X) = ∏_{i=1}^{n} λ(t_i) q(Λ_{t_{i-1}}(t_i)). [sent-85, score-0.309]

38 The log-likelihood function can be approximated in discrete time, with bin-size dt taken small enough to ensure ≤ 1 spike per bin: log P({t_i}|X) = Σ_{i=1}^{n} log λ(t_i) + Σ_{i=1}^{n} log q(Σ_{j=t_{i-1}+1}^{t_i} λ(j) dt) (eq. 7), where t_i indicates the bin for the ith spike. [sent-87, score-0.725]
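
Eq. 7 translates almost line for line into code. The sketch below is my own (the unit-mean gamma choice for q and all names are assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.stats import gamma

def cr_loglik(spike_bins, lam, dt, q_logpdf):
    """Discrete-time CR log-likelihood, as in eq. 7.

    spike_bins : indices of bins containing spikes (at most one spike per bin)
    lam        : intensity lambda in each bin (Hz)
    dt         : bin width (s)
    q_logpdf   : log renewal density, evaluated on rescaled intervals
    """
    ll = np.sum(np.log(lam[spike_bins]))
    prev = -1
    for ti in spike_bins:
        u = np.sum(lam[prev + 1: ti + 1]) * dt   # rescaled interval up to this spike
        ll += q_logpdf(u)
        prev = ti
    return ll

# toy example with a unit-mean gamma renewal density (shape kappa = 10)
kappa = 10.0
q_logpdf = lambda u: gamma.logpdf(u, a=kappa, scale=1.0 / kappa)
dt = 0.001
lam = np.full(5000, 30.0)
spike_bins = np.array([400, 1300, 2500, 3600, 4700])
print(cr_loglik(spike_bins, lam, dt, q_logpdf))
```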

39 Footnote 1: Note that Λ_{t*}(t) is invertible for all spike times t_i, since necessarily t_i ∈ {t : λ(t) > 0}. [sent-89, score-0.699]

40 A note on terminology: we follow [13] in defining λ(t) to be the instantaneous rate for an inhomogeneous renewal process, which is not identical to the hazard function H(t) = P (ti ∈ [t, t + ∆]|ti > ti−1 )/∆, also known as the conditional intensity [1]. [sent-90, score-1.107]

41 2 3 stimulus filter rasters renewal density KS plot 1 2 (a) CDF gamma 0. [sent-93, score-1.011]

42 Figure 2: Left: Stimulus filter and renewal density for three point process models (all with nonlinearity f(x) = e^x and history-independent intensity). [sent-96, score-0.946]

43 “True” spikes were generated from (a), a conditional renewal model with a gamma renewal density (κ = 10). [sent-97, score-1.804]

44 These responses were fit by: (b), a Poisson model with the correct stimulus filter; and (c), a modulated renewal process with incorrect stimulus filter (set to the negative of the correct filter), and renewal density estimated nonparametrically from the transformed intervals (eq. [sent-98, score-2.121]

45 Middle: Repeated responses from all three models to a novel 1-s stimulus, showing that spike rate is well predicted by (b) but not by (c). [sent-100, score-0.316]

46 Likelihood-based cross-validation tests (below) show that (b) preserves roughly 1/3 as much information about spike times as (a), while (c) carries slightly less information than a homogeneous Poisson process with the correct spike rate. [sent-103, score-0.688]

47 3 Convexity condition for inhomogeneous renewal models: We now turn to the tractability of estimating the CR model parameters from data. [sent-104, score-0.905]

48 By extension, we can ask whether the estimation problem remains convex when we relax the Poisson assumption and allow for a non-exponential renewal density q. [sent-112, score-0.905]

49 The CR model log-likelihood L{D,q} (θ) is concave in the filter parameters θ, for any observed data D, if: (1) the nonlinearity f is convex and log-concave; and (2) the renewal density q is log-concave and non-increasing on (0, ∞]. [sent-115, score-1.023]

50 Maximum likelihood filter estimation under the CR model is therefore a convex problem so long as the renewal density q is both log-concave and non-increasing. [sent-125, score-0.954]

51 This restriction rules out a variety of renewal densities that are commonly employed to model neural data [13, 14, 15]. [sent-126, score-0.83]
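
Both conditions can be checked numerically for a candidate renewal density. The sketch below is mine, not the paper's; among unit-mean gamma densities it confirms that only the exponential case κ = 1 satisfies both conditions, consistent with the abstract's remark that gamma densities with κ ≠ 1 are ruled out:

```python
import numpy as np
from scipy.stats import gamma

def check_concavity_conditions(logpdf, u=np.linspace(1e-3, 20.0, 20000)):
    """Check the two conditions numerically on a grid:
    (1) q is non-increasing (equivalently, log q is non-increasing);
    (2) log q is concave (non-positive second differences)."""
    lq = logpdf(u)
    non_increasing = bool(np.all(np.diff(lq) <= 1e-9))
    log_concave = bool(np.all(np.diff(lq, 2) <= 1e-9))
    return non_increasing, log_concave

for kappa in [0.5, 1.0, 10.0]:
    logpdf = lambda u, k=kappa: gamma.logpdf(u, a=k, scale=1.0 / k)   # unit-mean gamma
    print(kappa, check_concavity_conditions(logpdf))
# expected: only kappa = 1 (the exponential / Poisson case) passes both checks
```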

52 , using the GLM framework) than via a non-Poisson renewal density. [sent-132, score-0.764]

53 An important corollary of this convexity result is that the decoding problem of estimating stimuli {x_t} from a set of observed spike times {t_i} using the maximum of the posterior (i.e., MAP decoding) is also a convex optimization problem. [sent-133, score-0.313]

54 4 Nonparametric Estimation of the CR model: In practice, we may wish to optimize both the filter parameters governing the base intensity λ(t) and the renewal density q, which is not in general a convex problem. [sent-136, score-1.103]

55 Here we formulate a slightly different interval-rescaling function that allows us to nonparametrically estimate renewal properties using a density on the unit interval. [sent-138, score-0.897]

56 Let us define the mapping v_i = 1 − exp(−Λ_{t_{i-1}}(t_i)) (eq. 9), which is the cumulative distribution function (cdf) for the intervals from a conditionally Poisson process with cumulative intensity Λ(t). [sent-139, score-0.592]

57 Any discrepancy between the distribution of {vi } and the uniform distribution represents failures of a Poisson model to correctly describe the renewal statistics. [sent-144, score-0.807]

58 We propose to estimate a density φ(v) for the rescaled intervals {vi } using cubic splines (piecewise 3rd-order polynomials with continuous 2nd derivatives), with evenly spaced knots on the interval [0, 1]. [sent-146, score-0.474]
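
The paper fits this spline density by maximum likelihood; the histogram-plus-interpolation version below is only a rough sketch of the representation (the bin count, the clipping, and the beta-distributed stand-in data are my own choices):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_density(v, n_bins=8):
    """Rough spline-based estimate of phi(v) on [0, 1] from rescaled intervals v."""
    hist, edges = np.histogram(v, bins=n_bins, range=(0.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])              # evenly spaced knot locations
    spline = CubicSpline(centers, hist, bc_type='natural')
    grid = np.linspace(0.0, 1.0, 1001)
    phi = np.clip(spline(grid), 0.0, None)                # a density cannot be negative
    phi /= np.sum(phi) * (grid[1] - grid[0])              # renormalize to integrate to one
    return grid, phi

# stand-in data; in practice v_i = 1 - exp(-Lambda_{t_{i-1}}(t_i)) from the fitted model
v = np.random.default_rng(0).beta(2.0, 2.0, size=2000)
grid, phi = spline_density(v)
```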

59 Figure 3: Left: pairwise dependencies between successive rescaled ISIs from model (“a”, see fig. [sent-150, score-0.299]

60 Center: fitted model of the conditional distribution over rescaled ISIs given the previous ISI, discretized into 7 intervals for the previous ISI. [sent-152, score-0.396]

61 Right: rescaling the intervals using the cdf obtained from the conditional distribution of z_{i+1} given z_i produces successive ISIs which are much more independent. [sent-153, score-0.359]

62 Poisson spiking, from the (rescaled-time) contributions of a non-Poisson renewal density. [sent-155, score-0.764]

63 2), and to real neural data using alternating coordinate ascent of the filter parameters and the renewal density parameters (fig. [sent-158, score-0.884]

64 2, we plot the renewal distribution q(u) (red trace), which can be obtained from the estimated φ(v) via the transformation q(u) = φ(1 − e^(−u)) e^(−u). [sent-161, score-0.779]
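
The change of variables behind this formula is v = 1 − e^(−u), so a density φ on [0, 1] pulls back to a renewal density on rescaled time as below (my own helper; note that a uniform φ recovers the exponential density of the Poisson case):

```python
import numpy as np

def q_from_phi(phi, u):
    """Map a density phi(v) on [0, 1] to a renewal density q(u) in rescaled time,
    using v = 1 - exp(-u):  q(u) = phi(1 - e^{-u}) * e^{-u}."""
    return phi(1.0 - np.exp(-u)) * np.exp(-u)

# sanity check: phi uniform on [0, 1] gives back q(u) = e^{-u}
u = np.linspace(0.0, 10.0, 1000)
q = q_from_phi(lambda v: np.ones_like(v), u)
```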

65 1 Incorporating dependencies between intervals: The cdf defined by the CR model, Φ(v) = ∫_0^v φ(s) ds, maps the transformed ISIs v_i so that the marginal distribution over z_i = Φ(v_i) is uniform on [0, 1]. [sent-163, score-0.383]

66 Specifically, after remapping a set of observed spike times according to the (model-defined) cumulative intensity, one can perform a distributional test (e. [sent-172, score-0.333]

67 , Kolmogorov-Smirnov, or KS test) to assess whether the rescaled intervals have the expected distribution (footnote 7). [sent-174, score-0.324]

68 For example, for a conditionally Poisson model, the KS test can be applied to the rescaled intervals vi (eq. [sent-175, score-0.417]
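
For a conditionally Poisson model the test takes a few lines with standard tools; the sketch below uses my own names and the same interpolated cumulative intensity as in the earlier snippets, and is not the paper's code:

```python
import numpy as np
from scipy.stats import kstest

def time_rescaling_ks(spike_times, lam, dt):
    """KS test on the rescaled intervals v_i = 1 - exp(-Lambda_{t_{i-1}}(t_i)) (eq. 9),
    which should be uniform on [0, 1] if the conditionally Poisson model is correct."""
    grid = np.arange(len(lam) + 1) * dt
    Lam = np.concatenate([[0.0], np.cumsum(lam) * dt])
    Lam_spk = np.interp(spike_times, grid, Lam)
    u = np.diff(np.concatenate([[0.0], Lam_spk]))   # rescaled ISIs
    v = 1.0 - np.exp(-u)
    return kstest(v, 'uniform')                     # KS statistic and p-value

# a model that captures the data's renewal statistics should yield a large p-value
```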

69 Footnote 7: Although we have defined the time-rescaling transform using the base intensity instead of the conditional intensity as in [1], the resulting tests are equivalent provided the K-S test is applied using the appropriate distribution. [sent-177, score-0.445]

70 For this example, spikes were generated from a “true” model (denoted “a”), a CR model with a biphasic stimulus filter and a gamma renewal density (κ = 10). [sent-188, score-1.116]

71 We cross-validated these models by computing the log-likelihood of novel data, which provides a measure of predictive information about novel spike trains in units of bits/s [24, 18]. [sent-192, score-0.289]
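
A common way to report such a score in bits/s is to subtract the log-likelihood of a homogeneous Poisson process with the matched mean rate and divide by the test duration times log 2. The helper below is my own sketch of that normalization, not necessarily the exact measure of [24, 18]; both log-likelihoods are assumed to be in nats and computed in the same continuous-time convention:

```python
import numpy as np

def predictive_info_bits_per_s(loglik_model, n_spikes, mean_rate, T):
    """Model log-likelihood relative to a homogeneous Poisson baseline, in bits/s.

    loglik_model : test-set log-likelihood under the fitted model (nats)
    n_spikes     : number of spikes in the test set
    mean_rate    : mean firing rate of the test set (Hz)
    T            : test-set duration (s)
    """
    loglik_poisson = n_spikes * np.log(mean_rate) - mean_rate * T   # homogeneous Poisson, nats
    return (loglik_model - loglik_poisson) / (T * np.log(2.0))
```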

72 Using this measure, the “true” model (a) provides approximately 24 bits/s about the spike response to a novel stimulus. [sent-193, score-0.293]

73 The Poisson model (b) captures only 8 bits/s, but is still much more accurate than the mis-specified renewal model (c), for which the information is slightly negative (indicating that performance is slightly worse than that of a homogeneous Poisson process with the correct rate). [sent-194, score-0.924]

74 3 shows that model (c) can be improved by modeling the dependencies between successive rescaled interspike intervals. [sent-196, score-0.381]

75 Rescaling these intervals using the cdf of the augmented model yields intervals that are both uniform on [0, 1] and approximately independent (fig. [sent-199, score-0.396]
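
A simple empirical version of this remapping (my own sketch; the choice of seven quantile bins just echoes the discretization mentioned in the Figure 3 caption) replaces each interval by its conditional empirical cdf value given the previous interval:

```python
import numpy as np

def conditional_remap(z, n_bins=7):
    """Remap successive rescaled intervals with an empirical conditional cdf.

    The previous interval z_i is discretized into n_bins quantile bins; within
    each bin, z_{i+1} is replaced by its empirical cdf value.  If the augmented
    model captures the serial dependence, the result is approximately uniform
    and independent of the previous interval.
    """
    z_prev, z_next = np.asarray(z[:-1]), np.asarray(z[1:])
    edges = np.quantile(z_prev, np.linspace(0.0, 1.0, n_bins + 1))
    which = np.clip(np.searchsorted(edges, z_prev, side='right') - 1, 0, n_bins - 1)
    remapped = np.empty_like(z_next, dtype=float)
    for b in range(n_bins):
        idx = np.where(which == b)[0]
        if idx.size:
            ranks = np.argsort(np.argsort(z_next[idx]))
            remapped[idx] = (ranks + 0.5) / idx.size    # empirical conditional cdf
    return remapped

# z would be the intervals z_i = Phi(v_i) obtained from the fitted CR model
```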

76 The augmented model raises the cross-validation score of model (c) to 1 bit/s, meaning that by incorporating dependencies between intervals, the model carries slightly more predictive information than a homogeneous Poisson model, despite the mis-specified stimulus filter. [sent-201, score-0.341]

77 However, this model—despite passing time-rescaling tests of both marginal distribution and independence—still carries less information about spike times than the inhomogeneous Poisson model (b). [sent-202, score-0.475]

78 We fit parameters for the CR model with and without spike-history filters, and with and without a non-Poisson renewal density (estimated non-parametrically as described above). [sent-206, score-0.889]

79 As expected, a non-parametric renewal density allows for remapping of ISIs to the correct (uniform) marginal distribution in rescaled time (fig. [sent-207, score-1.07]

80 Even when incorporating spike-history filters, the model with conditionally Poisson spiking (red) fails the time-rescaling test at the 95% level, though not so badly as the inhomogeneous Poisson model (blue). [sent-209, score-0.405]

81 However, the conditional Poisson model with spike-history filter (red) outperforms the non-parametric renewal model without spike-history filter (dark gray) on likelihood-based cross-validation, carrying 14% more predictive information. [sent-210, score-0.864]

82 For this neuron, incorporating non-Poisson renewal properties into a model with spike history dependent intensity (light gray) provides only a modest (<1%) increase in cross-validation performance. [sent-211, score-1.306]

83 Thus, in addition to being more tractable for estimation, it appears that the generalized linear modeling framework captures spike-train dependencies more accurately than a non-Poisson renewal process (at least for this neuron). [sent-212, score-0.865]

84 Left: marginal distribution over the interspike intervals {zi }, rescaled according to their cdf defined under four different models: (a) Inhomogeneous Poisson (i. [sent-215, score-0.481]

85 (b) Conditional renewal model without spike-history filter, with non-parametrically estimated renewal density φ. [sent-218, score-1.653]

86 (d) Conditional renewal model with spike-history filter and non-parametrically estimated renewal density. [sent-220, score-1.556]

87 Middle: The difference between the empirical cdf of the rescaled intervals (under all four models) and their quantiles. [sent-222, score-0.399]

88 Adding a non-parametric renewal density adds 4% to the Poisson model performance, but <1% to the GLM performance. [sent-225, score-0.889]

89 Overall, a spike-history filter improves cross-validation performance more than the use of a non-Poisson renewal process. [sent-226, score-0.764]

90 7 Discussion We have connected two basic approaches for incorporating spike-history effects into neural encoding models: (1) non-Poisson renewal processes; and (2) conditionally Poisson processes with an intensity that depends on spike train history. [sent-227, score-1.371]

91 We have shown that both kinds of effects can be regarded as special cases of a conditional renewal (CR) process model, and have formulated the model likelihood in a manner that separates the contributions from these two kinds of mechanisms. [sent-228, score-0.923]

92 Additionally, we have derived a condition on the CR model renewal density under which the likelihood function over filter parameters is log-concave, guaranteeing that ML estimation of filters (and MAP stimulus decoding) is a convex optimization problem. [sent-229, score-1.046]

93 We have shown that incorporating a non-parametric estimate of the CR model renewal density ensures near-perfect performance on the time-rescaling goodness-of-fit test, even when the model itself has little predictive accuracy (e. [sent-230, score-0.953]

94 Thus, we would argue that K-S tests based on the time-rescaled interspike intervals should not be used in isolation, but rather in conjunction with other tools for model comparison (e. [sent-233, score-0.279]

95 Failure under the time-rescaling test indicates that model performance may be improved by incorporating a non-Poisson renewal density, which as we have shown, may be estimated directly from rescaled intervals. [sent-236, score-1.013]

96 Finally, we have applied the CR model to neural data, and shown that it can capture spike-history dependencies in both real and rescaled time. [sent-237, score-0.295]

97 In future work, we will examine larger datasets and explore whether rescaled-time or real-time models provide more accurate descriptions of the dependencies in spike trains from a wider variety of neural datasets. [sent-238, score-0.371]

98 The time-rescaling theorem and its application to neural spike train data analysis. [sent-254, score-0.288]

99 Role of precise spike timing in coding of dynamic vibrissa stimuli in somatosensory thalamus. [sent-324, score-0.298]

100 A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. [sent-380, score-0.29]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('renewal', 0.764), ('spike', 0.265), ('ti', 0.209), ('rescaled', 0.185), ('intensity', 0.165), ('cr', 0.161), ('poisson', 0.153), ('intervals', 0.139), ('inhomogeneous', 0.113), ('isis', 0.113), ('lter', 0.102), ('spiking', 0.101), ('density', 0.097), ('stimulus', 0.092), ('interspike', 0.082), ('cdf', 0.075), ('concave', 0.062), ('dependencies', 0.059), ('gamma', 0.058), ('zi', 0.054), ('conditionally', 0.052), ('modulated', 0.052), ('spikes', 0.049), ('history', 0.048), ('paninski', 0.047), ('pillow', 0.044), ('conditional', 0.044), ('nonlinearity', 0.043), ('encoding', 0.042), ('process', 0.042), ('glm', 0.041), ('vi', 0.041), ('ui', 0.037), ('incorporating', 0.036), ('ks', 0.036), ('lnp', 0.035), ('homogeneous', 0.032), ('ruling', 0.031), ('tests', 0.03), ('neuron', 0.03), ('responses', 0.03), ('convex', 0.029), ('external', 0.029), ('badly', 0.028), ('isi', 0.028), ('cumulative', 0.028), ('model', 0.028), ('successive', 0.027), ('shlens', 0.026), ('retinal', 0.026), ('neuroscience', 0.025), ('trains', 0.024), ('effects', 0.024), ('knots', 0.024), ('refractory', 0.024), ('remapping', 0.024), ('timerescaling', 0.024), ('neural', 0.023), ('dt', 0.023), ('carries', 0.023), ('cascade', 0.022), ('rate', 0.021), ('transform', 0.021), ('chichilnisky', 0.021), ('fellows', 0.021), ('barbieri', 0.021), ('nonparametrically', 0.021), ('likelihood', 0.021), ('base', 0.02), ('rescaling', 0.02), ('lters', 0.019), ('bin', 0.019), ('ganglion', 0.019), ('refractoriness', 0.019), ('kass', 0.019), ('fails', 0.019), ('litke', 0.018), ('stimulated', 0.018), ('stimuli', 0.017), ('times', 0.016), ('coding', 0.016), ('middle', 0.015), ('transformation', 0.015), ('densities', 0.015), ('sher', 0.015), ('neurobiology', 0.015), ('jonathan', 0.015), ('transmission', 0.015), ('uniform', 0.015), ('decoding', 0.015), ('filter', 0.015), ('quantile', 0.015), ('inspection', 0.015), ('cubic', 0.015), ('estimation', 0.015), ('cell', 0.015), ('yt', 0.015), ('slightly', 0.015), ('interval', 0.014), ('simoncelli', 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

Author: Jonathan W. Pillow

Abstract: Recent work on the statistical modeling of neural responses has focused on modulated renewal processes in which the spike rate is a function of the stimulus and recent spiking history. Typically, these models incorporate spike-history dependencies via either: (A) a conditionally-Poisson process with rate dependent on a linear projection of the spike train history (e.g., generalized linear model); or (B) a modulated non-Poisson renewal process (e.g., inhomogeneous gamma process). Here we show that the two approaches can be combined, resulting in a conditional renewal (CR) model for neural spike trains. This model captures both real-time and rescaled-time history effects, and can be fit by maximum likelihood using a simple application of the time-rescaling theorem [1]. We show that for any modulated renewal process model, the log-likelihood is concave in the linear filter parameters only under certain restrictive conditions on the renewal density (ruling out many popular choices, e.g. gamma with shape κ = 1), suggesting that real-time history effects are easier to estimate than non-Poisson renewal properties. Moreover, we show that goodness-of-fit tests based on the time-rescaling theorem [1] quantify relative-time effects, but do not reliably assess accuracy in spike prediction or stimulus-response modeling. We illustrate the CR model with applications to both real and simulated neural data. 1

2 0.20742975 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

3 0.13869166 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint secondorder statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing. 1

4 0.12199868 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

5 0.11931596 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1

6 0.10125482 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

7 0.085903108 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

8 0.082360595 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

9 0.07643719 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

10 0.062035557 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

11 0.05665718 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

12 0.055864118 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

13 0.055262238 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

14 0.054686222 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

15 0.048443694 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters

16 0.047262445 13 nips-2009-A Neural Implementation of the Kalman Filter

17 0.046513181 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

18 0.045842644 43 nips-2009-Bayesian estimation of orientation preference maps

19 0.040030744 163 nips-2009-Neurometric function analysis of population codes

20 0.039600153 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.12), (1, -0.11), (2, 0.187), (3, 0.091), (4, 0.056), (5, -0.067), (6, -0.031), (7, 0.017), (8, 0.034), (9, -0.0), (10, -0.018), (11, 0.011), (12, 0.032), (13, -0.033), (14, -0.028), (15, -0.04), (16, 0.004), (17, -0.044), (18, 0.025), (19, 0.033), (20, 0.048), (21, 0.026), (22, -0.152), (23, 0.02), (24, -0.099), (25, 0.013), (26, 0.035), (27, 0.048), (28, 0.086), (29, -0.049), (30, -0.005), (31, 0.097), (32, -0.004), (33, -0.086), (34, 0.023), (35, 0.031), (36, 0.075), (37, 0.098), (38, -0.147), (39, 0.1), (40, -0.058), (41, -0.004), (42, 0.032), (43, 0.081), (44, 0.091), (45, 0.026), (46, -0.158), (47, 0.085), (48, 0.018), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94442362 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

Author: Jonathan W. Pillow

Abstract: Recent work on the statistical modeling of neural responses has focused on modulated renewal processes in which the spike rate is a function of the stimulus and recent spiking history. Typically, these models incorporate spike-history dependencies via either: (A) a conditionally-Poisson process with rate dependent on a linear projection of the spike train history (e.g., generalized linear model); or (B) a modulated non-Poisson renewal process (e.g., inhomogeneous gamma process). Here we show that the two approaches can be combined, resulting in a conditional renewal (CR) model for neural spike trains. This model captures both real-time and rescaled-time history effects, and can be fit by maximum likelihood using a simple application of the time-rescaling theorem [1]. We show that for any modulated renewal process model, the log-likelihood is concave in the linear filter parameters only under certain restrictive conditions on the renewal density (ruling out many popular choices, e.g. gamma with shape κ = 1), suggesting that real-time history effects are easier to estimate than non-Poisson renewal properties. Moreover, we show that goodness-of-fit tests based on the time-rescaling theorem [1] quantify relative-time effects, but do not reliably assess accuracy in spike prediction or stimulus-response modeling. We illustrate the CR model with applications to both real and simulated neural data. 1

2 0.69829428 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1

3 0.65944171 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

4 0.6316306 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

5 0.61107349 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

Author: Zhi Yang, Qi Zhao, Edward Keefer, Wentai Liu

Abstract: Studying signal and noise properties of recorded neural data is critical in developing more efficient algorithms to recover the encoded information. Important issues exist in this research including the variant spectrum spans of neural spikes that make it difficult to choose a globally optimal bandpass filter. Also, multiple sources produce aggregated noise that deviates from the conventional white Gaussian noise. In this work, the spectrum variability of spikes is addressed, based on which the concept of adaptive bandpass filter that fits the spectrum of individual spikes is proposed. Multiple noise sources have been studied through analytical models as well as empirical measurements. The dominant noise source is identified as neuron noise followed by interface noise of the electrode. This suggests that major efforts to reduce noise from electronics are not well spent. The measured noise from in vivo experiments shows a family of 1/f x spectrum that can be reduced using noise shaping techniques. In summary, the methods of adaptive bandpass filtering and noise shaping together result in several dB signal-to-noise ratio (SNR) enhancement.

6 0.59756845 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

7 0.58146709 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

8 0.4745076 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

9 0.42504802 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

10 0.34745687 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

11 0.33910978 163 nips-2009-Neurometric function analysis of population codes

12 0.33842137 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

13 0.33176982 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

14 0.30047092 25 nips-2009-Adaptive Design Optimization in Experiments with People

15 0.29881418 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

16 0.28284892 114 nips-2009-Indian Buffet Processes with Power-law Behavior

17 0.27822816 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

18 0.25823179 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

19 0.25710493 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

20 0.24791704 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.021), (21, 0.024), (22, 0.239), (24, 0.043), (25, 0.046), (31, 0.012), (35, 0.043), (36, 0.09), (39, 0.04), (58, 0.093), (71, 0.053), (81, 0.036), (86, 0.05), (91, 0.077), (92, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81095344 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

Author: Zahi Karam, Douglas Sturim, William M. Campbell

Abstract: Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications—speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison—support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.

same-paper 2 0.79248202 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

Author: Jonathan W. Pillow

Abstract: Recent work on the statistical modeling of neural responses has focused on modulated renewal processes in which the spike rate is a function of the stimulus and recent spiking history. Typically, these models incorporate spike-history dependencies via either: (A) a conditionally-Poisson process with rate dependent on a linear projection of the spike train history (e.g., generalized linear model); or (B) a modulated non-Poisson renewal process (e.g., inhomogeneous gamma process). Here we show that the two approaches can be combined, resulting in a conditional renewal (CR) model for neural spike trains. This model captures both real-time and rescaled-time history effects, and can be fit by maximum likelihood using a simple application of the time-rescaling theorem [1]. We show that for any modulated renewal process model, the log-likelihood is concave in the linear filter parameters only under certain restrictive conditions on the renewal density (ruling out many popular choices, e.g. gamma with shape κ = 1), suggesting that real-time history effects are easier to estimate than non-Poisson renewal properties. Moreover, we show that goodness-of-fit tests based on the time-rescaling theorem [1] quantify relative-time effects, but do not reliably assess accuracy in spike prediction or stimulus-response modeling. We illustrate the CR model with applications to both real and simulated neural data. 1

3 0.75809437 21 nips-2009-Abstraction and Relational learning

Author: Charles Kemp, Alan Jern

Abstract: Most models of categorization learn categories defined by characteristic features but some categories are described more naturally in terms of relations. We present a generative model that helps to explain how relational categories are learned and used. Our model learns abstract schemata that specify the relational similarities shared by instances of a category, and our emphasis on abstraction departs from previous theoretical proposals that focus instead on comparison of concrete instances. Our first experiment suggests that abstraction can help to explain some of the findings that have previously been used to support comparison-based approaches. Our second experiment focuses on one-shot schema learning, a problem that raises challenges for comparison-based approaches but is handled naturally by our abstraction-based account. Categories such as family, sonnet, above, betray, and imitate differ in many respects but all of them depend critically on relational information. Members of a family are typically related by blood or marriage, and the lines that make up a sonnet must rhyme with each other according to a certain pattern. A pair of objects will demonstrate “aboveness” only if a certain spatial relationship is present, and an event will qualify as an instance of betrayal or imitation only if its participants relate to each other in certain ways. All of the cases just described are examples of relational categories. This paper develops a computational approach that helps to explain how simple relational categories are acquired. Our approach highlights the role of abstraction in relational learning. Given several instances of a relational category, it is often possible to infer an abstract representation that captures what the instances have in common. We refer to these abstract representations as schemata, although others may prefer to call them rules or theories. For example, a sonnet schema might specify the number of lines that a sonnet should include and the rhyming pattern that the lines should follow. Once a schema has been acquired it can support several kinds of inferences. A schema can be used to make predictions about hidden aspects of the examples already observed—if the final word in a sonnet is illegible, the rhyming pattern can help to predict the identity of this word. A schema can be used to decide whether new examples (e.g. new poems) qualify as members of the category. Finally, a schema can be used to generate novel examples of a category (e.g. novel sonnets). Most researchers would agree that abstraction plays some role in relational learning, but Gentner [1] and other psychologists have emphasized the role of comparison instead [2, 3]. Given one example of a sonnet and the task of deciding whether a second poem is also a sonnet, a comparison-based approach might attempt to establish an alignment or mapping between the two. Approaches that rely on comparison or mapping are especially prominent in the literature on analogical reasoning [4, 5], and many of these approaches can be viewed as accounts of relational categorization [6]. For example, the problem of deciding whether two systems are analogous can be formalized as the problem of deciding whether these systems are instances of the same relational category. 
Despite some notable exceptions [6, 7], most accounts of analogy focus on comparison rather than abstraction, and suggest that “analogy passes from one instance of a generalization to another without pausing for explicit induction of the generalization” (p 95) [8]. 1 Schema s 0∀Q ∀x ∀y Q(x) < Q(y) ↔ D1 (x) < D1 (y) Group g Observation o Figure 1: A hierarchical generative model for learning and using relational categories. The schema s at the top level is a logical sentence that specifies which groups are valid instances of the category. The group g at the second level is randomly sampled from the set of valid instances, and the observation o is a partially observed version of group g. Researchers that focus on comparison sometimes discuss abstraction, but typically suggest that abstractions emerge as a consequence of comparing two or more concrete instances of a category [3, 5, 9, 10]. This view, however, will not account for one-shot inferences, or inferences based on a single instance of a relational category. Consider a learner who is shown one instance of a sonnet then asked to create a second instance. Since only one instance is provided, it is hard to see how comparisons between instances could account for success on the task. A single instance, however, will sometimes provide enough information for a schema to be learned, and this schema should allow subsequent instances to be generated [11]. Here we develop a formal framework for exploring relational learning in general and one-shot schema learning in particular. Our framework relies on the hierarchical Bayesian approach, which provides a natural way to combine abstraction and probabilistic inference [12]. The hierarchical Bayesian approach supports representations at multiple levels of abstraction, and helps to explains how abstract representations (e.g. a sonnet schema) can be acquired given observations of concrete instances (e.g. individual sonnets). The schemata we consider are represented as sentences in a logical language, and our approach therefore builds on previous probabilistic methods for learning and using logical theories [13, 14]. Following previous authors, we propose that logical representations can help to capture the content of human knowledge, and that Bayesian inference helps to explain how these representations are acquired and how they support inductive inference. The following sections introduce our framework then evaluate it using two behavioral experiments. Our first experiment uses a standard classification task where participants are shown one example of a category then asked to decide which of two alternatives is more likely to belong to the same category. Tasks of this kind have previously been used to argue for the importance of comparison, but we suggest that these tasks can be handled by accounts that focus on abstraction. Our second experiment uses a less standard generation task [15, 16] where participants are shown a single example of a category then asked to generate additional examples. As predicted by our abstraction-based account, we find that people are able to learn relational categories on the basis of a single example. 1 A generative approach to relational learning Our examples so far have used real-world relational categories such as family and sonnet but we now turn to a very simple domain where relational categorization can be studied. 
Each element in the domain is a group of components that vary along a number of dimensions—in Figure 1, the components are figures that vary along the dimensions of size, color, and circle position. The groups can be organized into categories—one such category includes groups where every component is black. Although our domain is rather basic it allows some simple relational regularities to be explored. We can consider categories, for example, where all components in a group must be the same along some dimension, and categories where all components must be different along some dimension. We can also consider categories defined by relationships between dimensions—for example, the category that includes all groups where the size and color dimensions are correlated. Each category is associated with a schema, or an abstract representation that specifies which groups are valid instances of the category. Here we consider schemata that correspond to rules formulated 2 1 2 3 4 5 6 7  ff ˘ ¯ ∀x D (x) =, =, <, > vk ∃xff  i  ff ˘ ¯ ∀x ∀y x = y → D (x) =, =, <, > Di (y) ∃x ∃y x = y ∧ 8 i9 ˘ ¯ <∧= ˘ ¯ ∀x Di (x) =, = vk ∨ Dj (x) =, = vl : ; ↔ 8 9 0 1 <∧= ˘ ¯ ˘ ¯ ∀x∀y x = y → @Di (x) =, =, <, > Di (y) ∨ Dj (x) =, =, <, > Dj (y)A : ; ↔  ff ff ff ˘ ¯ ∀Q ∀x ∀y x = y → Q(x) =, =, <, > Q(y) ∃Q ∃x ∃y x = y ∧ 8 9 0 1  ff <∧= ˘ ¯ ˘ ¯ ∀Q Q = Di → ∀x∀y x = y → @Q(x) =, =, <, > Q(y) ∨ Di (x) =, =, <, > Di (y)A ∃Q Q = Di ∧ : ; ↔ 8 9 0 1  ff ff <∧= ˘ ¯ ˘ ¯ ∀Q ∀R Q = R → ∀x∀y x = y → @Q(x) =, =, <, > Q(y) ∨ R(x) =, =, <, > R(y)A ∃Q ∃R Q = R ∧ : ; ↔ Table 1: Templates used to construct a hypothesis space of logical schemata. An instance of a given template can be created by choosing an element from each set enclosed in braces (some sets are laid out horizontally to save space), replacing each occurrence of Di or Dj with a dimension (e.g. D1 ) and replacing each occurrence of vk or vl with a value (e.g. 1). in a logical language. The language includes three binary connectives—and (∧), or (∨), and if and only if (↔). Four binary relations (=, =, <, and >) are available for comparing values along dimensions. Universal quantification (∀x) and existential quantification (∃x) are both permitted, and the language includes quantification over objects (∀x) and dimensions (∀Q). For example, the schema in Figure 1 states that all dimensions are aligned. More precisely, if D1 is the dimension of size, the schema states that for all dimensions Q, a component x is smaller than a component y along dimension Q if and only if x is smaller in size than y. It follows that all three dimensions must increase or decrease together. To explain how rules in this logical language are learned we work with the hierarchical generative model in Figure 1. The representation at the top level is a schema s, and we assume that one or more groups g are generated from a distribution P (g|s). Following a standard approach to category learning [17, 18], we assume that g is uniformly sampled from all groups consistent with s: p(g|s) ∝ 1 g is consistent with s 0 otherwise (1) For all applications in this paper, we assume that the number of components in a group is known and fixed in advance. The bottom level of the hierarchy specifies observations o that are generated from a distribution P (o|g). In most cases we assume that g can be directly observed, and that P (o|g) = 1 if o = g and 0 otherwise. We also consider the setting shown in Figure 1 where o is generated by concealing a component of g chosen uniformly at random. 
Note that the observation o in Figure 1 includes only four of the components in group g, and is roughly analogous to our earlier example of a sonnet with an illegible final word. To convert Figure 1 into a fully-specified probabilistic model it remains to define a prior distribution P (s) over schemata. An appealing approach is to consider all of the infinitely many sentences in the logical language already mentioned, and to define a prior favoring schemata which correspond to simple (i.e. short) sentences. We approximate this approach by considering a large but finite space of sentences that includes all instances of the templates in Table 1 and all conjunctions of these instances. When instantiating one of these templates, each occurrence of Di or Dj should be replaced by one of the dimensions in the domain. For example, the schema in Figure 1 is a simplified instance of template 6 where Di is replaced by D1 . Similarly, each instance of vk or vl should be replaced by a value along one of the dimensions. Our first experiment considers a problem where there are are three dimensions and three possible values along each dimension (i.e. vk = 1, 2, or 3). As a result there are 1568 distinct instances of the templates in Table 1 and roughly one million 3 conjunctions of these instances. Our second experiment uses three dimensions with five values along each dimension, which leads to 2768 template instances and roughly three million conjunctions of these instances. The templates in Table 1 capture most of the simple regularities that can be formulated in our logical language. Template 1 generates all rules that include quantification over a single object variable and no binary connectives. Template 3 is similar but includes a single binary connective. Templates 2 and 4 are similar to 1 and 3 respectively, but include two object variables (x and y) rather than one. Templates 5, 6 and 7 add quantification over dimensions to Templates 2 and 4. Although the templates in Table 1 capture a large class of regularities, several kinds of templates are not included. Since we do not assume that the dimensions are commensurable, values along different dimensions cannot be directly compared (∃x D1 (x) = D2 (x) is not permitted. For the same reason, comparisons to a dimension value must involve a concrete dimension (∀x D1 (x) = 1 is permitted) rather than a dimension variable (∀Q ∀x Q(x) = 1 is not permitted). Finally, we exclude all schemata where quantification over objects precedes quantification over dimensions, and as a result there are some simple schemata that our implementation cannot learn (e.g. ∃x∀y∃Q Q(x) = Q(y)). The extension of each schema is a set of groups, and schemata with the same extension can be assigned to the same equivalence class. For example, ∀x D1 (x) = v1 (an instance of template 1) and ∀x D1 (x) = v1 ∧ D1 (x) = v1 (an instance of template 3) end up in the same equivalence class. Each equivalence class can be represented by the shortest sentence that it contains, and we define our prior P (s) over a set that includes a single representative for each equivalence class. The prior probability P (s) of each sentence is inversely proportional to its length: P (s) ∝ λ|s| , where |s| is the length of schema s and λ is a constant between 0 and 1. For all applications in this paper we set λ = 0.8. 
The generative model in Figure 1 can be used for several purposes, including schema learning (inferring a schema s given one or more instances generated from the schema), classification (deciding whether group gnew belongs to a category given one or more instances of the category) and generation (generating a group gnew that belongs to the same category as one or more instances). Our first experiment explores all three of these problems.

2 Experiment 1: Relational classification

Our first experiment is organized around a triad task where participants are shown one example of a category and then asked to decide which of two choice examples is more likely to belong to the category. Triad tasks are regularly used by studies of relational categorization, and have been used to argue for the importance of comparison [1]. A comparison-based approach to this task, for instance, might compare the example object to each of the choice objects in order to decide which is the better match. Our first experiment is intended in part to explore whether a schema-learning approach can also account for inferences about triad tasks.

Materials and Method. 18 adults participated for course credit and interacted with a custom-built computer interface. The stimuli were groups of figures that varied along three dimensions (color, size, and ball position, as in Figure 1). Each shape was displayed on a single card, and all groups in Experiment 1 included exactly three cards. The cards in Figure 1 show five different values along each dimension, but Experiment 1 used only three values along each dimension. The experiment included inferences about 10 triads. Participants were told that aliens from a certain planet “enjoy organizing cards into groups,” and that “any group of cards will probably be liked by some aliens and disliked by others.” The ten triad tasks were framed as questions about the preferences of 10 aliens. Participants were shown a group that Mr X likes (different names were used for the ten triads), then shown two choice groups and told that “Mr X likes one of these groups but not the other.” Participants were asked to select one of the choice groups, then asked to generate another 3-card group that Mr X would probably like. Cards could be added to the screen using an “Add Card” button, and there were three pairs of buttons that allowed each card to be increased or decreased along the three dimensions. Finally, participants were asked to explain in writing “what kind of groups Mr X likes.”

The ten triads used are shown in Figure 2. Each group is represented as a 3 by 3 matrix where rows represent cards and columns show values along the three dimensions.
Figure 2: Human responses and model predictions for the ten triads in Experiment 1. The plot at the left of each panel shows model predictions (white bars) and human preferences (black bars) for the two choice groups in each triad. The plots at the right of each panel summarize the groups created during the generation phase. The 23 elements along the x-axis correspond to the regularities listed in Table 2. (Panel schemata: (a) D1 value always 3; (b) D2 uniform; (c) D2 and D3 aligned; (d) D1 and D3 anti-aligned; (e) two dimensions aligned; (f) two dimensions anti-aligned; (g) all dimensions uniform; (h) some dimension uniform; (i) all dimensions have no repeats; (j) some dimension has no repeats.)

Table 2: Regularities used to code responses to the generation tasks in Experiments 1 and 2.
 1. All dimensions aligned
 2. Two dimensions aligned
 3. D1 and D2 aligned
 4. D1 and D3 aligned
 5. D2 and D3 aligned
 6. All dimensions aligned or anti-aligned
 7. Two dimensions anti-aligned
 8. D1 and D2 anti-aligned
 9. D1 and D3 anti-aligned
10. D2 and D3 anti-aligned
11. All dimensions have no repeats
12. Two dimensions have no repeats
13. One dimension has no repeats
14. D1 has no repeats
15. D2 has no repeats
16. D3 has no repeats
17. All dimensions uniform
18. Two dimensions uniform
19. One dimension uniform
20. D1 uniform
21. D2 uniform
22. D3 uniform
23. D1 value is always 3

Triad 1, for example, has an example group including three cards that each take value 3 along D1. The first choice group is consistent with this regularity but the second choice group is not. The cards in each group were arrayed vertically on screen, and were initially sorted as shown in Figure 2 (i.e. first by D3, then by D2 and then by D1). The cards could be dragged around on screen, and participants were invited to move them around in order to help them understand each group. The mapping between the three dimensions in each matrix and the three dimensions in the experiment (color, position, and size) was randomized across participants, and the order in which triads were presented was also randomized.

Model predictions and results. Let ge be the example group presented in the triad task and g1 and g2 be the two choice groups. We use our model to compute the relative probability of two hypotheses: h1, which states that ge and g1 are generated from the same schema and that g2 is sampled randomly from all possible groups, and h2, which states that ge and g2 are generated from the same schema. We set P(h1) = P(h2) = 0.5, and compute posterior probabilities P(h1|ge, g1, g2) and P(h2|ge, g1, g2) by integrating over all schemata in the hypothesis space already described.
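The sketch below shows how this hypothesis comparison can be carried out by brute-force marginalization over the finite hypothesis space, reusing the representations from the earlier sketches. It is an illustrative reconstruction under those assumptions, not the authors' code, and it assumes at least one schema is consistent with each observed pair of groups.

```python
def likelihood_same_schema(observed, hypothesis_space, all_groups):
    """P(observed groups | shared schema) = sum_s P(s) * prod_g P(g|s) over the finite space."""
    total = 0.0
    for p_s, schema in hypothesis_space:
        consistent = [g for g in all_groups if schema(g)]
        if consistent and all(g in consistent for g in observed):
            total += p_s * (1.0 / len(consistent)) ** len(observed)
    return total

def triad_posterior(ge, g1, g2, hypothesis_space, all_groups):
    """Posterior over h1 (ge and g1 share a schema) vs. h2 (ge and g2 share a schema).

    Under each hypothesis, the odd group out is sampled uniformly from all possible groups,
    and the prior over hypotheses is P(h1) = P(h2) = 0.5."""
    p_random = 1.0 / len(all_groups)
    score1 = 0.5 * likelihood_same_schema([ge, g1], hypothesis_space, all_groups) * p_random
    score2 = 0.5 * likelihood_same_schema([ge, g2], hypothesis_space, all_groups) * p_random
    return score1 / (score1 + score2), score2 / (score1 + score2)
```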
Our model assumes that two groups are considered similar to the extent that they appear to have been generated by the same underlying schema, and is consistent with the generative approach to similarity described by Kemp et al. [19]. Model predictions for the ten triads are shown in Figure 2. In each case, the choice probabilities plotted (white bars) are the posterior probabilities of hypotheses h1 and h2. In nine out of ten cases the best choice according to the model is the most common human response. Responses to triads 2c and 2d support the idea that people are sensitive to relationships between dimensions (i.e. alignment and anti-alignment). Triads 2e and 2f are similar to triads studied by Kotovsky and Gentner [1], and we replicate their finding that people are sensitive to relationships between dimensions even when the dimensions involved vary from group to group. The one case where human responses diverge from model predictions is shown in Figure 2h. Note that the schema for this triad involves existential quantification over dimensions (some dimension is uniform), and according to our prior P(s) this kind of quantification is no more complex than other kinds of quantification. Future applications of our approach can explore the idea that existential quantification over dimensions (∃Q) is psychologically more complex than universal quantification over dimensions (∀Q) or existential quantification over cards (∃x), and can consider logical languages that incorporate this inductive bias.

To model the generation phase of the experiment we computed the posterior distribution

    P(gnew | ge, g1, g2) = Σ_{s,h} P(gnew | s) P(s | h, ge, g1, g2) P(h | ge, g1, g2),

where P(h | ge, g1, g2) is the distribution used to model selections in the triad task. Since the space of possible groups is large, we visualize this distribution using a profile that shows the posterior probability assigned to groups consistent with the 23 regularities shown in Table 2. The white bar plots in Figure 2 show profiles predicted by the model, and the black plots immediately above show profiles computed over the groups generated by our 18 participants. In many of the 10 cases the model accurately predicts regularities in the groups generated by people. In case 2c, for example, the model correctly predicts that generated groups will tend to have no repeats along dimensions D2 and D3 (regularities 15 and 16) and that these two dimensions will be aligned (regularities 2 and 5). There are, however, some departures from the model's predictions, and a notable example occurs in case 2d. Here the model detects the regularity that dimensions D1 and D3 are anti-aligned (regularity 9). Some groups generated by participants are consistent with this regularity, but people also regularly generate groups where two dimensions are aligned rather than anti-aligned (regularity 2). This result may indicate that some participants are sensitive to relationships between dimensions but do not consider the difference between a positive relationship (alignment) and an inverse relationship (anti-alignment) especially important.
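The generation posterior and its regularity profile can be computed with the same brute-force marginalization. The sketch below again assumes the helper functions and hypothesis-space representation from the earlier sketches; `regularities` stands for the 23 predicates in Table 2, and all names are hypothetical rather than taken from the paper.

```python
def schema_posterior(observed, hypothesis_space, all_groups):
    """P(s | observed groups, shared-schema hypothesis) ∝ P(s) * prod_g P(g|s)."""
    scored = []
    for p_s, schema in hypothesis_space:
        consistent = [g for g in all_groups if schema(g)]
        if consistent and all(g in consistent for g in observed):
            score = p_s * (1.0 / len(consistent)) ** len(observed)
        else:
            score = 0.0
        scored.append((score, consistent))
    total = sum(score for score, _ in scored) or 1.0   # guard: assumes some schema fits
    return [(score / total, consistent) for score, consistent in scored]

def generation_posterior(ge, g1, g2, hypothesis_space, all_groups):
    """P(gnew | ge, g1, g2), marginalizing over hypotheses h and schemata s."""
    p_h1, p_h2 = triad_posterior(ge, g1, g2, hypothesis_space, all_groups)
    posterior = {g: 0.0 for g in all_groups}
    for p_h, matched in ((p_h1, g1), (p_h2, g2)):
        for p_s, consistent in schema_posterior([ge, matched], hypothesis_space, all_groups):
            for g in consistent:                       # P(gnew|s) is uniform over consistent groups
                posterior[g] += p_h * p_s / len(consistent)
    return posterior

def regularity_profile(posterior, regularities):
    """Posterior mass on groups satisfying each regularity predicate (cf. Table 2)."""
    return [sum(p for g, p in posterior.items() if regularity(g)) for regularity in regularities]
```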
Figure 3: Human responses and model predictions for the six cases in Experiment 2. In (a) and (b), the 4 cards used for the completion and generation phases are shown on either side of the dashed line (completion cards on the left); in the remaining cases, the same 4 cards were used for both phases. The plots at the right of each panel show model predictions (white bars) and human responses (black bars) for the generation task; the 23 elements along each x-axis correspond to the regularities listed in Table 2. The remaining plots show responses to the completion task: there are 125 possible responses, and the four responses shown always include the top two human responses and the top two model predictions. (Panel schemata: (a) all dimensions aligned; (b) D2 and D3 aligned; (c) D1 has no repeats, D2 and D3 uniform; (d) D2 uniform; (e) all dimensions uniform; (f) all dimensions have no repeats.)

Kotovsky and Gentner [1] suggest that comparison can explain how people respond to triad tasks, although they do not provide a computational model that can be compared with our approach. It is less clear how comparison might account for our generation data, and our next experiment considers a one-shot generation task that raises even greater challenges for a comparison-based approach.

3 Experiment 2: One-shot schema learning

As described already, comparison involves constructing mappings between pairs of category instances. In some settings, however, learners make confident inferences given a single instance of a category [15, 20], and it is difficult to see how comparison could play a major role when only one instance is available. Models that rely on abstraction, however, can naturally account for one-shot relational learning, and we designed a second experiment to evaluate this aspect of our approach. Several previous studies have explored one-shot relational learning. Holyoak and Thagard [21] developed a study of analogical reasoning using stories as stimuli and found little evidence of one-shot schema learning. Ahn et al. [11] demonstrated, however, that one-shot learning can be achieved with complex materials such as stories, and modeled this result using explanation-based learning. Here we use much simpler stimuli and explore a probabilistic approach to one-shot learning.

Materials and Method. 18 adults participated for course credit. The same individuals completed Experiments 1 and 2, and Experiment 2 was always run before Experiment 1. The same computer interface was used in both experiments, and the only important difference was that the figures in Experiment 2 could now take five values along each dimension rather than three. The experiment included two phases. During the generation phase, participants saw a 4-card group that Mr X liked and were asked to generate two 5-card groups that Mr X would probably like. During the completion phase, participants were shown four members of a 5-card group and were asked to generate the missing card. The stimuli used in each phase are shown in Figure 3. In the first two cases, slightly different stimuli were used in the generation and completion phases, and in all remaining cases the same set of four cards was used in both phases. All participants responded to the six generation questions before answering the six completion questions.

Model predictions and results.
The generation phase is modeled as in Experiment 1, but now the posterior distribution P(gnew | ge) is computed after observing a single instance of a category. The human responses in Figure 3 (black bars) are consistent with the model in all cases, and confirm that a single example can provide sufficient evidence for learners to acquire a relational category. For example, the most common response in case 3a was the 5-card group shown in Figure 1—a group with all three dimensions aligned.

To model the completion phase, let oe represent a partial observation of group ge. Our model infers which card is missing from ge by computing the posterior distribution

    P(ge | oe) ∝ P(oe | ge) Σ_s P(ge | s) P(s),

where P(oe | ge) captures the idea that oe is generated by randomly concealing one component of ge. The white bars in Figure 3 show model predictions, and in five out of six cases the best response according to the model is the same as the most common human response. In the remaining case (Figure 3d) the model generates a diffuse distribution over all cards with value 3 on dimension 2, and all human responses satisfy this regularity.

4 Conclusion

We presented a generative model that helps to explain how relational categories are learned and used. Our approach captures relational regularities using a logical language, and helps to explain how schemata formulated in this language can be learned from observed data. Our approach differs in several respects from previous accounts of relational categorization [1, 5, 10, 22]. First, we focus on abstraction rather than comparison. Second, we consider tasks where participants must generate examples of categories [16] rather than simply classify existing examples. Finally, we provide a formal account that helps to explain how relational categories can be learned from a single instance.

Our approach can be developed and extended in several ways. For simplicity, we implemented our model by working with a finite space of several million schemata, but future work can consider hypothesis spaces that assign non-zero probability to all regularities that can be formulated in the language we described. The specific logical language used here is only a starting point, and future work can aim to develop languages that provide a more faithful account of human inductive biases. Finally, we worked with a domain that provides one of the simplest ways to address core questions such as one-shot learning. Future applications of our general approach can consider domains that include more than three dimensions and a richer space of relational regularities.

Relational learning and analogical reasoning are tightly linked, and hierarchical generative models provide a promising approach to both problems. We focused here on relational categorization, but future studies can explore whether probabilistic accounts of schema learning can help to explain the inductive inferences typically considered by studies of analogical reasoning. Although there are many models of analogical reasoning, there are few that pursue a principled probabilistic approach, and the hierarchical Bayesian approach may help to fill this gap in the literature.

Acknowledgments. We thank Maureen Satyshur for running the experiments. This work was supported in part by NSF grant CDI-0835797.

References

[1] L. Kotovsky and D. Gentner. Comparison and categorization in the development of relational similarity. Child Development, 67:2797–2822, 1996.
[2] D. Gentner and A. B. Markman. Structure mapping in analogy and similarity. American Psychologist, 52:45–56, 1997.
[3] D. Gentner and J. Medina. Similarity and the development of rules. Cognition, 65:263–297, 1998.
[4] B. Falkenhainer, K. D. Forbus, and D. Gentner. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41:1–63, 1989.
[5] J. E. Hummel and K. J. Holyoak. A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110:220–264, 2003.
[6] M. Mitchell. Analogy-making as perception: a computer model. MIT Press, Cambridge, MA, 1993.
[7] D. R. Hofstadter and the Fluid Analogies Research Group. Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. 1995.
[8] W. V. O. Quine and J. Ullian. The Web of Belief. Random House, New York, 1978.
[9] J. Skorstad, D. Gentner, and D. Medin. Abstraction processes during concept learning: a structural view. In Proceedings of the 10th Annual Conference of the Cognitive Science Society, pages 419–425. 2009.
[10] D. Gentner and J. Loewenstein. Relational language and relational thought. In E. Amsel and J. P. Byrnes, editors, Language, literacy and cognitive development: the development and consequences of symbolic communication, pages 87–120. 2002.
[11] W. Ahn, W. F. Brewer, and R. J. Mooney. Schema acquisition from a single example. Journal of Experimental Psychology: Learning, Memory and Cognition, 18(2):391–412, 1992.
[12] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall, New York, 2nd edition, 2003.
[13] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Learning and using relational theories. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 753–760. MIT Press, Cambridge, MA, 2008.
[14] S. Kok and P. Domingos. Learning the structure of Markov logic networks. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[15] J. Feldman. The structure of perceptual categories. Journal of Mathematical Psychology, 41:145–170, 1997.
[16] A. Jern and C. Kemp. Category generation. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, pages 130–135. Cognitive Science Society, Austin, TX, 2009.
[17] D. Conklin and I. H. Witten. Complexity-based induction. Machine Learning, 16(3):203–225, 1994.
[18] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001.
[19] C. Kemp, A. Bernstein, and J. B. Tenenbaum. A generative theory of similarity. In B. G. Bara, L. Barsalou, and M. Bucciarelli, editors, Proceedings of the 27th Annual Conference of the Cognitive Science Society, pages 1132–1137. Lawrence Erlbaum Associates, 2005.
[20] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Theory acquisition and the language of thought. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 1606–1611. Cognitive Science Society, Austin, TX, 2008.
[21] K. J. Holyoak and P. Thagard. Analogical mapping by constraint satisfaction. Cognitive Science, 13(3):295–355, 1989.
[22] L. A. A. Doumas, J. E. Hummel, and C. M. Sandhofer. A theory of the discovery and predication of relational concepts. Psychological Review, 115(1):1–43, 2008.
[23] M. L. Gick and K. J. Holyoak. Schema induction and analogical transfer. Cognitive Psychology, 15:1–38, 1983.
