nips nips2003 nips2003-160 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. [sent-5, score-0.548]
2 We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation). [sent-6, score-0.121]
3 We report the results obtained using different kernel algorithms. [sent-10, score-0.259]
4 1 Introduction Recently, there has been a great deal of interest in using the activity from a population of neurons to predict or reconstruct the sensory input [1, 2], motor output [3, 4] or the trajectory of movement of an animal in space [5]. [sent-11, score-0.44]
5 This analysis is of importance since it may lead to a better understanding of the coding schemes utilised by networks of neurons in the brain. [sent-12, score-0.187]
6 The task is to reconstruct the angle of a visual stimulus, which can take eight discrete values, from the activity of simultaneously recorded neurons. [sent-19, score-0.315]
7 A clever encoding of the input data might reflect, for example, known invariances of the problem, or assumptions about the similarity structure of the data motivated by scientific insights. [sent-22, score-0.191]
8 An algorithmic approach which currently enjoys great popularity in the machine learning community, called kernel machines, makes these assumptions explicit by the choice of a kernel function. [sent-23, score-0.563]
9 The kernel can be thought of as a mathematical formalisation of a similarity measure that ideally captures much of this prior knowledge about the data domain. [sent-24, score-0.405]
10 Note that unlike many traditional machine learning methods, kernel machines can readily handle data that is not given as vectors of numbers but as more complex data types, such as strings, graphs, or spike trains. [sent-25, score-0.674]
11 Recently, a kernel for spike trains was proposed whose design is based on a number of biologically motivated assumptions about the structure of spike data [6]. [sent-26, score-1.194]
12 Just like the inputs, the stimuli perceived or the actions carried out by an animal are in general not given to us in vectorial form. [sent-28, score-0.161]
13 Moreover, biologically meaningful similarity measures and loss functions may be very different from those used traditionally in pattern recognition. [sent-29, score-0.243]
14 In the problem at hand, the outputs are orientations of a stimulus and thus it would be desirable to use a method which takes their circular structure into account. [sent-31, score-0.325]
15 In this paper, we will utilise the recently proposed kernel dependency estimation technique [7] that can cope with general sets of outputs and a large class of loss functions in a principled manner. [sent-32, score-0.513]
16 The dimensionality of the spike data can be very high, in particular if the data stem from multicellular recording and if the temporal resolution is high. [sent-35, score-0.44]
17 2 Learning algorithms, kernels and output coding In supervised machine learning, we basically attempt to discover dependencies between variables based on a finite set of observations (called the training set) {(xi , yi ) | i = 1, . . . , m}. [sent-40, score-0.157]
18 The xi ∈ X are referred to as inputs and are taken from a domain X; likewise, the y ∈ Y are called outputs and the objective is to approximate the mapping X → Y between the domains from the samples. [sent-44, score-0.182]
19 The input points get mapped to a possibly high-dimensional dot product space (called the feature space) using Φ, and in that space the learning problem is tackled using simple linear geometric methods (see [8] for details). [sent-49, score-0.252]
20 All geometric methods that are based on distances and angles can be performed in terms of the dot product. [sent-50, score-0.208]
21 The ”kernel trick” is to calculate the inner product of feature space mapped points using a kernel function k(xi , xj ) = ⟨Φ(xi ), Φ(xj )⟩. [sent-51, score-0.455]
22 In order for k to be interpretable as a dot product in some feature space it has to be a positive definite function. [sent-53, score-0.148]
23 1 Support Vector Classification and Gaussian Process Regression A simple geometric classification method which is based on dot products and which is the basis of support vector machines is linear classification via separating hyperplanes. [sent-55, score-0.215]
24 One can show that the so-called optimal separating hyperplane (the one that leads to the largest margin of separation between the classes) can be written in feature space as ⟨w, Φ(x)⟩ + b = 0, where the hyperplane normal vector can be expanded in terms of the training points as w = Σ_{i=1}^{m} λi Φ(xi ). [sent-56, score-0.224]
25 The central idea of support vector machines is thus that we can perform linear classification in a high-dimensional feature space using a kernel which can be seen as a (nonlinear) similarity measure for the input data. [sent-60, score-0.557]
26 A popular nonlinear kernel function is the Gaussian kernel k(xi , xj ) = exp(−‖xi − xj ‖² / (2σ²)). [sent-61, score-0.659]
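As a concrete illustration of the kernel trick and the Gaussian kernel above, the following minimal numpy sketch (our own illustration; the function name and the toy data are not from the paper) computes the Gram matrix of pairwise kernel values, which is all a kernel machine ever sees of the data:

    import numpy as np

    def gaussian_gram(X, sigma=1.0):
        """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

        X: array of shape (m, D), one input vector per row.
        """
        sq_norms = np.sum(X**2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

    # Example: five toy population rate vectors of 20 neurons each.
    X = np.random.rand(5, 20)
    K = gaussian_gram(X, sigma=2.0)
    # K is symmetric and positive semi-definite, i.e. a valid kernel matrix.
    print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > -1e-10)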
27 This kernel has been successfully used to predict stimulus parameters using spikes from simultaneously recorded data [2]. [sent-62, score-0.569]
28 In Gaussian process regression, the properties (e.g. smoothness) of the functions are given by the covariance function or covariance kernel; it controls how the outputs covary as a function of the inputs. [sent-66, score-0.193]
29 In the experiments below (assuming x ∈ R^D ) we use a Gaussian kernel of the form Cov(yi , yj ) = k(xi , xj ) = v² exp(−(1/2) Σ_{d=1}^{D} (x_i^d − x_j^d)² / w_d²)   (3) with parameters v and w = (w1 , . . . , wD ). [sent-67, score-0.386]
30 This covariance function expresses that outputs whose inputs are nearby have large covariance, and outputs that belong to inputs far apart have smaller covariance. [sent-71, score-0.307]
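To make the covariance function (3) and its use in Gaussian process regression concrete, here is a minimal sketch (our own illustration, not the authors' code); the observation noise variance noise_var is an assumed extra hyperparameter, and only the predictive mean is shown:

    import numpy as np

    def ard_cov(X1, X2, v, w):
        """Covariance (3): v^2 * exp(-0.5 * sum_d (x1_d - x2_d)^2 / w_d^2)."""
        diff = X1[:, None, :] - X2[None, :, :]            # shape (m1, m2, D)
        return v**2 * np.exp(-0.5 * np.sum((diff / w)**2, axis=2))

    def gp_predictive_mean(Xtr, ytr, Xte, v, w, noise_var=1e-2):
        """Standard GP regression mean with covariance (3) plus observation noise."""
        K = ard_cov(Xtr, Xtr, v, w) + noise_var * np.eye(len(Xtr))
        Ks = ard_cov(Xte, Xtr, v, w)
        return Ks @ np.linalg.solve(K, ytr)

    # Toy usage: regress a scalar target from 20-dimensional rate vectors.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(50, 20)), rng.normal(size=50)
    mean = gp_predictive_mean(X, y, X[:5], v=1.0, w=np.ones(20))

Two such regressions, on sin 2α and cos 2α, are combined for angle prediction as described further below.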
31 2 Similarity measures for spike data To take advantage of the strength of kernel machines in the analysis of cortical recordings we will explore the usefulness of different kernel functions. [sent-76, score-1.052]
32 We describe the spikernel introduced in [6] and present a novel use of alignment-type scores typically used in bioinformatics. [sent-77, score-0.223]
33 Although we are far from understanding the neuronal code, there exist some reasonable assumptions about the structure of spike data one has to take into account when comparing spike patterns and designing kernels. [sent-78, score-0.825]
34 • Most fundamental is the assumption that frequency and temporal coding play central roles. [sent-79, score-0.177]
35 Information related to a certain variable of the stimulus may be coded in highly specific temporal patterns contained in the spike trains of a cortical population. [sent-80, score-0.794]
36 To compare spike trains it might be necessary to realign them by introducing a certain time shift. [sent-82, score-0.508]
37 We want the similarity score to be higher the smaller this time shift is. [sent-83, score-0.24]
38 The authors of [6] proposed a kernel for spike trains that was designed with respect to the assumptions above and some extra assumptions related to the special task to be solved. [sent-86, score-0.857]
39 To understand their ideas it is most instructive to have a look at the feature map Φ rather than at the kernel itself. [sent-87, score-0.401]
40 The feature map maps a binned spike sequence s into a high-dimensional space where each coordinate u represents a possible spike train prototype of fixed length n ≤ |s|. [sent-89, score-0.692]
41 The value of the feature map of s, Φu (s), represents the similarity of s to the prototype u. [sent-90, score-0.332]
42 The u component of the feature vector Φ(s) is defined as: Φu (s) = C Σ_{i ∈ I_{n,|s|}} µ^{d(s_i , u)} λ^{(|s| − i_1)/2}   (4). Here i is an index vector that indexes a length n ordered subsequence of s and the sum runs over all possible subsequences. [sent-91, score-0.163]
43 Following the authors we chose the distance measure d(si,k , uk ), determining how two firing rate vectors are compared, to be the squared l2 norm: d(si,k , uk ) = ‖si,k − uk ‖². [sent-94, score-0.194]
44 Note that each entry sk of the sequence (matrix) s is meant to be a vector containing the firing rates of all simultaneously recorded neurons in the same time interval (bin). [sent-95, score-0.223]
45 The kernel kn (s, t) induced by this feature map can be computed in time O(|s||t|n) using dynamic programming. [sent-96, score-0.401]
46 The kernel used in our experiments is a sum of kernels for different pattern lengths n, weighted with another parameter p. [sent-97, score-0.295]
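The spikernel itself is evaluated with the dynamic program of [6]; the snippet below is only a naive, exponential-time illustration of the underlying idea: a gappy subsequence kernel that soft-matches length-n pieces of two binned spike sequences and damps subsequences that start far from the end of the recording. The parameters mu (< 1) and lam loosely mirror µ and λ above, but this is a simplified stand-in, not the kernel of equation (4).

    import itertools
    import numpy as np

    def naive_subseq_kernel(s, t, n=2, mu=0.5, lam=0.9):
        """Naive gappy subsequence kernel between two binned spike sequences.

        s, t: arrays of shape (n_bins, n_neurons) of binned firing rates.
        Sums over all ordered length-n index vectors i, j; each aligned pair of
        bins contributes mu**(squared distance), and subsequences starting far
        from the end of the sequence are damped by powers of lam.
        """
        k = 0.0
        for i in itertools.combinations(range(len(s)), n):
            for j in itertools.combinations(range(len(t)), n):
                match = np.prod([mu ** np.sum((s[a] - t[b]) ** 2)
                                 for a, b in zip(i, j)])
                k += match * lam ** ((len(s) - i[0]) + (len(t) - j[0]))
        return k

    # Two toy trials: 10 bins of 3 simultaneously recorded neurons each.
    s = np.random.rand(10, 3)
    t = np.random.rand(10, 3)
    print(naive_subseq_kernel(s, t, n=2))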
47 In addition to methods developed specifically for neural spike train data, we also train on pairwise similarities derived from global alignments. [sent-101, score-0.554]
48 Aligning sequences is a standard method in bioinformatics; there, the sequences usually describe DNA, RNA or protein molecules. [sent-102, score-0.114]
49 Here, the sequences are time-binned representations of the spike trains, as described above. [sent-103, score-0.404]
50 Given two sequences s = s1 . . . s|s| and t = t1 . . . t|t| , each sequence may be elongated by inserting copies of a special symbol (the dash, “–”) at any position, yielding two stuffed sequences s′ and t′ . [sent-110, score-0.205]
51 The first requirement is that the stuffed sequences must have the same length. [sent-111, score-0.121]
52 This allows us to write them on top of each other, so that each symbol of s′ is either mapped to a symbol of t′ (match/mismatch), or mapped to a dash (gap), and vice versa. [sent-112, score-0.316]
53 The second requirement for a valid alignment is that no dash is mapped to a dash, which restricts the length of any alignment to a maximum of |s| + |t|. [sent-113, score-0.601]
54 Once costs are assigned to the matches and gaps, the cost of an alignment is defined as the sum of costs in the alignment. [sent-114, score-0.298]
55 The distance of s and t can now be defined as the cost of an optimal global alignment of s and t, where optimal means minimising the cost. [sent-115, score-0.206]
56 We parameterise the costs with γ and µ as follows: c(a, b) = c(b, a) := |a − b| and c(a, –) = c(–, a) := γ|a − µ|. The matrix of pairwise distances as defined above will, in general, not be a proper kernel (i.e. it is not guaranteed to be positive definite). [sent-118, score-0.34]
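A minimal dynamic-programming sketch of this alignment distance follows (our own illustration; the function name is not from the paper). It is the standard Needleman–Wunsch-style recursion with the match cost |a − b| and gap cost γ|a − µ| defined above, minimising the total cost:

    import numpy as np

    def align_cost(s, t, gamma=1.0, mu=0.0):
        """Cost of an optimal global alignment of two firing-rate sequences.

        Match/mismatch cost |a - b|; aligning a symbol a to a dash costs gamma*|a - mu|.
        A lower cost means the two sequences are more similar.
        """
        ns, nt = len(s), len(t)
        D = np.zeros((ns + 1, nt + 1))
        for i in range(1, ns + 1):              # prefix of s aligned to dashes
            D[i, 0] = D[i - 1, 0] + gamma * abs(s[i - 1] - mu)
        for j in range(1, nt + 1):              # prefix of t aligned to dashes
            D[0, j] = D[0, j - 1] + gamma * abs(t[j - 1] - mu)
        for i in range(1, ns + 1):
            for j in range(1, nt + 1):
                D[i, j] = min(
                    D[i - 1, j - 1] + abs(s[i - 1] - t[j - 1]),   # match/mismatch
                    D[i - 1, j] + gamma * abs(s[i - 1] - mu),     # gap in t
                    D[i, j - 1] + gamma * abs(t[j - 1] - mu))     # gap in s
        return D[ns, nt]

    # Binned spike counts of one neuron in two trials.
    print(align_cost([0, 2, 3, 1], [0, 3, 1], gamma=0.5, mu=0.0))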
57 We use the alignment score to compute explicit feature vectors of the data points via an empirical kernel map [8]. [sent-123, score-0.701]
58 Since our alignment score kalign (n, n′ ) applies to single spike trains only (see footnote 2), we compute the empirical kernel map for each neuron separately and then concatenate these vectors. [sent-133, score-1.288]
59 Thus, each trial is represented by the vector of its alignment scores with respect to all other trials, where alignments are computed separately for all 20 neurons. [sent-152, score-0.383]
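The construction of this empirical kernel map can be sketched as follows (our own illustration and naming; score stands for any pairwise similarity such as kalign, and a trivial negated distance is plugged in here only to make the example run):

    import numpy as np

    def empirical_kernel_map(trials, score):
        """Represent each trial by its scores against all trials, per neuron.

        trials: array of shape (n_trials, n_neurons, n_bins) of binned spike counts.
        score:  function mapping two single-neuron sequences to a scalar.
        Returns a feature matrix of shape (n_trials, n_neurons * n_trials).
        """
        n_trials, n_neurons, _ = trials.shape
        features = np.zeros((n_trials, n_neurons * n_trials))
        for a in range(n_trials):
            for neuron in range(n_neurons):
                for b in range(n_trials):
                    features[a, neuron * n_trials + b] = score(trials[a, neuron],
                                                               trials[b, neuron])
        return features

    # Toy usage: 12 trials, 20 neurons, 10 bins per neuron.
    trials = np.random.poisson(2.0, size=(12, 20, 10))
    F = empirical_kernel_map(trials, score=lambda u, v: -np.sum(np.abs(u - v)))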
60 We can now train kernel machines using any standard kernel on top of this representation, but we already achieve very good performance using the simple linear kernel (see results section). [sent-153, score-0.929]
61 Although we give results obtained with this technique of constructing a feature map only for the alignment score, it can be easily applied with the spikernel and other kernels. [sent-154, score-0.571]
62 3 Coding structure in output space Our objective is to use various machine learning algorithms to predict the orientation of a stimulus used in the experiment described below. [sent-156, score-0.287]
63 Above, we explained how to do binary classification using SVMs by estimating a normal vector w and offset b of a hyperplane ⟨w, Φ(x)⟩ + b = 0 in the feature space. [sent-159, score-0.13]
64 If we have M > 2 classes, we can train M classifiers, each one separating one specific class from the union of all other ones (hence the name “one-versus-rest”). [sent-161, score-0.132]
65 A more sophisticated and more expensive method is to train one classifier for each possible combination of two classes and then use a voting scheme to classify a point. [sent-163, score-0.12]
66 In our situation, however, certain classes are “closer” to each other since the corresponding stimulus angles are closer than others. [sent-167, score-0.244]
67 To take this into account, we use the kernel dependency estimation (KDE) algorithm [7] with an output similarity measure corresponding to a loss function of the angles taking the form L(α, β) = cos(2α − 2β). [sent-168, score-0.61]
68 One feature space corresponds to the kernel used on the inputs (in our case, the spike trains), and the other one to a second kernel which encodes the similarity measure to be used on the outputs (the orientation of the lines). [sent-172, score-1.321]
69 When we use Gaussian processes to predict the stimulus angle α we consider the task as a regression problem on sin 2α and cos 2α separately. [sent-174, score-0.405]
70 2 It is straightforward to extend this idea to synchronous alignments of the whole population vector, but we achieved worse results. [sent-176, score-0.189]
71 To do prediction we take the means of the predicted distributions of sin 2α and cos 2α as point estimates respectively, which are then projected onto the unit circle. [sent-180, score-0.126]
72 Finally we assign the averaged predicted angle to the nearest orientation which could have been shown. [sent-181, score-0.138]
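A sketch of this circular decoding step follows (our own illustration; the eight orientations are assumed to be evenly spaced over 180°, and the inputs are whatever regression estimates of sin 2α and cos 2α a model produces):

    import numpy as np

    # Eight stimulus orientations, assumed evenly spaced over 180 degrees.
    ORIENTATIONS = np.arange(8) * 180.0 / 8.0

    def decode_orientation(sin2a_pred, cos2a_pred):
        """Turn point estimates of sin(2a) and cos(2a) into a stimulus orientation.

        The estimates are projected onto the unit circle via arctan2 and the
        resulting angle is assigned to the nearest orientation that could have
        been shown.
        """
        alpha = 0.5 * np.degrees(np.arctan2(sin2a_pred, cos2a_pred)) % 180.0
        # circular distance on orientations, where 0 and 180 degrees coincide
        dist = np.abs((ORIENTATIONS - alpha + 90.0) % 180.0 - 90.0)
        return ORIENTATIONS[np.argmin(dist)]

    print(decode_orientation(0.05, -0.9))   # close to a 90 degree orientation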
73 The spike data were recorded using tetrodes inserted in area V1 of a behaving macaque (Macaca mulatta). [sent-186, score-0.469]
74 A single stimulus of fixed orientation and contrast was presented for a period of 500 ms. [sent-190, score-0.271]
75 Spiking activity from neural recordings usually comes as a time series of action potentials from one or more neurons recorded from the brain. [sent-194, score-0.283]
76 Therefore we can abstract the spike series as a series of zeros and ones. [sent-196, score-0.347]
77 We compute the firing rates from the high resolution data for each neuron in 1, 5 or 10 bins of length 500, 100 or 50ms respectively, resulting in three different data representations for different temporal resolutions. [sent-198, score-0.41]
78 From the vectors ni (i = 1, . . . , 20) containing the bins of each neuron we obtain one data point x = [n1 n2 . . . n20 ]. [sent-202, score-0.277]
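A minimal sketch of this preprocessing (our own illustration; variable names and the toy data are not from the paper) bins each neuron's spike times over the 500 ms stimulus window and concatenates the per-neuron counts into one input vector x:

    import numpy as np

    def bin_trial(spike_times, n_bins=10, window_ms=500.0):
        """Build one input vector x = [n1 n2 ... n20] from a single trial.

        spike_times: list with one entry per neuron, holding that neuron's spike
        times (in ms, relative to stimulus onset) during the trial.
        Returns the concatenated bin counts (firing rates up to a constant factor).
        """
        edges = np.linspace(0.0, window_ms, n_bins + 1)
        counts = [np.histogram(np.asarray(times), bins=edges)[0]
                  for times in spike_times]
        return np.concatenate(counts).astype(float)

    # Toy trial: random spike times for 20 neurons, binned at 50 ms resolution.
    trial = [np.sort(np.random.uniform(0.0, 500.0, size=np.random.poisson(8)))
             for _ in range(20)]
    x = bin_trial(trial, n_bins=10)          # shape: (20 * 10,)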
79 Below we validate our reasoning on input and output coding with several experiments. [sent-207, score-0.164]
80 We will compare the kernel algorithms KDE, SVM and Gaussian Processes (GP) and a simple k-nearest neighbour approach (k-NN) that we applied with different kernels and different data representations. [sent-208, score-0.295]
81 As reference values, we give the performance of a standard Bayesian reconstruction method (assuming independent neurons with Poisson characteristics), a Template Matching method and the standard Population Vector method as they are described in the literature. [sent-209, score-0.116]
82 4 We use four out of the five folds of the data to choose the parameters of the kernel and the method. [sent-213, score-0.319]
83 Finally we train the best model on these four folds and compute an independent test error on the remaining fold. [sent-215, score-0.144]
84 After we knew its order of magnitude, we chose the σ-parameter of the Gaussian kernel from a linear grid (σ = 1, 2, . . . ). [sent-228, score-0.297]
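The model-selection and evaluation loop can be sketched as below; this is our own illustration using scikit-learn for convenience (the paper does not specify an implementation), σ is mapped to scikit-learn's gamma via gamma = 1/(2σ²), the grid values and the C grid are placeholders, and the sketch reports plain misclassification rate rather than the angular errors shown in the tables below:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold, GridSearchCV

    def five_fold_evaluation(X, y, sigmas=(1, 2, 3, 4, 5), Cs=(1.0, 10.0)):
        """Outer 5-fold split; on each split, four folds pick (sigma, C) by an
        inner grid search and the held-out fold gives an independent test error."""
        param_grid = {"gamma": [1.0 / (2.0 * s**2) for s in sigmas], "C": list(Cs)}
        outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        errors = []
        for train_idx, test_idx in outer.split(X, y):
            search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)
            search.fit(X[train_idx], y[train_idx])
            errors.append(1.0 - search.score(X[test_idx], y[test_idx]))
        return np.mean(errors), np.std(errors) / np.sqrt(len(errors))

    # Toy usage: 192 trials of 20-dimensional rate vectors, 8 orientation classes.
    X = np.random.rand(192, 20)
    y = np.repeat(np.arange(8), 24)
    mean_err, sem = five_fold_evaluation(X, y)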
85 The spikernel has four parameters: λ, µ, N and p. [sent-232, score-0.223]
86 The stimulus in our experiment was perceived over the whole period of recording. [sent-233, score-0.181]
87 Therefore we do not want any increasing weight of the similarity score towards the beginning or the end of the spike sequence and we fix λ = 1. [sent-234, score-0.24]
88 Further we chose N = 10 to be the length of our sequence, and thereby consider patterns of all possible lengths. [sent-235, score-0.119]
89 Table 1: Mean test error and standard error on the low contrast dataset, comparing the Gaussian kernel, the spikernel and the alignment score in combination with KDE, SVM (1-vs-rest), SVM (1-vs-1) and k-NN at 10-bin and 1-bin temporal resolution, and with GP at 2-bin (‡) and 1-bin resolution; † marks the spikernel 1-bin entry. [The numerical entries of the table were not recovered in this extraction.]
90 Table 2: Mean test error and standard error on the high contrast dataset, with the same kernels, methods and data representations as Table 1. [The numerical entries of the table were not recovered in this extraction.]
91 † We report this number only for comparison, since the spikernel relies on temporal patterns and it makes no sense to use only one bin. [sent-351, score-0.32]
92 ‡ A 10 bin resolution would require determining 200 parameters wd of the covariance function (3) from only 192 samples. [sent-352, score-0.342]
93 Using cross-validation instead closely resembles kernel ridge regression on sin 2α and cos 2α, which is almost exactly what KDE does when applied with the loss function (5). [sent-355, score-0.188]
94 We can observe that standard techniques for decoding, namely Population vector, Template Matching and a particular Bayesian reconstruction method, can be outperformed by state-of-the-art kernel methods when applied with an appropriate kernel and suitable data representation. [sent-359, score-0.568]
95 We found that the accuracy of kernel methods can in most cases be improved by utilising task specific similarity measures for spike trains, such as the spikernel or the introduced alignment distances from bioinformatics. [sent-360, score-1.216]
96 Due to the (by machine learning standards) relatively small size of the analysed datasets, it is hard to draw conclusions regarding which of the applied kernel methods performs best. [sent-361, score-0.259]
97 Rather than focusing too much on the differences in performance, we want to emphasise the capability of kernel machines to assay different decoding hypotheses by choosing appropriate kernel functions. [sent-362, score-0.672]
98 Analysing their respective performance may provide insight about how spike trains carry information and thus about the nature of neural coding. [sent-363, score-0.508]
99 Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. [sent-412, score-0.221]
100 Nature and precision of temporal coding in visual cortex: a metric-space analysis. [sent-468, score-0.212]
wordName wordTfidf (topN-words)
[('spike', 0.347), ('kernel', 0.259), ('bin', 0.252), ('bins', 0.241), ('spikernel', 0.223), ('alignment', 0.206), ('kde', 0.193), ('trains', 0.161), ('similarity', 0.146), ('stimulus', 0.134), ('kalign', 0.127), ('coding', 0.121), ('population', 0.106), ('orientation', 0.099), ('score', 0.094), ('outputs', 0.087), ('feature', 0.084), ('train', 0.084), ('alignments', 0.083), ('recorded', 0.083), ('ring', 0.082), ('dash', 0.08), ('angles', 0.074), ('template', 0.074), ('cos', 0.072), ('activity', 0.07), ('dependency', 0.069), ('mapped', 0.069), ('machines', 0.068), ('ridge', 0.066), ('neurons', 0.066), ('recordings', 0.064), ('dot', 0.064), ('shpigelman', 0.064), ('stuffed', 0.064), ('tolias', 0.064), ('fold', 0.063), ('loss', 0.062), ('stimuli', 0.062), ('svm', 0.06), ('orientations', 0.06), ('gp', 0.06), ('folds', 0.06), ('map', 0.058), ('sequences', 0.057), ('temporal', 0.056), ('xi', 0.055), ('cortical', 0.055), ('predict', 0.054), ('sin', 0.054), ('covariance', 0.053), ('regression', 0.052), ('animal', 0.052), ('uk', 0.052), ('assay', 0.05), ('victor', 0.05), ('reconstruction', 0.05), ('reconstruct', 0.049), ('symbol', 0.049), ('ve', 0.049), ('separating', 0.048), ('gaussian', 0.047), ('perceived', 0.047), ('classi', 0.047), ('hyperplane', 0.046), ('costs', 0.046), ('assumptions', 0.045), ('neuronal', 0.045), ('matching', 0.045), ('circular', 0.044), ('prototype', 0.044), ('monitor', 0.044), ('reasoning', 0.043), ('movement', 0.043), ('xj', 0.043), ('xd', 0.042), ('patterns', 0.041), ('length', 0.04), ('inputs', 0.04), ('simultaneously', 0.039), ('angle', 0.039), ('similarities', 0.039), ('macaque', 0.039), ('subsequence', 0.039), ('contrast', 0.038), ('chose', 0.038), ('resolution', 0.037), ('classes', 0.036), ('editors', 0.036), ('kernels', 0.036), ('cope', 0.036), ('decoding', 0.036), ('neuron', 0.036), ('visual', 0.035), ('sequence', 0.035), ('geometric', 0.035), ('biologically', 0.035), ('distances', 0.035), ('si', 0.035), ('weston', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
2 0.21910852 112 nips-2003-Learning to Find Pre-Images
Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1
3 0.21255197 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
Author: Liam Paninski, Eero P. Simoncelli, Jonathan W. Pillow
Abstract: Recent work has examined the estimation of models of stimulus-driven neural activity in which some linear filtering process is followed by a nonlinear, probabilistic spiking stage. We analyze the estimation of one such model for which this nonlinear step is implemented by a noisy, leaky, integrate-and-fire mechanism with a spike-dependent aftercurrent. This model is a biophysically plausible alternative to models with Poisson (memory-less) spiking, and has been shown to effectively reproduce various spiking statistics of neurons in vivo. However, the problem of estimating the model from extracellular spike train data has not been examined in depth. We formulate the problem in terms of maximum likelihood estimation, and show that the computational problem of maximizing the likelihood is tractable. Our main contribution is an algorithm and a proof that this algorithm is guaranteed to find the global optimum with reasonable speed. We demonstrate the effectiveness of our estimator with numerical simulations.
4 0.17046471 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
Author: Thomas Natschläger, Wolfgang Maass
Abstract: We employ an efficient method using Bayesian and linear classifiers for analyzing the dynamics of information in high-dimensional states of generic cortical microcircuit models. It is shown that such recurrent circuits of spiking neurons have an inherent capability to carry out rapid computations on complex spike patterns, merging information contained in the order of spike arrival with previously acquired context information. 1
5 0.16037409 49 nips-2003-Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels
Author: Ryan C. Kelly, Tai Sing Lee
Abstract: Decoding is a strategy that allows us to assess the amount of information neurons can provide about certain aspects of the visual scene. In this study, we develop a method based on Bayesian sequential updating and the particle filtering algorithm to decode the activity of V1 neurons in awake monkeys. A distinction in our method is the use of Volterra kernels to filter the particles, which live in a high dimensional space. This parametric Bayesian decoding scheme is compared to the optimal linear decoder and is shown to work consistently better than the linear optimal decoder. Interestingly, our results suggest that for decoding in real time, spike trains of as few as 10 independent but similar neurons would be sufficient for decoding a critical scene variable in a particular class of visual stimuli. The reconstructed variable can predict the neural activity about as well as the actual signal with respect to the Volterra kernels. 1
6 0.1343915 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses
7 0.13228637 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
8 0.11758129 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
9 0.11431948 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression
10 0.11278862 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control
11 0.1036968 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells
12 0.10367723 176 nips-2003-Sequential Bayesian Kernel Regression
13 0.10256885 107 nips-2003-Learning Spectral Clustering
14 0.10127951 1 nips-2003-1-norm Support Vector Machines
15 0.10124476 43 nips-2003-Bounded Invariance and the Formation of Place Fields
16 0.098259553 113 nips-2003-Learning with Local and Global Consistency
17 0.096399337 127 nips-2003-Mechanism of Neural Interference by Transcranial Magnetic Stimulation: Network or Single Neuron?
18 0.095288947 122 nips-2003-Margin Maximizing Loss Functions
19 0.094551116 18 nips-2003-A Summating, Exponentially-Decaying CMOS Synapse for Spiking Neural Systems
20 0.087373182 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
topicId topicWeight
[(0, -0.321), (1, -0.032), (2, 0.227), (3, -0.045), (4, 0.093), (5, 0.127), (6, -0.09), (7, -0.039), (8, 0.141), (9, 0.042), (10, 0.043), (11, 0.104), (12, -0.037), (13, -0.063), (14, 0.169), (15, 0.117), (16, 0.119), (17, 0.133), (18, -0.057), (19, -0.016), (20, -0.023), (21, 0.013), (22, -0.06), (23, -0.043), (24, 0.031), (25, -0.052), (26, -0.094), (27, 0.053), (28, -0.07), (29, -0.095), (30, -0.025), (31, -0.127), (32, 0.028), (33, -0.076), (34, -0.08), (35, -0.015), (36, 0.075), (37, 0.057), (38, 0.139), (39, -0.083), (40, -0.076), (41, 0.029), (42, -0.024), (43, -0.087), (44, -0.06), (45, 0.018), (46, 0.03), (47, 0.016), (48, 0.089), (49, 0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.95169067 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
2 0.72072387 112 nips-2003-Learning to Find Pre-Images
Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1
3 0.71689433 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
Author: Liam Paninski, Eero P. Simoncelli, Jonathan W. Pillow
Abstract: Recent work has examined the estimation of models of stimulus-driven neural activity in which some linear filtering process is followed by a nonlinear, probabilistic spiking stage. We analyze the estimation of one such model for which this nonlinear step is implemented by a noisy, leaky, integrate-and-fire mechanism with a spike-dependent aftercurrent. This model is a biophysically plausible alternative to models with Poisson (memory-less) spiking, and has been shown to effectively reproduce various spiking statistics of neurons in vivo. However, the problem of estimating the model from extracellular spike train data has not been examined in depth. We formulate the problem in terms of maximum likelihood estimation, and show that the computational problem of maximizing the likelihood is tractable. Our main contribution is an algorithm and a proof that this algorithm is guaranteed to find the global optimum with reasonable speed. We demonstrate the effectiveness of our estimator with numerical simulations. A central issue in computational neuroscience is the characterization of the functional relationship between sensory stimuli and neural spike trains. A common model for this relationship consists of linear filtering of the stimulus, followed by a nonlinear, probabilistic spike generation process. The linear filter is typically interpreted as the neuron’s “receptive field,” while the spiking mechanism accounts for simple nonlinearities like rectification and response saturation. Given a set of stimuli and (extracellularly) recorded spike times, the characterization problem consists of estimating both the linear filter and the parameters governing the spiking mechanism. One widely used model of this type is the Linear-Nonlinear-Poisson (LNP) cascade model, in which spikes are generated according to an inhomogeneous Poisson process, with rate determined by an instantaneous (“memoryless”) nonlinear function of the filtered input. This model has a number of desirable features, including conceptual simplicity and computational tractability. Additionally, reverse correlation analysis provides a simple unbiased estimator for the linear filter [5], and the properties of estimators (for both the linear filter and static nonlinearity) have been thoroughly analyzed, even for the case of highly non-symmetric or “naturalistic” stimuli [12]. One important drawback of the LNP model, * JWP and LP contributed equally to this work. We thank E.J. Chichilnisky for helpful discussions. L−NLIF model LNP model )ekips(P Figure 1: Simulated responses of LNLIF and LNP models to 20 repetitions of a fixed 100-ms stimulus segment of temporal white noise. Top: Raster of responses of L-NLIF model, where σnoise /σsignal = 0.5 and g gives a membrane time constant of 15 ms. The top row shows the fixed (deterministic) response of the model with σnoise set to zero. Middle: Raster of responses of LNP model, with parameters fit with standard methods from a long run of the L-NLIF model responses to nonrepeating stimuli. Bottom: (Black line) Post-stimulus time histogram (PSTH) of the simulated L-NLIF response. (Gray line) PSTH of the LNP model. Note that the LNP model fails to preserve the fine temporal structure of the spike trains, relative to the L-NLIF model. 001 05 0 )sm( emit however, is that Poisson processes do not accurately capture the statistics of neural spike trains [2, 9, 16, 1]. 
In particular, the probability of observing a spike is not a functional of the stimulus only; it is also strongly affected by the recent history of spiking. The leaky integrate-and-fire (LIF) model provides a biophysically more realistic spike mechanism with a simple form of spike-history dependence. This model is simple, wellunderstood, and has dynamics that are entirely linear except for a nonlinear “reset” of the membrane potential following a spike. Although this model’s overriding linearity is often emphasized (due to the approximately linear relationship between input current and firing rate, and lack of active conductances), the nonlinear reset has significant functional importance for the model’s response properties. In previous work, we have shown that standard reverse correlation analysis fails when applied to a neuron with deterministic (noise-free) LIF spike generation; we developed a new estimator for this model, and demonstrated that a change in leakiness of such a mechanism might underlie nonlinear effects of contrast adaptation in macaque retinal ganglion cells [15]. We and others have explored other “adaptive” properties of the LIF model [17, 13, 19]. In this paper, we consider a model consisting of a linear filter followed by noisy LIF spike generation with a spike-dependent after-current; this is essentially the standard LIF model driven by a noisy, filtered version of the stimulus, with an additional current waveform injected following each spike. We will refer to this as the the “L-NLIF” model. The probabilistic nature of this model provides several important advantages over the deterministic version we have considered previously. First, an explicit noise model allows us to couch the problem in the terms of classical estimation theory. This, in turn, provides a natural “cost function” (likelihood) for model assessment and leads to more efficient estimation of the model parameters. Second, noise allows us to explicitly model neural firing statistics, and could provide a rigorous basis for a metric distance between spike trains, useful in other contexts [18]. Finally, noise influences the behavior of the model itself, giving rise to phenomena not observed in the purely deterministic model [11]. Our main contribution here is to show that the maximum likelihood estimator (MLE) for the L-NLIF model is computationally tractable. Specifically, we describe an algorithm for computing the likelihood function, and prove that this likelihood function contains no non-global maxima, implying that the MLE can be computed efficiently using standard ascent techniques. The desirable statistical properties of this estimator (e.g. consistency, efficiency) are all inherited “for free” from classical estimation theory. Thus, we have a compact and powerful model for the neural code, and a well-motivated, efficient way to estimate the parameters of this model from extracellular data. The Model We consider a model for which the (dimensionless) subthreshold voltage variable V evolves according to i−1 dV = − gV (t) + k · x(t) + j=0 h(t − tj ) dt + σNt , (1) and resets to Vr whenever V = 1. Here, g denotes the leak conductance, k · x(t) the projection of the input signal x(t) onto the linear kernel k, h is an “afterpotential,” a current waveform of fixed amplitude and shape whose value depends only on the time since the last spike ti−1 , and Nt is an unobserved (hidden) noise process with scale parameter σ. 
Without loss of generality, the "leak" and "threshold" potentials are set at 0 and 1, respectively, so the cell spikes whenever V = 1, and V decays back to 0 with time constant 1/g in the absence of input. Note that the nonlinear behavior of the model is completely determined by only a few parameters, namely {g, σ, V_r} and h (where the function h is allowed to take values in some low-dimensional vector space). The dynamical properties of this type of "spike response model" have been extensively studied [7]; for example, it is known that this class of models can effectively capture much of the behavior of apparently more biophysically realistic models (e.g. Hodgkin-Huxley). Figures 1 and 2 show several simple comparisons of the L-NLIF and LNP models. In Fig. 1, note the fine structure of spike timing in the responses of the L-NLIF model, which is qualitatively similar to in vivo experimental observations [2, 16, 9]. The LNP model fails to capture this fine temporal reproducibility. At the same time, the L-NLIF model is much more flexible and representationally powerful, as demonstrated in Fig. 2: by varying V_r or h, for example, we can match a wide variety of dynamical behaviors (e.g. adaptation, bursting, bistability) known to exist in biological neurons. The Estimation Problem. Our problem now is to estimate the model parameters {k, σ, g, V_r, h} from a sufficiently rich, dynamic input sequence x(t) together with spike times {t_i}. A natural choice is the maximum likelihood estimator (MLE), which is easily proven to be consistent and statistically efficient here. To compute the MLE, we need to compute the likelihood and develop an algorithm for maximizing it. The tractability of the likelihood function for this model arises directly from the linearity of the subthreshold dynamics of the voltage V(t) during an interspike interval. In the noiseless case [15], the voltage trace during an interspike interval t ∈ [t_{i−1}, t_i] is given by the solution to equation (1) with σ = 0:

V_0(t) = V_r e^{−gt} + ∫_{t_{i−1}}^{t} [ k · x(s) + Σ_{j=0}^{i−1} h(s − t_j) ] e^{−g(t−s)} ds ,   (2)

which is simply a linear convolution of the input current with a negative exponential. Figure 2: Illustration of the diverse behaviors of the L-NLIF model. A: Firing rate adaptation. A positive DC current (top) was injected into three model cells differing only in their h currents (shown on left: top, h = 0; middle, h depolarizing; bottom, h hyperpolarizing). Voltage traces of each cell's response (right, with spikes superimposed) exhibit rate facilitation for depolarizing h (middle), and rate adaptation for hyperpolarizing h (bottom). B: Bursting. The response of a model cell with a biphasic h current (left) is shown as a function of three different levels of DC current. For small current levels (top), the cell responds rhythmically. For larger currents (middle and bottom), the cell responds with regular bursts of spikes. C: Bistability. The stimulus (top) is a positive followed by a negative current pulse. Although a cell with no h current (middle) responds transiently to the positive pulse, a cell with biphasic h (bottom) exhibits a bistable response: the positive pulse puts it into a stable firing regime which persists until the arrival of a negative pulse.
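For intuition about equation (2) above, the sketch below evaluates the noiseless voltage V_0(t) on a single interspike interval by a discrete approximation of the convolution with e^{−gt}. The discretization, bin width, and variable names are our own illustrative choices, not the paper's.

```python
import numpy as np

def noiseless_voltage(drive, g, v_reset, dt=1.0):
    """Discrete approximation to equation (2) on one interspike interval.

    drive : total input current k.x(t) + sum_j h(t - t_j), one value per bin
            of the interval starting at the last spike time.
    Returns V0 at the same bins (threshold = 1, leak potential = 0).
    """
    n = len(drive)
    t = np.arange(n) * dt                  # time since the last spike
    decay = np.exp(-g * t)                 # e^{-g t} kernel
    # V0(t) = V_r e^{-g t} + sum_{s<=t} drive(s) e^{-g (t - s)} dt
    conv = np.array([np.sum(drive[:m + 1] * decay[:m + 1][::-1]) * dt
                     for m in range(n)])
    return v_reset * decay + conv

# Example: a constant depolarizing input drives V0 toward drive/g = 1.2,
# so the trace approaches and then exceeds the threshold at 1.
v0 = noiseless_voltage(drive=np.full(200, 0.06), g=0.05, v_reset=0.0)
```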
It is easy to see that adding Gaussian noise to the voltage during each time step induces a Gaussian density over V(t), since linear dynamics preserve Gaussianity [8]. This density is uniquely characterized by its first two moments; the mean is given by (2), and its covariance is σ² E_g E_g^T, where E_g is the convolution operator corresponding to e^{−gt}. Note that this density is highly correlated for nearby points in time, since noise is integrated by the linear dynamics. Intuitively, smaller leak conductance g leads to stronger correlation in V(t) at nearby time points. We denote this Gaussian density G(x_i, k, σ, g, V_r, h), where the index i indicates the i-th spike and the corresponding stimulus chunk x_i (i.e. the stimuli that influence V(t) during the i-th interspike interval). Now, on any interspike interval t ∈ [t_{i−1}, t_i], the only information we have is that V(t) is less than threshold for all times before t_i, and exceeds threshold during the time bin containing t_i. This translates to a set of linear constraints on V(t), expressed in terms of the set

C_i = { V(t) < 1 for t_{i−1} ≤ t < t_i } ∩ { V(t_i) ≥ 1 }.

Therefore, the likelihood that the neuron first spikes at time t_i, given a spike at time t_{i−1}, is the probability of the event V(t) ∈ C_i, which is given by

L_{x_i, t_i}(k, σ, g, V_r, h) = ∫_{C_i} G(x_i, k, σ, g, V_r, h),

the integral of the Gaussian density G(x_i, k, σ, g, V_r, h) over the set C_i. Figure 3: Behavior of the L-NLIF model during a single interspike interval, for a single (repeated) input current (top). Top middle: Ten simulated voltage traces V(t), evaluated up to the first threshold crossing, conditional on a spike at time zero (V_r = 0). Note the strong correlation between neighboring time points, and the sparsening of the plot as traces are eliminated by spiking. Bottom middle: Time evolution of P(V). Each column represents the conditional distribution of V at the corresponding time (i.e. for all traces that have not yet crossed threshold). Bottom: Probability density of the interspike interval (ISI) corresponding to this particular input. Note that probability mass is concentrated at the points where the input drives V_0(t) close to threshold. Spiking resets V to V_r, meaning that the noise contribution to V in different interspike intervals is independent. This "renewal" property, in turn, implies that the density over V(t) for an entire experiment factorizes into a product of conditionally independent terms, where each of these terms is one of the Gaussian integrals derived above for a single interspike interval. The likelihood for the entire spike train is therefore the product of these terms over all observed spikes. Putting all the pieces together, then, the full likelihood is

L_{{x_i, t_i}}(k, σ, g, V_r, h) = Π_i ∫_{C_i} G(x_i, k, σ, g, V_r, h),

where the product, again, is over all observed spike times {t_i} and corresponding stimulus chunks {x_i}. Now that we have an expression for the likelihood, we need to be able to maximize it. Our main result now states, basically, that we can use simple ascent algorithms to compute the MLE without getting stuck in local maxima. Theorem 1. The likelihood L_{{x_i, t_i}}(k, σ, g, V_r, h) has no non-global extrema in the parameters (k, σ, g, V_r, h), for any data {x_i, t_i}. The proof [14] is based on the log-concavity of L_{{x_i, t_i}}(k, σ, g, V_r, h) under a certain parametrization of (k, σ, g, V_r, h).
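As a concrete (if naive) illustration of the single-interval likelihood, the sketch below builds the mean from equation (2) and the covariance σ² E_g E_g^T on a discrete time grid, then estimates P(V ∈ C_i) by Monte Carlo sampling from the multivariate Gaussian. This brute-force estimator is only for illustration; the paper's actual computations use density evolution or Genz's algorithm, and the discretization and parameter values here are our own assumptions.

```python
import numpy as np

def interval_likelihood_mc(drive, g, sigma, v_reset, dt=1.0, n_samples=20000, rng=None):
    """Monte Carlo estimate of P(V in C_i): V stays below threshold (1) on every
    bin of the interval except the last, where it crosses."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(drive)
    t = np.arange(n) * dt
    decay = np.exp(-g * t)
    # Discrete convolution operator E_g corresponding to e^{-g t}.
    E = np.tril(np.exp(-g * (t[:, None] - t[None, :]))) * dt
    mean = v_reset * decay + E @ drive          # mean voltage from equation (2)
    cov = sigma ** 2 * E @ E.T                  # covariance sigma^2 E_g E_g^T
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    below = np.all(samples[:, :-1] < 1.0, axis=1)   # subthreshold before t_i
    crossed = samples[:, -1] >= 1.0                  # crosses in the bin containing t_i
    return np.mean(below & crossed)

# Example: likelihood of the first spike falling exactly 80 bins after the last one,
# under a constant drive (illustrative parameter values).
L = interval_likelihood_mc(drive=np.full(80, 0.06), g=0.05, sigma=0.5, v_reset=0.0)
```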
The classical approach for establishing the nonexistence of non-global maxima of a given function uses concavity, which corresponds roughly to the function having everywhere non-positive second derivatives. However, the basic idea can be extended with the use of any invertible function: if f has no non-global extrema, neither will g(f), for any strictly increasing real function g. The logarithm is a natural choice for g in any probabilistic context in which independence plays a role, since sums are easier to work with than products. Moreover, concavity of a function f is strictly stronger than log-concavity, so log-concavity can be a powerful tool even in situations for which concavity is useless (the Gaussian density is log-concave but not concave, for example). Our proof relies on a particular theorem [3] establishing the log-concavity of integrals of log-concave functions, and proceeds by making a correspondence between this type of integral and the integrals that appear in the definition of the L-NLIF likelihood above. We should also note that the proof extends without difficulty to some other noise processes which generate log-concave densities (where white noise has the standard Gaussian density); for example, the proof is nearly identical if N_t is allowed to be colored or non-Gaussian noise, with possibly nonzero drift. Computational methods and numerical results. Theorem 1 tells us that we can ascend the likelihood surface without fear of getting stuck in local maxima. Now how do we actually compute the likelihood? This is a nontrivial problem: we need to be able to quickly compute (or at least approximate, in a rational way) integrals of multivariate Gaussian densities G over simple but high-dimensional orthants C_i. We discuss two ways to compute these integrals; each has its own advantages. The first technique can be termed "density evolution" [10, 13]. The method is based on the following well-known fact from the theory of stochastic differential equations [8]: given the data (x_i, t_{i−1}), the probability density of the voltage process V(t) up to the next spike t_i satisfies the following partial differential (Fokker-Planck) equation:

∂P(V, t)/∂t = (σ²/2) ∂²P/∂V² + g ∂[(V − V_eq(t)) P]/∂V ,   (3)

under the boundary conditions P(V, t_{i−1}) = δ(V − V_r) and P(V_th, t) = 0, where V_eq(t) is the instantaneous equilibrium potential:

V_eq(t) = (1/g) [ k · x(t) + Σ_{j=0}^{i−1} h(t − t_j) ].

Moreover, the conditional firing rate f(t) satisfies

∫_{t_{i−1}}^{t} f(s) ds = 1 − ∫ P(V, t) dV.

Thus standard techniques for solving the drift-diffusion evolution equation (3) lead to a fast method for computing f(t) (as illustrated in Fig. 2). Finally, the likelihood L_{x_i, t_i}(k, σ, g, V_r, h) is simply f(t_i). While elegant and efficient, this density evolution technique turns out to be slightly more powerful than what we need for the MLE: recall that we do not need to compute the conditional rate function f at all times t, but rather just at the set of spike times {t_i}, and thus we can turn to more specialized techniques for faster performance. We employ a rapid technique for computing the likelihood using an algorithm due to Genz [6], designed to compute exactly the kinds of multidimensional Gaussian probability integrals considered here. This algorithm works well when the orthants C_i are defined by fewer than ≈ 10 linear constraints on V(t).
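A minimal sketch of the "density evolution" computation of f(t) follows, using an explicit finite-difference scheme for the Fokker-Planck equation (3) with an absorbing boundary at threshold. The voltage grid, time step, and forward-Euler update are our own illustrative choices; they are stable for the default values below but are not tuned for accuracy, and a proper implementation would use an implicit or specialized solver.

```python
import numpy as np

def density_evolution_rate(v_eq, g, sigma, v_reset, dt=0.005, v_min=-3.0, v_th=1.0, n_v=80):
    """Evolve P(V, t) under equation (3) and return the conditional firing rate f(t).

    v_eq : instantaneous equilibrium potential V_eq(t), one value per time step of size dt.
    The density is absorbed at V = v_th; the probability mass lost per step gives f(t) dt.
    """
    V = np.linspace(v_min, v_th, n_v)
    dV = V[1] - V[0]
    P = np.zeros(n_v)
    P[np.argmin(np.abs(V - v_reset))] = 1.0 / dV       # P(V, t_{i-1}) = delta(V - V_r)
    f = np.zeros(len(v_eq))
    prev_mass = 1.0
    for i, veq in enumerate(v_eq):
        drift = g * (V - veq) * P                       # g (V - V_eq(t)) P
        dP = 0.5 * sigma ** 2 * np.gradient(np.gradient(P, dV), dV) + np.gradient(drift, dV)
        P = np.clip(P + dt * dP, 0.0, None)
        P[-1] = 0.0                                     # absorbing boundary: P(V_th, t) = 0
        mass = np.trapz(P, V)                           # remaining survival probability
        f[i] = max(prev_mass - mass, 0.0) / dt          # f(t) = -d/dt of the survival mass
        prev_mass = mass
    return f

# Example: constant drive (V_eq = 1.2, as in the earlier sketches); the likelihood
# of a spike in bin i is approximately f[i] * dt.
f = density_evolution_rate(v_eq=np.full(4000, 1.2), g=0.05, sigma=0.5, v_reset=0.0)
```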
The number of actual constraints on V(t) during an interspike interval (t_{i+1} − t_i) grows linearly in the length of the interval: thus, to use this algorithm in typical data situations, we adopt a strategy proposed in our work on the deterministic form of the model [15], in which we discard all but a small subset of the constraints. The key point is that, due to the strong correlations in the noise and the fact that the constraints only figure significantly when V(t) is driven close to threshold, a small number of constraints often suffices to approximate the true likelihood to a high degree of precision. Figure 4: Demonstration of the estimator's performance on simulated data. Dashed lines show the true kernel k and aftercurrent h; k is a 12-sample function chosen to resemble the biphasic temporal impulse response of a macaque retinal ganglion cell, while h is a function specified in a five-dimensional vector space, whose shape induces a slight degree of burstiness in the model's spike responses. The L-NLIF model was simulated with parameters g = 0.05 (corresponding to a membrane time constant of 20 time samples), σ_noise = 0.5, and V_r = 0. The stimulus was 30,000 time samples of white Gaussian noise with a standard deviation of 0.5. With only 600 spikes of output, the estimator is able to retrieve an estimate of k (gray curve) which closely matches the true kernel. Note that the spike-triggered average (black curve), which is an unbiased estimator for the kernel of an LNP neuron [5], differs significantly from this true kernel (see also [15]). The accuracy of this approach improves with the number of constraints considered, but performance is fastest with fewer constraints. Therefore, because ascending the likelihood function requires evaluating the likelihood at many different points, we can make this ascent process much quicker by applying a version of the coarse-to-fine idea (see the sketch at the end of this section). Let L_k denote the approximation to the likelihood given by allowing only k constraints in the above algorithm. Then we know, by a proof identical to that of Theorem 1, that L_k has no local maxima; in addition, by the above logic, L_k → L as k grows. It takes little additional effort to prove that argmax L_k → argmax L; thus, we can efficiently ascend the true likelihood surface by ascending the "coarse" approximants L_k, then gradually "refining" our approximation by letting k increase. An application of this algorithm to simulated data is shown in Fig. 4. Further applications to both simulated and real data will be presented elsewhere. Discussion. We have shown here that the L-NLIF model, which couples a linear filtering stage to a biophysically plausible and flexible model of neuronal spiking, can be efficiently estimated from extracellular physiological data using maximum likelihood. Moreover, this model lends itself directly to analysis via tools from the modern theory of point processes. For example, once we have obtained our estimate of the parameters (k, σ, g, V_r, h), how do we verify that the resulting model provides an adequate description of the data? This important "model validation" question has been the focus of some recent elegant research, under the rubric of "time rescaling" techniques [4]. While we lack the room here to review these methods in detail, we can note that they depend essentially on knowledge of the conditional firing rate function f(t).
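The constraint-subsetting and coarse-to-fine idea described above (before the Discussion) can be sketched as follows: keep only the k pre-spike time bins where the mean voltage comes closest to threshold, evaluate the approximate likelihood L_k on that subset, and let k grow as the ascent proceeds. The Monte Carlo evaluation and the selection heuristic below are our own illustrative stand-ins for the Genz-based computation used in the paper.

```python
import numpy as np

def select_constraints(mean, n_keep):
    """Indices of the n_keep pre-spike bins whose mean voltage is closest to threshold
    (these constraints matter most), plus the index of the spike bin itself."""
    pre = np.argsort(-mean[:-1])[:n_keep]
    return np.sort(pre), len(mean) - 1

def coarse_likelihood_mc(mean, cov, n_keep, n_samples=20000, rng=None):
    """Approximate single-interval likelihood L_k using only n_keep subthreshold
    constraints plus the threshold-crossing constraint at the spike bin."""
    rng = np.random.default_rng() if rng is None else rng
    keep, spike_bin = select_constraints(mean, n_keep)
    idx = np.append(keep, spike_bin)
    sub_mean, sub_cov = mean[idx], cov[np.ix_(idx, idx)]
    samples = rng.multivariate_normal(sub_mean, sub_cov, size=n_samples)
    ok = np.all(samples[:, :-1] < 1.0, axis=1) & (samples[:, -1] >= 1.0)
    return np.mean(ok)

# Coarse-to-fine usage: evaluate L_k with an increasing number of constraints.
# (mean and cov would be built from equation (2) and sigma^2 E_g E_g^T, as in the
#  earlier Monte Carlo sketch; here they are assumed to be given.)
# for k in (2, 4, 8):
#     Lk = coarse_likelihood_mc(mean, cov, n_keep=k)
```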
Recall that we showed how to efficiently compute this function in the last section and examined some of its qualitative properties in the L-NLIF context in Figs. 2 and 3. We are currently in the process of applying the model to physiological data recorded both in vivo and in vitro, in order to assess whether it accurately accounts for the stimulus preferences and spiking statistics of real neurons. One long-term goal of this research is to elucidate the different roles of stimulus-driven and stimulus-independent activity on the spiking patterns of both single cells and multineuronal ensembles. References [1] B. Aguera y Arcas and A. Fairhall. What causes a neuron to spike? Neural Computation, 15:1789–1807, 2003. [2] M. Berry and M. Meister. Refractoriness and neural precision. Journal of Neuroscience, 18:2200–2211, 1998. [3] V. Bogachev. Gaussian Measures. AMS, New York, 1998. [4] E. Brown, R. Barbieri, V. Ventura, R. Kass, and L. Frank. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14:325–346, 2002. [5] E. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199–213, 2001. [6] A. Genz. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1:141–149, 1992. [7] W. Gerstner and W. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002. [8] S. Karlin and H. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981. [9] J. Keat, P. Reinagel, R. Reid, and M. Meister. Predicting every spike: a model for the responses of visual neurons. Neuron, 30:803–817, 2001. [10] B. Knight, A. Omurtag, and L. Sirovich. The approach of a neuron population firing rate to a new equilibrium: an exact theoretical result. Neural Computation, 12:1045–1055, 2000. [11] J. Levin and J. Miller. Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature, 380:165–168, 1996. [12] L. Paninski. Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437–464, 2003. [13] L. Paninski, B. Lau, and A. Reyes. Noise-driven adaptation: in vitro and mathematical analysis. Neurocomputing, 52:877–883, 2003. [14] L. Paninski, J. Pillow, and E. Simoncelli. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Submitted manuscript (cns.nyu.edu/~liam), 2004. [15] J. Pillow and E. Simoncelli. Biases in white noise analysis due to non-Poisson spike generation. Neurocomputing, 52:109–115, 2003. [16] D. Reich, J. Victor, and B. Knight. The power ratio and the interval map: Spiking models and extracellular recordings. The Journal of Neuroscience, 18:10090–10104, 1998. [17] M. Rudd and L. Brown. Noise adaptation in integrate-and-fire neurons. Neural Computation, 9:1047–1069, 1997. [18] J. Victor. How the brain uses time to represent and process visual information. Brain Research, 886:33–46, 2000. [19] Y. Yu and T. Lee. Dynamical mechanisms underlying contrast gain control in single neurons. Physical Review E, 68:011901, 2003.
4 0.64815712 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
5 0.56352758 49 nips-2003-Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels
Author: Ryan C. Kelly, Tai Sing Lee
Abstract: Decoding is a strategy that allows us to assess the amount of information neurons can provide about certain aspects of the visual scene. In this study, we develop a method based on Bayesian sequential updating and the particle filtering algorithm to decode the activity of V1 neurons in awake monkeys. A distinction in our method is the use of Volterra kernels to filter the particles, which live in a high dimensional space. This parametric Bayesian decoding scheme is compared to the optimal linear decoder and is shown to work consistently better than the linear optimal decoder. Interestingly, our results suggest that for decoding in real time, spike trains of as few as 10 independent but similar neurons would be sufficient for decoding a critical scene variable in a particular class of visual stimuli. The reconstructed variable can predict the neural activity about as well as the actual signal with respect to the Volterra kernels. 1
6 0.54455423 127 nips-2003-Mechanism of Neural Interference by Transcranial Magnetic Stimulation: Network or Single Neuron?
7 0.51267761 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
8 0.50214452 176 nips-2003-Sequential Bayesian Kernel Regression
9 0.48386511 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
10 0.48153988 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
11 0.47682601 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
12 0.44945669 140 nips-2003-Nonlinear Processing in LGN Neurons
13 0.44420159 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells
14 0.44202471 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression
15 0.43623677 113 nips-2003-Learning with Local and Global Consistency
16 0.43555075 60 nips-2003-Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis
17 0.42283019 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning
18 0.41986364 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses
19 0.40434194 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
20 0.39711794 56 nips-2003-Dopamine Modulation in a Basal Ganglio-Cortical Network of Working Memory
topicId topicWeight
[(0, 0.027), (11, 0.027), (29, 0.011), (30, 0.013), (35, 0.049), (53, 0.104), (71, 0.042), (76, 0.479), (85, 0.079), (91, 0.08)]
simIndex simValue paperId paperTitle
1 0.96922153 155 nips-2003-Perspectives on Sparse Bayesian Learning
Author: Jason Palmer, Bhaskar D. Rao, David P. Wipf
Abstract: Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning using a weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing the prior, one for each weight, and then adopts a specific approximation to the full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximation to the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors and leads to natural, intuitive explanations of why sparsity is achieved in practice. 1
2 0.94864959 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation
Author: Chen Yanover, Yair Weiss
Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1
same-paper 3 0.9403969 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen
Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1
4 0.91262448 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
Author: Thomas R. Strohmann, Andrei Belitski, Gregory Z. Grudic, Dennis DeCoste
Abstract: The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time – greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation. Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state of the art classification algorithms.
5 0.63998765 112 nips-2003-Learning to Find Pre-Images
Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1
6 0.59817195 189 nips-2003-Tree-structured Approximations by Expectation Propagation
7 0.59695351 176 nips-2003-Sequential Bayesian Kernel Regression
8 0.591344 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
9 0.57093239 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
10 0.56815881 49 nips-2003-Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels
11 0.55632234 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
12 0.55532557 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
13 0.55145139 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
14 0.549128 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
15 0.54833263 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression
16 0.54498523 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution
17 0.54221576 152 nips-2003-Pairwise Clustering and Graphical Models
18 0.54213268 107 nips-2003-Learning Spectral Clustering
19 0.5411759 122 nips-2003-Margin Maximizing Loss Functions
20 0.53937465 43 nips-2003-Bounded Invariance and the Formation of Place Fields