nips nips2011 nips2011-253 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sebastian A. Kurtek, Anuj Srivastava, Wei Wu
Abstract: While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. The proposed method for signal alignment is shown to have state of the art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. [sent-3, score-0.379]
2 We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. [sent-4, score-0.611]
3 First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. [sent-5, score-0.516]
4 This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. [sent-6, score-0.161]
5 Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. [sent-7, score-0.21]
6 This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. [sent-8, score-0.355]
7 Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. [sent-9, score-0.444]
8 The proposed method for signal alignment is shown to have state of the art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data. [sent-10, score-0.315]
9 1 Introduction Consider the problem of estimating signal using noisy observation under the model: f (t) = cg(a t − φ) + e(t) , where the random quantities c ∈ R is the scale, a ∈ R is the rate, φ ∈ R is the phase shift, and e(t) ∈ R is the additive noise. [sent-11, score-0.225]
10 There has been an elaborate theory for estimation of the underlying signal g, given one or several observations of the function f . [sent-12, score-0.154]
11 For instance, the estimation of sinusoids or exponentials in additive Gaussian noise is a classical problem in signal and speech processing. [sent-16, score-0.2]
12 In this paper we consider a related but fundamentally different estimation problem where the observed functional data is modeled as: for t ∈ [0, 1], fi (t) = ci g(γi (t)) + ei , i = 1, 2, . [sent-17, score-0.583]
13 The fi s represent observations of an unknown, deterministic signal g under random warpings γi , scalings ci and vertical translations ei ∈ R. [sent-21, score-0.753]
14 ) This problem is interesting because in many situations, including speech, SONAR, RADAR, 1 phase components warping functions 1. [sent-24, score-0.593]
15 4 −3 −2 −1 0 1 2 3 Figure 1: Separation of phase and amplitude variability in function data. [sent-83, score-0.207]
16 NMR, fMRI, and MEG applications, the noise can actually affect the instantaneous phase of the signal, resulting in an observation that is a phase (or frequency) modulation of the original signal. [sent-84, score-0.255]
17 This problem is challenging because of the nonparametric, random nature of the warping functions γi s. [sent-85, score-0.494]
18 It turns out that without any further restrictions on γi s one can recover g only up to an arbitrary warping function. [sent-92, score-0.434]
19 This is easy to see since g ◦ γi = (g ◦ γ) ◦ (γ −1 ◦ γi ) for any warping function γ. [sent-93, score-0.434]
20 (As described later, the warping functions are restricted to be automorphisms of a domain and, hence, form a group. [sent-94, score-0.494]
21 ) Under an additional condition related to the mean of (inverses of) γi s, we can reach the exact signal g, as demonstrated in this paper. [sent-95, score-0.165]
22 In fact, this model describes several related, some even equivalent, problems but with distinct applications: Problem 1: Joint Phase Demodulation and Carrier Estimation: One can view this problem as that of phase (or frequency) demodulation but without the knowledge of the carrier signal g. [sent-96, score-0.342]
23 Thus, −1 it becomes a problem of joint estimation of the carrier signal (g) and phase demodulation (γi ) of signals that share the same carrier. [sent-97, score-0.45]
24 g is a sinusoid, then it is relatively easier to estimate the warping functions using dynamic time warping or other estimation theoretic methods [15, 13]. [sent-100, score-0.977]
25 One would like to separate the variability associated with the heights, called the amplitude variability, from the variability associated with the locations, termed the phase variability. [sent-106, score-0.257]
26 Extracting the amplitude variability implies temporally aligning the given functions using nonlinear time warping, with the result shown in the bottom right. [sent-110, score-0.248]
27 The corresponding set of warping functions, shown in the top right, represent the phase variability. [sent-111, score-0.533]
28 The phase component can also be illustrated by applying these warping functions to the same function, also shown in the top right. [sent-112, score-0.593]
29 The main reason for separating functional data into these components is to better preserve the structure of the observed data, since a separate modeling of amplitude and phase variability will be more natural, parsimonious and efficient. [sent-113, score-0.251]
30 In other words, 2 −1 what are the γi s such that, for any t0 , fi (γi (t0 )) correspond to each other. [sent-118, score-0.303]
31 Also, we illustrate the use of this solution in automated alignment of sets of given signals. [sent-126, score-0.154]
32 Two signals, are deemed equivalent if one can be time-warped into the other; since the warping functions form a group, the equivalence class is an orbit of the warping group. [sent-128, score-1.225]
33 This relation partitions the set of signals into equivalence classes, and the set of equivalence classes (orbits) forms a quotient space. [sent-129, score-0.348]
34 First, we estimate the equivalence class of g using the notion of Karcher mean on quotient space which, in turn, requires a distance on this quotient space. [sent-131, score-0.505]
35 The difficulty in using this metric directly is that it is not straightforward to compute geodesics (remember that geodesics lengths provide the desired distances). [sent-140, score-0.161]
36 However, a simple square-root transformation converts this metric into the standard L2 metric and the distance is obtainable as a simple L2 norm between the square-root forms of functions. [sent-141, score-0.243]
37 Second, given an estimate of the equivalence class of g, we define the notion of a center of an orbit and use that to derive an estimator for g. [sent-142, score-0.413]
38 ˙(t)| |f (2) This metric has many fundamental advantages, including the fact that it is the only Riemannian metric that is invariant to the domain warping [2]. [sent-162, score-0.588]
39 This result can be used to compute the distance dF R between any two functions by computing the L2 distance between the corresponding SRVFs, that is, dF R (f1 , f2 ) = q1 − q2 . [sent-167, score-0.188]
40 The next question is: What is the effect of warping on dF R ? [sent-168, score-0.434]
41 3 Elastic Distance on Quotient Space Our next step is to define an elastic distance between functions as follows. [sent-172, score-0.203]
42 The orbit of an SRVF q ∈ L2 is given by: [q] = closure{(q, γ)|γ ∈ Γ}. [sent-173, score-0.225]
43 Definition 2 For any two functions f1 , f2 ∈ F and the corresponding SRVFs, q1 , q2 ∈ L2 , we define the elastic distance d on the quotient space S to be: d([q1 ], [q2 ]) = inf γ∈Γ q1 − (q2 , γ) . [sent-178, score-0.348]
44 3 Signal Estimation Method Our estimation is based on the model fi = ci (g ◦ γi ) + ei , i = 1, · · · , n, where g, fi ∈ F , ci ∈ R+ , γi ∈ Γ and ei ∈ R. [sent-184, score-1.029]
45 Given {fi }, our goal is to identify warping functions {γi } so as to reconstruct g. [sent-185, score-0.524]
46 We will do so in three steps: 1) For a given collection of functions {fi }, and their SRVFs {qi }, we compute the mean of the corresponding orbits {[qi ]} in the quotient space S; we will call it [µ]n . [sent-186, score-0.394]
47 2) We compute an appropriate element of this mean orbit to define a template µn in L2 . [sent-187, score-0.356]
48 The optimal warping functions {γi } are estimated by align individual functions to match the template µn . [sent-188, score-0.65]
49 3) The estimated warping functions are then used to align {fi } and reconstruct the underlying signal g. [sent-189, score-0.682]
50 1 Pre-step: Karcher Mean of Points in Γ In this section we will define a Karcher mean of a set of warping functions {γi }, under the FisherRao metric, using the differential geometry of Γ. [sent-191, score-0.574]
51 Using Lemma 1, the Fisher-Rao distance between any two warping 1 ˙ ˙ functions is found to be dF R (γ1 , γ2 ) = cos−1 ( 0 γ1 (t) γ2 (t)dt). [sent-199, score-0.558]
52 Definition 3 For a given set of warping functions γ1 , γ2 , . [sent-201, score-0.494]
53 Definition 4 Define the Karcher mean [µ]n of the given SRVF orbits {[qi ]} in the space S as a local minimum of the sum of squares of elastic distances: n d([q], [qi ])2 . [sent-208, score-0.288]
54 [µ]n = argmin [q]∈S (3) i=1 We emphasize that the Karcher mean [µ]n is actually an orbit of functions, rather than a function. [sent-209, score-0.366]
55 For each qi find γi by solving: γi = argminγ∈Γ µ − (qi , γ) . [sent-214, score-0.291]
56 If the increment n i=1 qi − µ is small, then stop. [sent-219, score-0.291]
57 Else, update the mean using µ → n 1 qi and return to step 2. [sent-220, score-0.351]
58 In the kth iteration, let γi denote the (k) (k) n optimal domain warping from qi to µ(k) and let qi = (qi , γi ). [sent-224, score-1.016]
59 Then, i=1 d([µ(k) ], [qi ])2 = ˜ (k) 2 (k) 2 n n n (k) (k+1) − qi ˜ ≥ i=1 µ − qi ˜ ≥ i=1 d([µ(k+1) ], [qi ])2 . [sent-225, score-0.582]
60 3 Step 2: Center of an Orbit Here we find a particular element of this mean orbit so that it can be used as a template to align the given functions. [sent-228, score-0.388]
61 , qn and q, define an element q of [q] as the center ˜ of [q] with respect to the set {qi } if the warping functions {γi }, where γi = argminγ∈Γ q −(qi , γ) , ˜ have the Karcher mean γid . [sent-232, score-0.636]
62 Algorithm 2: Finding Center of an Orbit : WLOG, let q be any element of the orbit [q]. [sent-234, score-0.253]
63 For each qi find γi by solving: γi = argminγ∈Γ q − (qi , γ) . [sent-236, score-0.291]
64 That is, γi is a warping ¯ ¯ ˜ ∗ that aligns qi to q . [sent-243, score-0.725]
65 4 Steps 1-3: Complete Estimation Algorithm Consider the observation model fi = ci (g ◦ γi ) + ei , i = 1, . [sent-259, score-0.49]
66 , n, where g is an unknown signal, and ci ∈ R+ , γi ∈ Γ and ei ∈ R are random. [sent-262, score-0.187]
67 To make the system identifiable, we need some constraints on γi , ci , and ei . [sent-264, score-0.187]
68 Now we can utilize c e Algorithms 1 and 2 to present the full procedure for function alignment and signal estimation. [sent-266, score-0.259]
69 Compute the aligned SRVFs qi = (qi , γi ) and aligned functions fi = fi ◦ γi . [sent-279, score-1.121]
70 Return the warping functions {γ ∗ } and the estimated signal g = ( 1 ˆ i n i=1 Illustration. [sent-282, score-0.62]
71 We randomly generate −1 n = 50 warping functions {γi } such that {γi } are i. [sent-284, score-0.494]
72 Then we compute functions fi = ci (g ◦ γi ) + ei to form the functional data. [sent-290, score-0.594]
73 The ∗ ˜ Complete Estimation Algorithm results in the aligned functions {fi = fi ◦ γi } that are are shown in the third panel in Fig. [sent-293, score-0.475]
74 6 Based on Lemma 3 and Corollary 1, we have the following result on the Karcher mean in the quotient space S. [sent-307, score-0.205]
75 Theorem 1 For a function g, consider a sequence of functions fi (t) = ci g(γi (t)) + ei , where ci is a positive constant, ei is a constant, and γi is a time warping, i = 1, · · · , n. [sent-308, score-0.737]
76 Denote by qg n √ 1 and qi the SRVFs of g and fi , respectively, and let s = n i=1 ci . [sent-309, score-0.828]
77 ¯ ¯ i=1 Next, we present a simple fact about the Karcher mean (see Definition 3) of warping functions. [sent-315, score-0.494]
78 ¯ Theorem 1 ensures that [µ]n belongs to the orbit of [qg ] (up to a scale factor) but we are interested in estimating g itself, rather than its orbit. [sent-320, score-0.225]
79 We will show in two steps (Theorems 2 and 3) that finding the center of the orbit [µ]n leads to a consistent estimator for g. [sent-321, score-0.341]
80 If the population Karcher mean of {γi } is γid , then the center of the orbit [µ]n , denoted by µn , satisfies limn→∞ µn = E(¯)qg . [sent-324, score-0.367]
81 s This result shows that asymptotically one can recover the SRVF of the original signal by the Karcher mean of the SRVFs of the observed signals. [sent-325, score-0.181]
82 Next in Theorem 3, we will show that one can also ˜ reconstruct g using aligned functions {fi } generated by the Alignment Algorithm in Section 3. [sent-326, score-0.172]
83 ∗ ˜ Theorem 3 Under the same conditions as in Theorem 2, let γi = argminγ (qi , γ) − µn and fi = n n n 1 1 1 ∗ ˜ = E(¯)g + E(¯). [sent-327, score-0.303]
84 If we denote c = n i=1 ci and e = n i=1 ei , then limn→∞ n i=1 fi ¯ ¯ c e 5 Application to Signal Alignment In this section we will focus on function alignment and comparison of alignment performance with some previous methods on several datasets. [sent-329, score-0.798]
85 In this case, the given signals are viewed as {fi } in the previous set up and we estimate the center of the orbit and then use it for alignment of all signals. [sent-330, score-0.492]
86 Handwriting Signature Data: 20 handwritten signatures and the acceleration functions along the signature curves [8]. [sent-342, score-0.179]
87 The smoothed (using a Gaussian kernel) spike trains over 10 movement trials are used in this alignment analysis. [sent-348, score-0.19]
88 There are no standard criteria on evaluating function alignment in the current literature. [sent-349, score-0.172]
89 Here we use the following three criteria so that together they provide a comprehensive evaluation, where fi and ˜ fi , i = 1, . [sent-350, score-0.624]
90 Least Squares: ls = 1 N N i=1 ˜ (fi (t)− N 1 −1 (fi (t)− N 1 −1 ˜ fj (t))2 dt . [sent-355, score-0.171]
91 fj (t))2 dt j=i j=i ls measures the cross-sectional variance of the aligned functions, relative to original values. [sent-356, score-0.269]
92 The smaller the value of ls, the better the alignment is in general. [sent-357, score-0.154]
93 28) Figure 3: Empirical evaluation of four methods on 3 real datasets, with the alignment performance computed using three criteria (ls, pc, sls). [sent-429, score-0.172]
94 Sobolev Least Squares: sls = N i=1 N i=1 ˙ 1 ˜ (fi (t)− N (f˙i (t)− 1 N N j=1 N j=1 ˙ ˜ fj )2 dt f˙j )2 dt , This criterion measures the total cross-sectional variance of the derivatives of the aligned functions, relative to the original value. [sent-435, score-0.337]
95 From the results, we can see that the F-R method does uniformly well in functional alignment under all the evaluation metrics. [sent-440, score-0.198]
96 6 Summary In this paper we have described a parameter-free approach for reconstructing underlying signal using given functions with random warpings, scalings, and translations. [sent-448, score-0.165]
97 The basic idea is to use the FisherRao Riemannian metric and the resulting geodesic distance to define a proper distance, called elastic distance, between warping orbits of SRVF functions. [sent-449, score-0.803]
98 This distance is used to compute a Karcher mean of the orbits, and a template is selected from the mean orbit using an additional condition that the mean of the warping functions is identity. [sent-450, score-1.006]
99 By applying these warpings on the original functions, we provide a consistent estimator of the underlying signal. [sent-451, score-0.188]
100 We show the the proposed FisherRao method provides better alignment performance than the state-of-the-art methods in several real experimental data. [sent-453, score-0.154]
wordName wordTfidf (topN-words)
[('warping', 0.434), ('karcher', 0.386), ('fi', 0.303), ('qi', 0.291), ('srvfs', 0.256), ('orbit', 0.225), ('srvf', 0.183), ('alignment', 0.154), ('qg', 0.146), ('quotient', 0.145), ('orbits', 0.129), ('warpings', 0.11), ('signal', 0.105), ('ei', 0.099), ('phase', 0.099), ('df', 0.093), ('sls', 0.092), ('ci', 0.088), ('ls', 0.085), ('riemannian', 0.083), ('aligned', 0.082), ('argmin', 0.081), ('elastic', 0.079), ('id', 0.077), ('metric', 0.077), ('carrier', 0.074), ('equivalence', 0.072), ('registration', 0.064), ('demodulation', 0.064), ('distance', 0.064), ('dt', 0.061), ('functions', 0.06), ('mean', 0.06), ('signals', 0.059), ('growth', 0.059), ('amplitude', 0.058), ('fisherrao', 0.055), ('smr', 0.055), ('center', 0.054), ('variability', 0.05), ('estimation', 0.049), ('signature', 0.047), ('aligning', 0.044), ('functional', 0.044), ('estimator', 0.043), ('template', 0.043), ('geodesics', 0.042), ('pc', 0.037), ('kneip', 0.037), ('mbm', 0.037), ('spike', 0.036), ('separation', 0.036), ('cc', 0.033), ('diffeomorphisms', 0.032), ('align', 0.032), ('reconstruct', 0.03), ('panel', 0.03), ('element', 0.028), ('population', 0.028), ('male', 0.028), ('kass', 0.028), ('synchronization', 0.028), ('signatures', 0.028), ('cq', 0.026), ('tf', 0.026), ('limn', 0.026), ('absolutely', 0.025), ('wrt', 0.025), ('heights', 0.025), ('pace', 0.025), ('exponentials', 0.025), ('distances', 0.025), ('fj', 0.025), ('transformation', 0.025), ('std', 0.024), ('translations', 0.024), ('acceleration', 0.024), ('modulation', 0.024), ('scalings', 0.024), ('lemma', 0.023), ('berkeley', 0.022), ('circle', 0.021), ('lebesgue', 0.021), ('additive', 0.021), ('estimated', 0.021), ('squares', 0.02), ('proper', 0.02), ('velocity', 0.02), ('handwritten', 0.02), ('cos', 0.02), ('geometry', 0.02), ('temporally', 0.02), ('sphere', 0.019), ('notion', 0.019), ('consistent', 0.019), ('criteria', 0.018), ('instantaneous', 0.017), ('nition', 0.017), ('original', 0.016), ('nonlinear', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
Author: Sebastian A. Kurtek, Anuj Srivastava, Wei Wu
Abstract: While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. The proposed method for signal alignment is shown to have state of the art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data. 1
2 0.11156512 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
3 0.096511878 225 nips-2011-Probabilistic amplitude and frequency demodulation
Author: Richard Turner, Maneesh Sahani
Abstract: A number of recent scientific and engineering problems require signals to be decomposed into a product of a slowly varying positive envelope and a quickly varying carrier whose instantaneous frequency also varies slowly over time. Although signal processing provides algorithms for so-called amplitude- and frequencydemodulation (AFD), there are well known problems with all of the existing methods. Motivated by the fact that AFD is ill-posed, we approach the problem using probabilistic inference. The new approach, called probabilistic amplitude and frequency demodulation (PAFD), models instantaneous frequency using an auto-regressive generalization of the von Mises distribution, and the envelopes using Gaussian auto-regressive dynamics with a positivity constraint. A novel form of expectation propagation is used for inference. We demonstrate that although PAFD is computationally demanding, it outperforms previous approaches on synthetic and real signals in clean, noisy and missing data settings. 1
4 0.089402273 302 nips-2011-Variational Learning for Recurrent Spiking Networks
Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner
Abstract: We derive a plausible learning rule for feedforward, feedback and lateral connections in a recurrent network of spiking neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimental Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation confirms the method’s applicability to learning both stationary and temporal spike patterns. 1
5 0.084457949 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
Author: Joni K. Pajarinen, Jaakko Peltonen
Abstract: Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinitehorizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.
6 0.078256637 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
7 0.064574659 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
8 0.059225224 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
9 0.055173319 47 nips-2011-Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts
10 0.05483076 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
11 0.05345609 227 nips-2011-Pylon Model for Semantic Segmentation
12 0.05216267 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
13 0.051104102 13 nips-2011-A blind sparse deconvolution method for neural spike identification
14 0.051040728 158 nips-2011-Learning unbelievable probabilities
15 0.048465114 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
16 0.048129901 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
17 0.047727112 164 nips-2011-Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds
18 0.047371306 306 nips-2011-t-divergence Based Approximate Inference
19 0.04676412 27 nips-2011-Advice Refinement in Knowledge-Based SVMs
20 0.046113603 8 nips-2011-A Model for Temporal Dependencies in Event Streams
topicId topicWeight
[(0, 0.131), (1, 0.02), (2, 0.047), (3, -0.033), (4, 0.004), (5, 0.011), (6, -0.064), (7, -0.025), (8, 0.004), (9, 0.0), (10, -0.026), (11, 0.028), (12, 0.004), (13, -0.005), (14, -0.05), (15, -0.031), (16, 0.01), (17, 0.011), (18, -0.017), (19, 0.009), (20, 0.031), (21, 0.033), (22, 0.132), (23, -0.003), (24, 0.069), (25, 0.056), (26, -0.085), (27, -0.013), (28, 0.009), (29, 0.031), (30, 0.009), (31, 0.082), (32, 0.038), (33, 0.049), (34, 0.001), (35, 0.085), (36, 0.038), (37, 0.036), (38, 0.117), (39, 0.184), (40, -0.001), (41, 0.094), (42, 0.031), (43, -0.064), (44, -0.005), (45, 0.053), (46, 0.119), (47, -0.032), (48, 0.018), (49, -0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.92761981 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
Author: Sebastian A. Kurtek, Anuj Srivastava, Wei Wu
Abstract: While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. The proposed method for signal alignment is shown to have state of the art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data. 1
2 0.52011871 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
3 0.49364263 306 nips-2011-t-divergence Based Approximate Inference
Author: Nan Ding, Yuan Qi, S.v.n. Vishwanathan
Abstract: Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions. We extend the idea of approximate inference to the t-exponential family by defining a new t-divergence. This divergence measure is obtained via convex duality between the log-partition function of the t-exponential family and a new t-entropy. We illustrate our approach on the Bayes Point Machine with a Student’s t-prior. 1
4 0.46475756 225 nips-2011-Probabilistic amplitude and frequency demodulation
Author: Richard Turner, Maneesh Sahani
Abstract: A number of recent scientific and engineering problems require signals to be decomposed into a product of a slowly varying positive envelope and a quickly varying carrier whose instantaneous frequency also varies slowly over time. Although signal processing provides algorithms for so-called amplitude- and frequencydemodulation (AFD), there are well known problems with all of the existing methods. Motivated by the fact that AFD is ill-posed, we approach the problem using probabilistic inference. The new approach, called probabilistic amplitude and frequency demodulation (PAFD), models instantaneous frequency using an auto-regressive generalization of the von Mises distribution, and the envelopes using Gaussian auto-regressive dynamics with a positivity constraint. A novel form of expectation propagation is used for inference. We demonstrate that although PAFD is computationally demanding, it outperforms previous approaches on synthetic and real signals in clean, noisy and missing data settings. 1
5 0.43567502 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
6 0.429636 13 nips-2011-A blind sparse deconvolution method for neural spike identification
7 0.42686218 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
8 0.42609629 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
9 0.40629438 8 nips-2011-A Model for Temporal Dependencies in Event Streams
10 0.38124767 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
11 0.37286416 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
12 0.37270051 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm
13 0.36854202 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
14 0.36799118 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
15 0.36281639 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
16 0.36019728 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
17 0.3544268 69 nips-2011-Differentially Private M-Estimators
18 0.35021481 123 nips-2011-How biased are maximum entropy models?
19 0.34672588 295 nips-2011-Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction
20 0.33719406 24 nips-2011-Active learning of neural response functions with Gaussian processes
topicId topicWeight
[(0, 0.028), (4, 0.021), (20, 0.026), (26, 0.029), (31, 0.11), (33, 0.069), (43, 0.058), (45, 0.072), (51, 0.256), (57, 0.045), (74, 0.054), (83, 0.089), (99, 0.044)]
simIndex simValue paperId paperTitle
1 0.84333843 89 nips-2011-Estimating time-varying input signals and ion channel states from a single voltage trace of a neuron
Author: Ryota Kobayashi, Yasuhiro Tsubo, Petr Lansky, Shigeru Shinomoto
Abstract: State-of-the-art statistical methods in neuroscience have enabled us to fit mathematical models to experimental data and subsequently to infer the dynamics of hidden parameters underlying the observable phenomena. Here, we develop a Bayesian method for inferring the time-varying mean and variance of the synaptic input, along with the dynamics of each ion channel from a single voltage trace of a neuron. An estimation problem may be formulated on the basis of the state-space model with prior distributions that penalize large fluctuations in these parameters. After optimizing the hyperparameters by maximizing the marginal likelihood, the state-space model provides the time-varying parameters of the input signals and the ion channel states. The proposed method is tested not only on the simulated data from the Hodgkin−Huxley type models but also on experimental data obtained from a cortical slice in vitro. 1
2 0.82952243 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems
Author: Jacob D. Abernethy, Rafael M. Frongillo
Abstract: Machine Learning competitions such as the Netflix Prize have proven reasonably successful as a method of “crowdsourcing” prediction tasks. But these competitions have a number of weaknesses, particularly in the incentive structure they create for the participants. We propose a new approach, called a Crowdsourced Learning Mechanism, in which participants collaboratively “learn” a hypothesis for a given prediction task. The approach draws heavily from the concept of a prediction market, where traders bet on the likelihood of a future event. In our framework, the mechanism continues to publish the current hypothesis, and participants can modify this hypothesis by wagering on an update. The critical incentive property is that a participant will profit an amount that scales according to how much her update improves performance on a released test set. 1
same-paper 3 0.76460344 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
Author: Sebastian A. Kurtek, Anuj Srivastava, Wei Wu
Abstract: While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. The proposed method for signal alignment is shown to have state of the art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data. 1
4 0.69044685 287 nips-2011-The Manifold Tangent Classifier
Author: Salah Rifai, Yann N. Dauphin, Pascal Vincent, Yoshua Bengio, Xavier Muller
Abstract: We combine three important ideas present in previous work for building classifiers: the semi-supervised hypothesis (the input distribution contains information about the classifier), the unsupervised manifold hypothesis (data density concentrates near low-dimensional manifolds), and the manifold hypothesis for classification (different classes correspond to disjoint manifolds separated by low density). We exploit a novel algorithm for capturing manifold structure (high-order contractive auto-encoders) and we show how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping. This representation learning algorithm can be stacked to yield a deep architecture, and we combine it with a domain knowledge-free version of the TangentProp algorithm to encourage the classifier to be insensitive to local directions changes along the manifold. Record-breaking classification results are obtained. 1
5 0.64718574 87 nips-2011-Energetically Optimal Action Potentials
Author: Martin B. Stemmler, Biswa Sengupta, Simon Laughlin, Jeremy Niven
Abstract: Most action potentials in the nervous system take on the form of strong, rapid, and brief voltage deflections known as spikes, in stark contrast to other action potentials, such as in the heart, that are characterized by broad voltage plateaus. We derive the shape of the neuronal action potential from first principles, by postulating that action potential generation is strongly constrained by the brain’s need to minimize energy expenditure. For a given height of an action potential, the least energy is consumed when the underlying currents obey the bang-bang principle: the currents giving rise to the spike should be intense, yet short-lived, yielding spikes with sharp onsets and offsets. Energy optimality predicts features in the biophysics that are not per se required for producing the characteristic neuronal action potential: sodium currents should be extraordinarily powerful and inactivate with voltage; both potassium and sodium currents should have kinetics that have a bell-shaped voltage-dependence; and the cooperative action of multiple ‘gates’ should start the flow of current. 1 The paradox Nerve cells communicate with each other over long distances using spike-like action potentials, which are brief electrical events traveling rapidly down axons and dendrites. Each action potential is caused by an accelerating influx of sodium or calcium ions, depolarizing the cell membrane by forty millivolts or more, followed by repolarization of the cell membrane caused by an efflux of potassium ions. As different species of ions are swapped across the membrane during the action potential, ion pumps shuttle the excess ions back and restore the ionic concentration gradients. If we label each ionic species by α, the work ∆E done to restore the ionic concentration gradients is [α] ∆E = RT V ∆[α]in ln out , (1) [α]in α where R is the gas constant, T is the temperature in Kelvin, V is the cell volume, [α]in|out is the concentration of ion α inside or outside the cell, and ∆[α]in is the concentration change inside the cell, which is assumed to be small relative to the total concentration. The sum α zα ∆[α] = 0, where zα is the charge on ion α, as no net charge accumulates during the action potential and no net work is done by or on the electric field. Often, sodium (Na+ ) and potassium (K+ ) play the dominant role in generating action potentials, in which case ∆E = ∆[Na]in F V(ENa − EK ), where F is Faraday’s constant, ENa = RT /F ln [Na]out /[Na]in is the reversal potential for Na+ , at which no net sodium current flows, and EK = RT /F ln [K]out /[K]in . This estimate of the work done does not include heat (due to loss through the membrane resistance) or the work done by the ion channel proteins in changing their conformational state during the action potential. Hence, the action potential’s energetic cost to the cell is directly proportional to ∆[Na]in ; taking into account that each Na+ ion carries one elementary charge, the cost is also proportional to the 1 charge QNa that accumulates inside the cell. A maximally efficient cell reduces the charge per spike to a minimum. If a cell fires action potentials at an average rate f , the cell’s Na/K pumps must move Na+ and K+ ions in opposite directions, against their respective concentration gradients, to counteract an average inward Na+ current of f QNa . Exhaustive measurements on myocytes in the heart, which expend tremendous amounts of energy to keep the heart beating, indicate that Na/K pumps expel ∼ 0.5 µA/cm2 of Na+ current at membrane potentials close to rest [1]. Most excitable cells, even when spiking, spend most of their time close to resting potential, and yet standard models for action potentials can easily lead to accumulating an ionic charge of up to 5 µC/cm2 [2]; most of this accumulation occurs during a very brief time interval. If one were to take an isopotential nerve cell with the same density of ion pumps as in the heart, then such a cell would not be able to produce more than an action potential once every ten seconds on average. The brain should be effectively silent. Clearly, this conflicts with what is known about the average firing rates of neurons in the brainstem or even the neocortex, which can sustain spiking up to at least 7 Hz [3]. Part of the discrepancy can be resolved by noting that nerve cells are not isopotential and that action potential generation occurs within a highly restricted area of the membrane. Even so, standard models of action potential generation waste extraordinary amounts of energy; recent evidence [4] points out that many mammalian cortical neurons are much more efficient. As nature places a premium on energy consumption, we will argue that one can predict both the shape of the action potential and the underlying biophysics of the nonlinear, voltage-dependent ionic conductances from the principle of minimal energy consumption. After reviewing the ionic basis of action potentials, we first sketch how to compute the minimal energy cost for an arbitrary spike shape, and then solve for the optimal action potential shape with a given height. Finally, we show how minimal energy consumption explains all the dynamical features in the standard HodgkinHuxley (HH) model for neuronal dynamics that distinguish the brain’s action potentials from other highly nonlinear oscillations in physics and chemistry. 2 Ionic basis of the action potential In an excitable cell, synaptic drive forces the membrane permeability to different ions to change rapidly in time, producing the dynamics of the action potential. The current density Iα carried by an ion species α is given by the Goldman-Hodgkin-Katz (GHK) current equation[5, 6, 2], which assumes that ions are driven independently across the membrane under the influence of a constant electric field. Iα depends upon the ions membrane permeability, Pα , its concentrations on either side of the membrane [α]out and [α]in and the voltage across the membrane, V , according to: Iα = Pα 2 zα V F 2 [α]out − [α]in exp (zα V F/RT ) , RT 1 − exp(zα V F/RT ) (2) To produce the fast currents that generate APs, a subset of the membranes ionic permeabilities Pα are gated by voltage. Changes in the permeability Pα are not instantaneous; the voltage-gated permeability is scaled mathematically by gating variables m(t) and h(t) with their own time dependence. After separating constant from time-dependent components in the permeability, the voltage-gated permeability obeys ¯ Pα (t) = m(t)r h(t)s such that 0 ≤ Pα (t) ≤ Pα , ¯ where r and s are positive, and Pα is the peak permeability to ion α when all channels for ion α are open. Gating is also referred to as activation, and the associated nonlinear permeabilities are called active. There are also passive, voltage-insensitive permeabilities that maintain the resting potential and depolarise the membrane to trigger action potentials. The simplest possible kinetics for the gating variables are first order, involving only a single derivative in time. The steady state of each gating variable at a given voltage is determined by a Boltzmann function, to which the gating variables evolve: dm r ¯ τm = Pα m∞ (V ) − m(t) dt dh and τh =h∞ (V ) − h(t), dt 2 −1 with m∞ (V ) = {1 + exp ((V − Vm )/sm )} the Boltzmann function described by the slope sm > −1 0 and the midpoint Vm ; similarly, h∞ (V ) = {1 + exp ((V − Vh )/sh )} , but with sh < 0. Scaling ¯ m∞ (V ) by the rth root of the peak permeability Pα is a matter of mathematical convenience. We will consider both voltage-independent and voltage-dependent time constants, either setting τj = τj,0 to be constant, where j ∈ {m(t), h(t)}, or imposing a bell-shaped voltage dependence τj (V ) = τj,0 sech [sj (V − Vj )] The synaptic, leak, and voltage-dependent currents drive the rate of change in the voltage across the membrane dV C = Isyn + Ileak + Iα , dt α where the synaptic permeability and leak permeability are held constant. 3 Resistive and capacitive components of the energy cost By treating the action potential as the charging and discharging of the cell membrane capacitance, the action potentials measured at the mossy fibre synapse in rats [4] or in mouse thalamocortical neurons [7] were found to be highly energy-efficient: the nonlinear, active conductances inject only slightly more current than is needed to charge a capacitor to the peak voltage of the action potential. The implicit assumption made here is that one can neglect the passive loss of current through the membrane resistance, known as the leak. Any passive loss must be compensated by additional charge, making this loss the primary target of the selection pressure that has shaped the dynamics of action potentials. On the other hand, the membrane capacitance at the site of AP initiation is generally modelled and experimentally confirmed [8] as being fairly constant around 1 µF/cm2 ; in contrast, the propagation, but not generation, of AP’s can be assisted by a reduction in the capacitance achieved by the myelin sheath that wraps some axons. As myelin would block the flow of ions, we posit that the specific capacitance cannot yield to selection pressure to minimise the work W = QNa (ENa − EK ) needed for AP generation. To address how the shape and dynamics of action potentials might have evolved to consume less energy, we first fix the action potential’s shape and solve for the minimum charge QNa ab initio, without treating the cell membrane as a pure capacitor. Regardless of the action potential’s particular time-course V (t), voltage-dependent ionic conductances must transfer Na+ and K+ charge to elicit an action potential. Figure 1 shows a generic action potential and the associated ionic currents, comparing the latter to the minimal currents required. The passive equivalent circuit for the neuron consists of a resistor in parallel with a capacitor, driven by a synaptic current. To charge the membrane to the peak voltage, a neuron in a high-conductance state [9, 10] may well lose more charge through the resistor than is stored on the capacitor. For neurons in a low-conductance state and for rapid voltage deflections from the resting potential, membrane capacitance will be the primary determinant of the charge. 4 The norm of spikes How close can voltage-gated channels with realistic properties come to the minimal currents? What time-course for the action potential leads to the smallest minimal currents? To answer these questions, we must solve a constrained optimization problem on the solutions to the nonlinear differential equations for the neuronal dynamics. To separate action potentials from mere small-amplitude oscillations in the voltage, we need to introduce a metric. Smaller action potentials consume less energy, provided the underlying currents are optimal, yet signalling between neurons depends on the action potential’s voltage deflection reaching a minimum amplitude. Given the importance of the action potential’s amplitude, we define an Lp norm on the voltage wave-form V (t) to emphasize the maximal voltage deflection: 1 p T V (t) − V p V (t) − V = 0 3 p dt , Generic Action Potential -10 + a V [mV] -20 -30 gsyn -40 -50 -60 0 2 4 6 8 t [ms] 10 12 14 16 gNa Active and Minimal Currents 100 gK + gleak C + + 80 2 current [µA/cm ] 60 b Active IK Minimum IK 40 20 0 -20 For a fixed action potential waveform V (t): Active INa Minimum INa -40 -60 Minimum INa (t) = −LV (t)θ(LV (t)) Minimum IK (t) = −LV (t)θ(−LV (t)) -80 -100 0 2 4 6 8 10 t [ms] 12 14 ˙ with LV (t) ≡ C V (t) + Ileak [V (t)] + Isyn [V (t)]. 16 c Qresistive/Qcapacitive Resistive vs. Capacitive Minimum Charge 1 0.5 0 0.2 0.4 0.6 0.8 1.0 1.2 leak conductance [mS/cm2] 1.4 Figure 1: To generate an action potential with an arbitrary time-course V (t), the nonlinear, timedependent permeabilities must deliver more charge than just to load the membrane capacitance— resistive losses must be compensated. (a) The action potential’s time-course in a generic HH model for a neuron, represented by the circuit diagram on the right. The peak of the action potential is ∼ 50 mV above the average potential. (b) The inward Na+ current, shown in green going in the negative direction, rapidly depolarizes the potential V (t) and yields the upstroke of the action potential. Concurrently, the K+ current activates, displayed as a positive deflection, and leads to the downstroke in the potential V (t). Inward and outward currents overlap significantly in time. The dotted lines within the region bounded by the solid lines represent the minimal Na+ current and the minimal K+ current needed to produce the V (t) spike waveform in (a). By the law of current conservation, the sum of capacitive, resistive, and synaptic currents, denoted by ˙ LV (t) ≡ C V (t) + Ileak [V (t)] + Isyn [V (t)], must be balanced by the active currents. If the cell’s passive properties, namely its capacitance and (leak) resistance, and the synaptic conductance are constant, we can deduce the minimal active currents needed to generate a specified V (t). The minimal currents, by definition, do not overlap in time. Taking into account passive current flow, restoring the concentration gradients after the action potential requires 29 nJ/cm2 . By contrast, if the active currents were optimal, the cost would be 8.9 nJ/cm2 . (c) To depolarize from the minimum to the maximum of the AP, the synaptic voltage-gated currents must deliver a charge Qcapacitive to charge the membrane capacitance and a charge Qresistive to compensate for the loss of current through leak channels. For a large leak conductance in the cell membrane, Qresistive can be larger than Qcapacitive . 4 where V is the average voltage. In the limit as p → ∞, the norm simply becomes the difference between the action potential’s peak voltage and the mean voltage, whereas a finite p ensures that the norm is differentiable. In parameter space, we will focus our attention to the manifold of action potentials with constant Lp norm with 2 p < ∞, which entails that the optimal action potential will have a finite, though possibly narrow width. To be close to the supremum norm, yet still have a norm that is well-behaved under differentiation, we decided to use p = 16. 5 Poincar´ -Lindstedt perturbation of periodic dynamical orbits e Standard (secular) perturbation theory diverges for periodic orbits, so we apply the PoincarLindstedt technique of expanding both in the period and the dynamics of the asymptotic orbit and then derive a set of adjoint sensitivity equations for the differential-algebraic system. Solving once for the adjoint functions, we can easily compute the parameter gradient of any functional on the orbit, even for thousands of parameters. ˙ We start with a set of ordinary differential equations x = F(x; p) for the neuron’s dynamics, an asymptotically periodic orbit xγ (t) that describes the action potential, and a functional G(x; p) on the orbit, representing the energy consumption, for instance. The functional can be written as an integral ω(p)−1 G(xγ ; p) = g(xγ (t); p) dt, 0 over some source term g(xγ (t); p). Assume that locally perturbing a parameter p ∈ p induces a smooth change in the stable limit cycle, preserving its existence. Generally, a perturbation changes not only the limit cycle’s path in state space, but also the average speed with which this orbit is traversed; as a consequence, the value of the functional depends on this change in speed, to lowest order. For simplicity, consider a single, scalar parameter p. G(xγ ; p) is the solution to ω(p)∂τ [G(xγ ; p)] = g(xγ ; p), where we have normalised time via τ = ω(p)t. Denoting partial derivatives by subscripts, we expand p → p + to get the O 1 equation dτ [Gp (xγ ; p)] + ωp g(xγ ; p) = gx (xγ ; p)xp + gp (xγ ; p) in a procedure known as the Poincar´ -Lindstedt method. Hence, e dG = dp ω −1 (gp + gx xp − ωp g) dt, 0 where, once again by the Poincar´ -Lindstedt method, xp is the solution to e ˙ xp =Fx (xγ )xp + Fp (xγ ) − ωp F (xγ ) . Following the approach described by Cao, Li, Petzold, and Serban (2003), introduce a Lagrange vector AG (x) and consider the augmented objective function ω −1 I(xγ ; p) = G(xγ ; p) − ˙ AG (xγ ). (F(xγ ) − xγ ) dt, 0 γ ˙ which is identical to G(x ; p) as F(x) − x = 0. Then dI(xγ ; p) = dp ω −1 ω −1 ˙ AG . (Fp + Fx xp − ωp F − xp ) dt. (gp + gx xp − ωp g) dt − 0 0 ˙ Integrating the AG (x).xp term by parts and using periodicity, we get dI(xγ ; p) = dp ω −1 ω −1 ˙ −gx + AG + AG .F xp dt. G gp − ωp g − A . (Fp − ωp F) dt − 0 0 5 Parameter ¯ peak permeability PNa ¯ peak permeability PK midpoint voltage Vm ∨ Vh slope sm ∨ (−sh ) time constant τm,0 ∨ τh,0 gating exponent r ∨ s minimum 0.24 fm/s 6.6 fm/s - 72 mV 3.33 mV 5 µs 0.2 maximum 0.15 µm/s 11 µm/s 70 mV 200 mV 200 ms 5.0 Table 1: Parameter limits. We can let the second term vanish by making the vector AG (x) obey ˙ AG (x) = −FT (x; p) AG (x) + gx (x; p). x Label the homogeneous solution (obtained by setting gx (xγ ; p) = 0) as Z(x). It is known that ω −1 the term ωp is given by ωp = ω 0 Z(x).Fp (x) dt, provided Z(x) is normalised to satisfy Z(x).F(x) = 1. We can add any multiple of the homogeneous solution Z(x) to the inhomogeneous solution, so we can always make ω −1 AG (x).F(x) dt = G 0 by taking ω −1 G G AG (x).F(x) dt − ωG . A (x) → A (x) − Z(x) (3) 0 This condition will make AG (x) unique. Finally, with eq. (3) we get dI(xγ ; p) dG(xγ ; p) = = dp dp ω −1 gp − AG . Fp dt. 0 The first term in the integral gives rise to the partial derivative ∂G(xγ ; p)/ ∂p. In many cases, this term is either zero, can be made zero, or at least made independent of the dynamical variables. The parameters for the neuron models are listed in Table 1 together with their minimum and maximum allowed values. For each parameter in the neuron model, an auxiliary parameter on the entire real line is introduced, and a mapping from the real line onto the finite range set by the biophysical limits is defined. Gradient descent on this auxiliary parameter space is performed by orthogonalizing the gradient dQα /dp to the gradient dL/dp of the norm. To correct for drift off the constraint manifold of constant norm, illustrated in Fig. 3, steps of gradient ascent or descent on the Lp norm are performed while keeping Qα constant. The step size during gradient descent is adjusted to assure that ∆Qα < 0 and that a periodic solution xγ exists after adapting the parameters. The energy landscape is locally convex (Fig. 3). 6 Predicting the Hodgkin-Huxley model We start with a single-compartment Goldman-Hodgkin-Katz model neuron containing voltage-gated Na+ and leak conductances (Figure 1). A tonic synaptic input to the model evokes repetitive firing of action potentials. We seek those parameters that minimize the ionic load for an action potential of constant norm—in other words, spikes whose height relative to the average voltage is fairly constant, subject to a trade-off with the spike width. The ionic load is directly proportional to the work W performed by the ion flux. All parameters governing the ion channels’ voltage dependence and kinetics, including their time constants, mid-points, slopes, and peak values, are subject to change. The simplest model capable of generating an action potential must have two dynamical variables and two time scales: one for the upstroke and another for the downstroke. If both Na+ and K+ currents 6 Transient Na Current Model Optimal Action Potential Falling Phase Currents 40 20 a τ [ms] 5 1 2 Q = 239 nC/cm PNa = m(t)h(t) PK = n(t) 0 -60 0 -20 τh τn 60 current [μA/cm2] V [mV] V [mV] -40 IK[V] Excess INa[V] Peak Resurgence 300 200 100 -60 -4 -2 0 0 4 40 20 τ [ms] 5 1 Q = 169 nC/cm2 PNa = m(t)h(t) PK = n(t) τi = τi(V) 0 -60 0 -20 τh τn 60 current [μA/cm2] 60 V [mV] -40 -60 -4 -2 0 2 t [ms] 0.5 0.75 IK[V] Excess INa[V] Peak Resurgence 200 100 0 4 0.25 40 5 1 PNa = m(t)h(t) s PK = n(t) τi = τi(V) 20 0 delay τ [ms] Q = 156 nC/cm2 current [μA/cm2] 60 -60 0 -20 τh τn 60 V [mV] -40 -60 t [ms] 0.5 t [ms] 0.75 Cooperative Gating Model Optimal Action Potential Falling Phase Currents V [mV] c 0.25 Voltage-dependent (In)activation Model Falling Phase Currents Optimal Action Potential V [mV] b 2 t [ms] -2 -1 0 t [ms] 1 750 500 250 0 0 2 IK[V] Excess INa[V] Peak Resurgence 0.2 t [ms] 0.4 Figure 2: Optimal spike shapes and currents for neuron models with different biophysical features. During optimization, the spikes were constrained to have constant norm V (t) − V 16 = 92 mV, which controls the height of the spike. Insets in the left column display the voltage-dependence of the optimized time constants for sodium inactivation and potassium activation; sodium activation is modeled as occurring instantaneously. (a) Model with voltage-dependent inactivation of Na+ ; time constants for the first order permeability kinetics are voltage-independent (inset). Inactivation turns off the Na+ current on the downstroke, but not completely: as the K+ current activates to repolarize the membrane, the inward Na+ current reactivates and counteracts the K+ current; the peak of the resurgent Na+ current is marked by a triangle. (b) Model with voltage-dependent time constants for the first order kinetics of activation and inactivation. The voltage dependence minimizes the resurgence of the Na+ current. (c) Power-law gating model with an inwardly rectifying potassium current replacing the leak current. The power law dependence introduces an effective delay in the onset of the K+ current, which further minimizes the overlap of Na+ and K+ currents in time. 7 Energy per Spike Surface of Constant Norm Spikes ya 10 16 V [mV] K 18 10 12 14 14 16 T b V [mV] 0 t [ms] 2 10 18 16 12 s [mV] K V [mV] K T a V [mV] 12 nJ/cm2 ≥ 16.5 16.3 16.3 yc 16 sK [mV] 100 0 -2 16.4 T c 100 0 -2 V [mV] 14 14 VE [nJ/cm2] yc ya τK [ms] yb 12 16.5 yb 20 0 t [ms] 2 100 0 -2 0 t [ms] 2 Figure 3: The energy required for an action potential three parameters governing potassium activation: the midpoint voltage VK , the slope sK , and the (maximum) time constant τK . The energy is the minimum work required to restore the ionic concentration gradients, as given by Eq. (1). Note that the energy within the constrained manifold of constant norm spikes is locally convex. are persistent, current flows in opposite directions at the same time, so that, even at the optimum, the ionic load is 1200 nC/cm2 . On the other hand, no voltage-gated K+ channels are even required for a spike, as long as Na+ channels activate on a fast time scale and inactivate on a slower time scale and the leak is powerful enough to repolarize the neuron. Even so, the load is still 520 nC/cm2 . While spikes require dynamics on two time scales, suppressing the overlap between inward and outward currents calls for a third time scale. The resulting dynamics are higher-dimensional and reduce the load to to 239 nC/cm2 . Making the activation and inactivation time constants voltage-dependent permits ion channels to latch to an open or closed state during the rising and falling phase of the spike, reducing the ionic load to 189 nC/cm2 (Fig. 2) . The minimal Na+ and K+ currents are separated in time, yet dynamics that are linear in the activation variables cannot enforce a true delay between the offset of the Na+ current and the onset of the K+ current. If current flow depends on multiple gates that need to be activated simultaneously, optimization can use the nonlinearity of multiplication to introduce a delay in the rise of the K+ current that abolishes the overlap, and the ionic load drops to 156 nC/cm2 . Any number of kinetic schemes for the nonlinear permeabilities Pα can give rise to the same spike waveform V (t), including the simplest two-dimensional one. Yet only the full Hodgkin-Huxley (HH) model, with its voltage-dependent kinetics that prevent the premature resurgence of inward current and cooperative gating that delays the onset of the outward current, minimizes the energetic cost. More complex models, in which voltage-dependent ion channels make transitions between multiple closed, inactivated, and open states, instantiate the energy-conserving features of the HH system at the molecular level. Furthermore, features that are eliminated during optimization, such as a voltage-dependent inactivation of the outward potassium current, are also not part of the delayed rectifier potassium current in the Hodgkin-Huxley framework. 8 References [1] Paul De Weer, David C. Gadsby, and R. F. Rakowski. Voltage dependence of the na-k pump. Ann. Rev. Physiol., 50:225–241, 1988. [2] B. Frankenhaeuser and A. F. Huxley. The action potential in the myelinated nerve fibre of xenopus laevis as computed on the basis of voltage clamp data. J. Physiol., 171:302–315, 1964. [3] Samuel S.-H. Wang, Jennifer R. Shultz, Mark J. Burish, Kimberly H. Harrison, Patrick R. Hof, Lex C. Towns, Matthew W. Wagers, and Krysta D. Wyatt. Functional trade-offs in white matter axonal scaling. J. Neurosci., 28(15):4047–4056, 2008. [4] Henrik Alle, Arnd Roth, and J¨ rg R. P. Geiger. Energy-efficient action potentials in hippocamo pal mossy fibers. Science, 325(5946):1405–1408, 2009. [5] D. E. Goldman. Potential, impedance and rectification in membranes. J. Gen. Physiol., 27:37– 60, 1943. [6] A. L. Hodgkin and B. Katz. The effect of sodium ions on the electrical activity of the giant axon of the squid. J. Physiol., 108:37–77, 1949. [7] Brett C. Carter and Bruce P. Bean. Sodium entry during action potentials of mammalian neurons: Incomplete inactivation and reduced metabolic efficiency in fast-spiking neurons. Neuron, 64(6):898–909, 2009. [8] Luc J. Gentet, Greg J. Stuart, and John D. Clements. Direct measurement of specific membrane capacitance in neurons. Biophys. J., 79:314–320, 2000. [9] Alain Destexhe, Michael Rudolph, and Denis Par´ . The high-conductance state of neocortical e neurons in vivo. Nature Neurosci. Rev., 4:739–751, 2003. [10] Bilal Haider and David A. McCormick. Rapid neocortical dynamics: Cellular and network mechanisms. Neuron, 62:171–189, 2009. 9
6 0.59295315 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models
7 0.58180314 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
8 0.58168209 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
9 0.5793519 249 nips-2011-Sequence learning with hidden units in spiking neural networks
10 0.57449847 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
11 0.56917048 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
12 0.56902206 66 nips-2011-Crowdclustering
13 0.56896567 219 nips-2011-Predicting response time and error rates in visual search
14 0.56848645 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
15 0.56614918 102 nips-2011-Generalised Coupled Tensor Factorisation
16 0.56496722 227 nips-2011-Pylon Model for Semantic Segmentation
17 0.56326419 86 nips-2011-Empirical models of spiking in neural populations
18 0.56316614 13 nips-2011-A blind sparse deconvolution method for neural spike identification
19 0.56250298 55 nips-2011-Collective Graphical Models
20 0.56089789 75 nips-2011-Dynamical segmentation of single trials from population neural data