nips nips2011 nips2011-31 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
Reference: text
sentIndex sentText sentNum sentScore
1 1 Introduction Belief propagation (BP) has become the standard procedure to decode channel codes, since in 1996 MacKay [7] proposed BP to decode codes based on low-density parity-check (LDPC) matrices with linear complexity. [sent-13, score-0.282]
2 A rate r = k/n LDPC code can be represented as a sparse factor graph with n variable nodes (typically depicted on the left side) and n − k factor nodes (on the right side), in which the number of edges is linear in n [15]. [sent-14, score-0.424]
3 The first LDPC codes [6] presented a regular structure, in which all variables and factors had, respectively, and r connections, i. [sent-15, score-0.228]
4 But the analysis of their limiting decoding performance, when n tends to infinity for a fixed rate, showed that they do not approach the channel capacity [15]. [sent-18, score-0.306]
5 The left (right) degree of an edge is the degree of the variable (factor) node it is connected to. [sent-20, score-0.353]
6 Although optimized irregular LDPC codes can achieve the channel capacity with a decoder based on BP [15], they present several drawbacks. [sent-22, score-0.561]
7 First, the error floor in those codes increases significantly, because capacity achieving LDPC ensembles with BP decoding have a large fraction of variables 1 with two connections and they present low minimum distances. [sent-23, score-0.376]
8 These problems limit the BP decoding performance of capacity approaching codes, when we work with finite length codes used in real applications. [sent-25, score-0.32]
9 But it is a common belief that BP is sufficiently accurate to decode LDPC codes and other approximate inference algorithms would not outperform BP decoding significantly, if at all. [sent-29, score-0.354]
10 In this paper, we challenge that belief and show that more accurate approximate inference algorithms for graphical models can also improve the BP decoding performance for LDPC codes, which are sparse graphical models with long-range loops. [sent-30, score-0.269]
11 The EP with a tree-structured approximation can be presented in a similar way as the BP decoder for an LDPC code over the BEC [11], with similar run-time complexity. [sent-33, score-0.358]
12 We show that a decoder based on EP with a tree-structured approximation converges to the BP solution for the asymptotic limit n → ∞, for finite-length graphs the performance is otherwise improved significantly [13, 11]. [sent-34, score-0.39]
13 Therefore, it makes the expectation propagation with a tree-structured approximation (for short we refer to this algorithm as tree-structured EP or TEP) a more practical decoding algorithm for finite length LDPC codes. [sent-36, score-0.263]
14 Besides, the analysis of the application of the tree-structured EP to channel decoding over the BEC leads to another way of fixing the approximating tree structure different from the one proposed in [9] for dense codes with positive correlation potentials. [sent-37, score-0.398]
15 In channel coding, the factors of the graph are parity-checks and the correlations are high but can change from positive to negative by the flip of a single variable. [sent-38, score-0.188]
16 In Section 2, we present the peeling decoder, which is the interpretation of the BP algorithm for LDPC codes over the BEC, and how it can be extended to incorporate the tree-structured EP decoding procedure. [sent-42, score-0.313]
17 In Section 3, we analyze the TEP decoder performance for LDPC codes in both the asymptotic and the finite-length regimen. [sent-43, score-0.393]
18 We provide an estimation of the TEP decoder error rate for a given LDPC ensemble. [sent-44, score-0.293]
19 The BP under this interpretation is referred to as the peeling decoder (PD) [3, 15] and it is easily described using the factor graph of the code. [sent-48, score-0.426]
20 The first step is to initialize the graph by removing all the variable nodes corresponding to non-erased bits. [sent-49, score-0.182]
21 When removing a one-valued nonerased variable node, the parity of the factors it was connected to are flipped. [sent-50, score-0.22]
22 2 stage, the algorithm proceeds over the resulting graph by removing a factor and a variable node in each step: 1. [sent-53, score-0.218]
23 It looks for any factor linked to a single variable (a check node of degree one). [sent-54, score-0.304]
24 The peeling decoder copies the parity of this factor into the variable node and removes the factor. [sent-55, score-0.543]
25 If the variable was assigned a one, it changes the parity of the factors it was connected to. [sent-58, score-0.192]
26 It repeats Steps 1 and 2 until all the variable nodes have been removed, successful decoding, or until there are no degree-one factors left, unsuccessful decoding. [sent-60, score-0.16]
27 The first and last bits have not been erased and when we remove them from the graph, the second factor is singled connected to the third variable, which can be now de-erased (Figure 1(b)). [sent-62, score-0.159]
28 Finally, the first factor is singled connected to the second variable, decoding the transmitted codeword (Figure 1(c)). [sent-63, score-0.305]
29 |V3 ) V3 p(Y4 = 1|V4 ) V4 P2 1 V4 p(Y4 = 1|V4 ) (b) (c) Figure 1: Example of the PD algorithm for LDPC channel decoding in the erasure channel. [sent-70, score-0.373]
30 This result can be used to optimize the DD to build irregular LDPC codes that, as n tends to infinity, approach the channel capacity. [sent-72, score-0.27]
31 However, as already discussed, these codes present higher error floors, because they present many variables with only two edges, and they usually present poor finite-length performance due to the slow convergence to the asymptotic limit [15]. [sent-73, score-0.155]
32 1 The TEP decoder The tree-structured EP overlaps a tree over the variables on the graph to further impose pairwise marginal constraints. [sent-75, score-0.425]
33 Any factor of degree two in the remaining graph either tells us that the connected variables are equal (if the parity check is zero), or opposite (if the parity check is one). [sent-80, score-0.562]
34 The proposed algorithm actually replaces one variable by the other and iterates until a factor of degree one is created and more variables can be de-erased. [sent-82, score-0.226]
35 The TEP decoder can be explained in a similar fashion as the PD decoder, in which instead of looking for degree-one factors, we look for degree one and two. [sent-84, score-0.395]
36 We initialize the TEP decoder, as the PD, by removing all known variable nodes and updating the parity checks for the variables that are one. [sent-85, score-0.216]
37 If a factor of degree one is found, the TEP recovers the associated variable, performing the Steps 1 and 2 of the PD previously described. [sent-89, score-0.164]
38 If a factor of degree two is found, the decoder removes it from the graph together with one of the variable nodes connected to it and the two associated edges. [sent-91, score-0.664]
39 Then, it reconnects 3 to the remaining variable node all the factors that were connected to the removed variable node. [sent-92, score-0.23]
40 The parities of the factors re-connected to the remaining variable node are reversed if the removed factor had parity one. [sent-93, score-0.293]
41 Steps 1-3 are repeated until all the variable nodes have been removed, successful decoding, or the graph runs out of factors of degree one or two, unsuccessful decoding. [sent-95, score-0.341]
42 The process of removing a factor of degree two is sketched in Figure 2. [sent-96, score-0.192]
43 Finally, the factor P1 and the variable V2 can be removed (Figure 2(c)), because they have no further implication in the decoding process. [sent-98, score-0.297]
44 The TEP removes a factor and a variable node per iteration, as the PD does. [sent-100, score-0.154]
45 The removal of a factor and a variable does not increase the complexity of the TEP decoder compared to the BP algorithm. [sent-101, score-0.363]
46 P1 V1 P1 P2 V2 V2 P2 P3 P2 V1 V1 P3 P3 (a) (b) (c) Figure 2: In (a) we show two variable nodes, V1 and V2 , that share a factor of degree two P1 . [sent-103, score-0.225]
47 By removing factors of degree two, we eventually create factors of degree one, whenever we find an scenario equivalent to the one depicted in Figure 3. [sent-107, score-0.361]
48 Consider two variable nodes connected to a factor of degree two that also share another factor with degree three, as illustrated in Figure 3(a). [sent-108, score-0.481]
49 When we remove the factor P3 and the variable node V2 , the factor P4 is now degree one, as illustrated in Figure 3(b). [sent-109, score-0.305]
50 At the beginning of the decoding algorithm, it is unlikely that the two variable nodes in a factor of degree two also share a factor of degree three. [sent-110, score-0.62]
51 Note that, when we remove a factor of degree two connected to variables V1 and V2 , in terms of the EP algorithm, we are including a pairwise factor between both variables. [sent-112, score-0.29]
52 Therefore, the TEP equivalent tree structure is not fixed a priori and we construct it along the decoding process. [sent-113, score-0.217]
53 Also, the steps of the TEP decoder can be presented as a linear combination of the columns of the parity-check matrix of the code and hence its solution is independent of the processing order. [sent-114, score-0.362]
54 P1 V1 P2 P1 V2 P3 P2 V1 V3 V3 P4 P4 (a) (b) Figure 3: In (a), the variables V1 and V2 are connected to a degree two factor, P3 , and they also share a factor of degree three, P4 . [sent-115, score-0.368]
55 3 TEP analysis: expected graph evolution We now sketch the proof of why the TEP decoder outperforms BP. [sent-117, score-0.408]
56 Both the PD and the TEP decoder sequentially reduces 4 the LDPC graph by removing check nodes of degree one or two. [sent-119, score-0.6]
57 As a consequence, the decoding process yields a sequence of residual graphs and their associated DD. [sent-120, score-0.266]
58 In [3, 4], the sequence of residual graphs follows a typical path or expected evolution [15]. [sent-122, score-0.153]
59 For the PD, we have an analytical form for the evolution of the number of degree one factor as the decoding progresses, r1 (τ, ), as a function of the decoding time, τ , and the erasure rate, . [sent-124, score-0.698]
60 In [1, 15], the authors show that particular decoding realizations are Gaussian distributed around r1 (τ, ), with a variance of order αBP /n, where αBP can be computed from the LDPC DD. [sent-126, score-0.179]
61 For the TEP decoder the analysis follows a similar path, but its derivation is more involved. [sent-128, score-0.278]
62 For arbitrarily large codes, the expected graph evolution during the TEP decoding is computed in [12], with a set of non-linear differential equations. [sent-129, score-0.339]
63 They track down the expected progression of the fraction of edges with left degree i, li (τ ) for i = 1, . [sent-130, score-0.171]
64 , lmax , and right degree j, rj (τ ) for j = 1, . [sent-133, score-0.151]
65 , rmax as the TEP decoder performs, where τ is a normalized time: if u is the TEP iteration index and E is the total number of edges in the original graph, then τ = u/E. [sent-136, score-0.321]
66 By Wormald’s theorem [21], any real decoding realization does not differ from the solution of such equations in a factor larger than O(E −1/6 ). [sent-137, score-0.266]
67 Let us illustrate the accuracy of the model derived to analyze the TEP decoder properties. [sent-140, score-0.278]
68 415, we compare the solution of the system of differential equations for R1 (τ ) = r1 (τ )E and R2 (τ ) = r2 (τ )E, depicted by thick solid lines, with 30 simulated decoding trajectories, depicted by thin dashed lines. [sent-142, score-0.351]
69 All curves are plotted with respect the evolution of the normalized size of the graph at each time instant, denoted by e(τ ) so that the decoding process starts on the right e(τ = 0) ≈ 0. [sent-145, score-0.327]
70 In [12], we compute the probability that two variable nodes that share a check node of degree-two also share another check node (scenario S). [sent-149, score-0.34]
71 As the TEP decoder progresses, lavg (τ ) increases, because the remaining variables in the graph inherits the connections of the variables that have been removed, and e(τ ) decreases, therefore creating new factors of degree one and improving the BP/PD performance. [sent-151, score-0.646]
72 The solution of the TEP decoder differential equations does not satisfy this property. [sent-154, score-0.348]
73 For instance, in Figure 5 (a), we plot the expected evolution of r1 (τ ) and r2 (τ ) for n → ∞ and the (3, 6) regular LDPC ensemble when 5 we are just above the BP threshold for this code, which is BP ≈ 0. [sent-155, score-0.193]
74 Unlike Figure 4(a), r1 (τ ) and r2 (τ ) go to zero before e(τ ) cancels: the TEP decoder gets stuck before completing the decoding process. [sent-157, score-0.457]
75 5 (b), we include the computed evolution for lavg (τ ). [sent-159, score-0.146]
76 As shown, the fraction of degree two check nodes vanishes before lavg (τ ) becomes infinite. [sent-160, score-0.339]
77 45 Residual graph normalized size e(τ ) (a) (b) Figure 4: In (a), for a regular (3, 6) code with n = 217 and = 0. [sent-197, score-0.214]
78 415, we compare the solution of the system of differential equations for R1 (τ ) = r1 (τ )E ( ) and R2 (τ ) = r2 (τ )E ( ) (thick solid lines) with 30 simulated decoding trajectories (thin dashed lines). [sent-198, score-0.317]
79 5 Residual graph normalized size e(t) Residual graph normalized size e(τ ) (a) (b) Figure 5: For the regular (3, 6) ensemble and BP ≈ 0. [sent-221, score-0.271]
80 1 Analysis in the finite-length regime In the finite-length regime, the TEP decoder emerges as a powerful decoding algorithm. [sent-225, score-0.457]
81 We illustrate the TEP decoder performance for some regular and irregular finite-length LDPC codes. [sent-229, score-0.419]
82 This ensemble has no asymptotic error floor [15] and we plot the word error rate obtained with the TEP and the BP decoders with different code lengths in Figure 6(a). [sent-231, score-0.204]
83 5 Channel erasure probability (a) (b) Figure 6: TEP (solid line) and BP (dashed line) decoding performance for a regular LDPC (3,6) code in (a), and the irregular LDPC in (3) and (4) in (b), with code lengths n = 29 (◦), n = 210 ( ), n = 211 (×) and 212 ( ). [sent-245, score-0.589]
84 By using the regular (3, 6) code as an example, in Figure 5(a), we plot the solution for r1 (τ ) in the case n → ∞. [sent-247, score-0.171]
85 Let τ ∗ be the time at which the decoder gets stuck, i. [sent-248, score-0.278]
86 In Figure 7, we plot the solution for the evolution of r1 (τ, n, BP ) with respect to e(t) for a (3, 6) regular code at = BP = TEP . [sent-251, score-0.237]
87 The idea to estimate the TEP decoder performance at = BP + ∆ is to assume that any particular realization will succeed almost surely as long as the fraction of degree one check nodes at τ ∗ is positive. [sent-257, score-0.537]
88 (7) In [1, 15], it is shown that simulated trajectories for the evolution of degree one check nodes under BP are asymptotically Gaussian distributed and this is observed for the TEP decoder as well. [sent-259, score-0.594]
89 Furthermore, the variance is of order δ(τ )/n, where δ(τ ) depends on the ensemble and the decoder [1]. [sent-260, score-0.318]
90 To estimate the TEP decoder error rate, we compute the probability that the fraction of degree one check nodes at at τ ∗ is positive. [sent-261, score-0.537]
91 Besides, we have empirically observed that the variance of trajectories under BP and TEP decoding are quite similar so, for simplicity, we set δ(τ ∗ ) in (8) equal to δ(τ ∗ )BP , whose analytic solution can be found in [16, 1]. [sent-264, score-0.218]
92 Hence, we consider the TEP decoder expected evolution to estimate the parameter γTEP in (8). [sent-265, score-0.344]
93 48 Channel erasure probability (a) (b) Figure 7: In (a), we plot the solution for r1 (τ ) with respect to e(t) for a (3, 6) regular code at = BP = TEP . [sent-285, score-0.281]
94 As a consequence, the decoding error rates are clearly improved. [sent-293, score-0.179]
95 Additionally, the application of LDPC decoding showed us a different way of learning the tree structure that might be amenable for general factors. [sent-295, score-0.242]
96 Near Shannon limit performance of low density parity check codes. [sent-327, score-0.151]
97 Tree-structure expectation prope e agation for decoding LDPC codes over binary erasure channels. [sent-347, score-0.41]
98 Tree-structure expectation propagation for LDPC e decoding in erasure channels. [sent-355, score-0.358]
99 Tree-structured expectation propagation for decode ing finite-length ldpc codes. [sent-363, score-0.487]
100 Analytical solution of covariance evolution for irregular LDPC codes. [sent-380, score-0.159]
wordName wordTfidf (topN-words)
[('tep', 0.681), ('ldpc', 0.39), ('bp', 0.318), ('decoder', 0.278), ('decoding', 0.179), ('degree', 0.117), ('erasure', 0.11), ('ep', 0.098), ('codes', 0.097), ('pd', 0.086), ('channel', 0.084), ('lavg', 0.08), ('irregular', 0.074), ('parity', 0.074), ('regular', 0.067), ('evolution', 0.066), ('code', 0.065), ('graph', 0.064), ('check', 0.061), ('bec', 0.057), ('nodes', 0.052), ('factor', 0.047), ('olmos', 0.046), ('propagation', 0.045), ('graphs', 0.044), ('residual', 0.043), ('node', 0.041), ('ensemble', 0.04), ('factors', 0.04), ('connected', 0.04), ('variable', 0.038), ('tree', 0.038), ('peeling', 0.037), ('dd', 0.036), ('erased', 0.034), ('lmax', 0.034), ('removed', 0.033), ('differential', 0.03), ('unsuccessful', 0.03), ('fraction', 0.029), ('lengths', 0.029), ('solid', 0.029), ('removing', 0.028), ('decode', 0.028), ('removes', 0.028), ('capacity', 0.028), ('lines', 0.026), ('edges', 0.025), ('amenable', 0.025), ('expectation', 0.024), ('variables', 0.024), ('share', 0.023), ('eldpc', 0.023), ('heirs', 0.023), ('luby', 0.023), ('ravg', 0.023), ('sevilla', 0.023), ('shokrollahi', 0.023), ('singled', 0.023), ('pw', 0.022), ('fernando', 0.022), ('communications', 0.022), ('marginal', 0.021), ('oor', 0.021), ('equations', 0.021), ('plot', 0.02), ('trajectories', 0.02), ('graphical', 0.02), ('amin', 0.02), ('mitzenmacher', 0.02), ('pablo', 0.02), ('parities', 0.02), ('wormald', 0.02), ('connections', 0.019), ('solution', 0.019), ('depicted', 0.019), ('dashed', 0.019), ('normalized', 0.018), ('volker', 0.018), ('michael', 0.018), ('asymptotic', 0.018), ('mackay', 0.017), ('madrid', 0.017), ('word', 0.017), ('belief', 0.017), ('inference', 0.017), ('transmitted', 0.016), ('juan', 0.016), ('spielman', 0.016), ('accurate', 0.016), ('thick', 0.016), ('limit', 0.016), ('transactions', 0.015), ('tends', 0.015), ('rate', 0.015), ('progresses', 0.015), ('spain', 0.015), ('remove', 0.015), ('approximation', 0.015), ('loops', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
2 0.12372596 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
3 0.077408068 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm
Author: Julie Dethier, Paul Nuyujukian, Chris Eliasmith, Terrence C. Stewart, Shauki A. Elasaad, Krishna V. Shenoy, Kwabena A. Boahen
Abstract: Motor prostheses aim to restore function to disabled patients. Despite compelling proof of concept systems, barriers to clinical translation remain. One challenge is to develop a low-power, fully-implantable system that dissipates only minimal power so as not to damage tissue. To this end, we implemented a Kalman-filter based decoder via a spiking neural network (SNN) and tested it in brain-machine interface (BMI) experiments with a rhesus monkey. The Kalman filter was trained to predict the arm’s velocity and mapped on to the SNN using the Neural Engineering Framework (NEF). A 2,000-neuron embedded Matlab SNN implementation runs in real-time and its closed-loop performance is quite comparable to that of the standard Kalman filter. The success of this closed-loop decoder holds promise for hardware SNN implementations of statistical signal processing algorithms on neuromorphic chips, which may offer power savings necessary to overcome a major obstacle to the successful clinical translation of neural motor prostheses. ∗ Present: Research Fellow F.R.S.-FNRS, Systmod Unit, University of Liege, Belgium. 1 1 Cortically-controlled motor prostheses: the challenge Motor prostheses aim to restore function for severely disabled patients by translating neural signals from the brain into useful control signals for prosthetic limbs or computer cursors. Several proof of concept demonstrations have shown encouraging results, but barriers to clinical translation still remain. One example is the development of a fully-implantable system that meets power dissipation constraints, but is still powerful enough to perform complex operations. A recently reported closedloop cortically-controlled motor prosthesis is capable of producing quick, accurate, and robust computer cursor movements by decoding neural signals (threshold-crossings) from a 96-electrode array in rhesus macaque premotor/motor cortex [1]-[4]. This, and previous designs (e.g., [5]), employ versions of the Kalman filter, ubiquitous in statistical signal processing. Such a filter and its variants are the state-of-the-art decoder for brain-machine interfaces (BMIs) in humans [5] and monkeys [2]. While these recent advances are encouraging, clinical translation of such BMIs requires fullyimplanted systems, which in turn impose severe power dissipation constraints. Even though it is an open, actively-debated question as to how much of the neural prosthetic system must be implanted, we note that there are no reports to date demonstrating a fully implantable 100-channel wireless transmission system, motivating performing decoding within the implanted chip. This computation is constrained by a stringent power budget: A 6 × 6mm2 implant must dissipate less than 10mW to avoid heating the brain by more than 1◦ C [6], which is believed to be important for long term cell health. With this power budget, current approaches can not scale to higher electrode densities or to substantially more computer-intensive decode/control algorithms. The feasibility of mapping a Kalman-filter based decoder algorithm [1]-[4] on to a spiking neural network (SNN) has been explored off-line (open-loop). In these off-line tests, the SNN’s performance virtually matched that of the standard implementation [7]. These simulations provide confidence that this algorithm—and others similar to it—could be implemented using an ultra-low-power approach potentially capable of meeting the severe power constraints set by clinical translation. This neuromorphic approach uses very-large-scale integrated systems containing microelectronic analog circuits to morph neural systems into silicon chips [8, 9]. These neuromorphic circuits may yield tremendous power savings—50nW per silicon neuron [10]—over digital circuits because they use physical operations to perform mathematical computations (analog approach). When implemented on a chip designed using the neuromorphic approach, a 2,000-neuron SNN network can consume as little as 100µW. Demonstrating this approach’s feasibility in a closed-loop system running in real-time is a key, non-incremental step in the development of a fully implantable decoding chip, and is necessary before proceeding with fabricating and implanting the chip. As noise, delay, and over-fitting play a more important role in the closed-loop setting, it is not obvious that the SNN’s stellar open-loop performance will hold up. In addition, performance criteria are different in the closed-loop and openloop settings (e.g., time per target vs. root mean squared error). Therefore, a SNN of a different size may be required to meet the desired specifications. Here we present results and assess the performance and viability of the SNN Kalman-filter based decoder in real-time, closed-loop tests, with the monkey performing a center-out-and-back target acquisition task. To achieve closed-loop operation, we developed an embedded Matlab implementation that ran a 2,000-neuron version of the SNN in real-time on a PC. We achieved almost a 50-fold speed-up by performing part of the computation in a lower-dimensional space defined by the formal method we used to map the Kalman filter on to the SNN. This shortcut allowed us to run a larger SNN in real-time than would otherwise be possible. 2 Spiking neural network mapping of control theory algorithms As reported in [11], a formal methodology, called the Neural Engineering Framework (NEF), has been developed to map control-theory algorithms onto a computational fabric consisting of a highly heterogeneous population of spiking neurons simply by programming the strengths of their connections. These artificial neurons are characterized by a nonlinear multi-dimensional-vector-to-spikerate function—ai (x(t)) for the ith neuron—with parameters (preferred direction, maximum firing rate, and spiking-threshold) drawn randomly from a wide distribution (standard deviation ≈ mean). 2 Spike rate (spikes/s) Representation ˆ x → ai (x) → x = ∑i ai (x)φix ˜ ai (x) = G(αi φix · x + Jibias ) 400 Transformation y = Ax → b j (Aˆ ) x Aˆ = ∑i ai (x)Aφix x x(t) B' y(t) A' 200 0 −1 Dynamics ˙ x = Ax → x = h ∗ A x A = τA + I 0 Stimulus x 1 bk(t) y(t) B' h(t) x(t) A' aj(t) Figure 1: NEF’s three principles. Representation. 1D tuning curves of a population of 50 leaky integrate-and-fire neurons. The neurons’ tuning curves map control variables (x) to spike rates (ai (x)); this nonlinear transformation is inverted by linear weighted decoding. G() is the neurons’ nonlinear current-to-spike-rate function. Transformation. SNN with populations bk (t) and a j (t) representing y(t) and x(t). Feedforward and recurrent weights are determined by B and A , as described next. Dynamics. The system’s dynamics is captured in a neurally plausible fashion by replacing integration with the synapses’ spike response, h(t), and replacing the matrices with A = τA + I and B = τB to compensate. The neural engineering approach to configuring SNNs to perform arbitrary computations is underlined by three principles (Figure 1) [11]-[14]: Representation is defined by nonlinear encoding of x(t) as a spike rate, ai (x(t))—represented by the neuron tuning curve—combined with optimal weighted linear decoding of ai (x(t)) to recover ˆ an estimate of x(t), x(t) = ∑i ai (x(t))φix , where φix are the decoding weights. Transformation is performed by using alternate decoding weights in the decoding operation to map transformations of x(t) directly into transformations of ai (x(t)). For example, y(t) = Ax(t) is represented by the spike rates b j (Aˆ (t)), where unit j’s input is computed directly from unit i’s x output using Aˆ (t) = ∑i ai (x(t))Aφix , an alternative linear weighting. x Dynamics brings the first two principles together and adds the time dimension to the circuit. This principle aims at reuniting the control-theory and neural levels by modifying the matrices to render the system neurally plausible, thereby permitting the synapses’ spike response, h(t), (i.e., impulse ˙ response) to capture the system’s dynamics. For example, for h(t) = τ −1 e−t/τ , x = Ax(t) is realized by replacing A with A = τA + I. This so-called neurally plausible matrix yields an equivalent dynamical system: x(t) = h(t) ∗ A x(t), where convolution replaces integration. The nonlinear encoding process—from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji (x(t)), to a firing rate, ai (x(t))—is specified as: ai (x(t)) = G(Ji (x(t))). (1) Here G is the neurons’ nonlinear current-to-spike-rate function, which is given by G(Ji (x)) = τ ref − τ RC ln (1 − Jth /Ji (x)) −1 , (2) for the leaky integrate-and-fire model (LIF). The LIF neuron has two behavioral regimes: subthreshold and super-threshold. The sub-threshold regime is described by an RC circuit with time constant τ RC . When the sub-threshold soma voltage reaches the threshold, Vth , the neuron emits a spike δ (t −tn ). After this spike, the neuron is reset and rests for τ ref seconds (absolute refractory period) before it resumes integrating. Jth = Vth /R is the minimum input current that produces spiking. Ignoring the soma’s RC time-constant when specifying the SNN’s dynamics are reasonable because the neurons cross threshold at a rate that is proportional to their input current, which thus sets the spike rate instantaneously, without any filtering [11]. The conversion from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji , is ˜ performed by assigning to the neuron a preferred direction, φix , in the stimulus space and taking the dot-product: ˜ Ji (x(t)) = αi φix · x(t) + Jibias , (3) 3 where αi is a gain or conversion factor, and Jibias is a bias current that accounts for background ˜ activity. For a 1D space, φix is either +1 or −1 (drawn randomly), for ON and OFF neurons, respectively. The resulting tuning curves are illustrated in Figure 1, left. The linear decoding process is characterized by the synapses’ spike response, h(t) (i.e., post-synaptic currents), and the decoding weights, φix , which are obtained by minimizing the mean square error. A single noise term, η, takes into account all sources of noise, which have the effect of introducing uncertainty into the decoding process. Hence, the transmitted firing rate can be written as ai (x(t)) + ηi , where ai (x(t)) represents the noiseless set of tuning curves and ηi is a random variable picked from a zero-mean Gaussian distribution with variance σ 2 . Consequently, the mean square error can be written as [11]: E = 1 ˆ [x(t) − x(t)]2 2 x,η,t = 2 1 2 x(t) − ∑ (ai (x(t)) + ηi ) φix i (4) x,η,t where · x,η denotes integration over the range of x and η, the expected noise. We assume that the noise is independent and has the same variance for each neuron [11], which yields: E= where σ2 1 2 2 x(t) − ∑ ai (x(t))φix i x,t 1 + σ 2 ∑(φix )2 , 2 i (5) is the noise variance ηi η j . This expression is minimized by: N φix = ∑ Γ−1 ϒ j , ij (6) j with Γi j = ai (x)a j (x) x + σ 2 δi j , where δ is the Kronecker delta function matrix, and ϒ j = xa j (x) x [11]. One consequence of modeling noise in the neural representation is that the matrix Γ is invertible despite the use of a highly overcomplete representation. In a noiseless representation, Γ is generally singular because, due to the large number of neurons, there is a high probability of having two neurons with similar tuning curves leading to two similar rows in Γ. 3 Kalman-filter based cortical decoder In the 1960’s, Kalman described a method that uses linear filtering to track the state of a dynamical system throughout time using a model of the dynamics of the system as well as noisy measurements [15]. The model dynamics gives an estimate of the state of the system at the next time step. This estimate is then corrected using the observations (i.e., measurements) at this time step. The relative weights for these two pieces of information are given by the Kalman gain, K [15, 16]. Whereas the Kalman gain is updated at each iteration, the state and observation matrices (defined below)—and corresponding noise matrices—are supposed constant. In the case of prosthetic applications, the system’s state vector is the cursor’s kinematics, xt = y [veltx , velt , 1], where the constant 1 allows for a fixed offset compensation. The measurement vector, yt , is the neural spike rate (spike counts in each time step) of 192 channels of neural threshold crossings. The system’s dynamics is modeled by: xt yt = Axt−1 + wt , = Cxt + qt , (7) (8) where A is the state matrix, C is the observation matrix, and wt and qt are additive, Gaussian noise sources with wt ∼ N (0, W) and qt ∼ N (0, Q). The model parameters (A, C, W and Q) are fit with training data by correlating the observed hand kinematics with the simultaneously measured neural signals (Figure 2). For an efficient decoding, we derived the steady-state update equation by replacing the adaptive Kalman gain by its steady-state formulation: K = (I + WCQ−1 C)−1 W CT Q−1 . This yields the following estimate of the system’s state: xt = (I − KC)Axt−1 + Kyt = MDT xt−1 + MDT yt , x y 4 (9) a Velocity (cm/s) Neuron 10 c 150 5 100 b 50 20 0 −20 0 0 x−velocity y−velocity 2000 4000 6000 8000 Time (ms) 10000 12000 1cm 14000 Trials: 0034-0049 Figure 2: Neural and kinematic measurements (monkey J, 2011-04-16, 16 continuous trials) used to fit the standard Kalman filter model. a. The 192 cortical recordings fed as input to fit the Kalman filter’s matrices (color code refers to the number of threshold crossings observed in each 50ms bin). b. Hand x- and y-velocity measurements correlated with the neural data to obtain the Kalman filter’s matrices. c. Cursor kinematics of 16 continuous trials under direct hand control. where MDT = (I − KC)A and MDT = K are the discrete time (DT) Kalman matrices. The steadyx y state formulation improves efficiency with little loss in accuracy because the optimal Kalman gain rapidly converges (typically less than 100 iterations). Indeed, in neural applications under both open-loop and closed-loop conditions, the difference between the full Kalman filter and its steadystate implementation falls to within 1% in a few seconds [17]. This simplifying assumption reduces the execution time for decoding a typical neuronal firing rate signal approximately seven-fold [17], a critical speed-up for real-time applications. 4 Kalman filter with a spiking neural network To implement the Kalman filter with a SNN by applying the NEF, we first convert Equation 9 from DT to continuous time (CT), and then replace the CT matrices with neurally plausible ones, which yields: x(t) = h(t) ∗ A x(t) + B y(t) , (10) where A = τMCT + I, B = τMCT , with MCT = MDT − I /∆t and MCT = MDT /∆t, the CT x y x x y y Kalman matrices, and ∆t = 50ms, the discrete time step; τ is the synaptic time-constant. The jth neuron’s input current (see Equation 3) is computed from the system’s current state, x(t), which is computed from estimates of the system’s previous state (ˆ (t) = ∑i ai (t)φix ) and current x y input (ˆ (t) = ∑k bk (t)φk ) using Equation 10. This yields: y ˜x J j (x(t)) = α j φ j · x(t) + J bias j ˜x ˆ ˆ = α j φ j · h(t) ∗ A x(t) + B y(t) ˜x = α j φ j · h(t) ∗ A + J bias j ∑ ai (t)φix + B ∑ bk (t)φky i + J bias j (11) k This last equation can be written in a neural network form: J j (x(t)) = h(t) ∗ ∑ ω ji ai (t) + ∑ ω jk bk (t) i + J bias j (12) k y ˜x ˜x where ω ji = α j φ j A φix and ω jk = α j φ j B φk are the recurrent and feedforward weights, respectively. 5 Efficient implementation of the SNN In this section, we describe the two distinct steps carried out when implementing the SNN: creating and running the network. The first step has no computational constraints whereas the second must be very efficient in order to be successfully deployed in the closed-loop experimental setting. 5 x ( 1000 x ( = 1000 1000 = 1000 x 1000 b 1000 x 1000 1000 a Figure 3: Computing a 1000-neuron pool’s recurrent connections. a. Using connection weights requires multiplying a 1000×1000 matrix by a 1000 ×1 vector. b. Operating in the lower-dimensional state space requires multiplying a 1 × 1000 vector by a 1000 × 1 vector to get the decoded state, multiplying this state by a component of the A matrix to update it, and multiplying the updated state by a 1000 × 1 vector to re-encode it as firing rates, which are then used to update the soma current for every neuron. Network creation: This step generates, for a specified number of neurons composing the network, x ˜x the gain α j , bias current J bias , preferred direction φ j , and decoding weight φ j for each neuron. The j ˜x preferred directions φ j are drawn randomly from a uniform distribution over the unit sphere. The maximum firing rate, max G(J j (x)), and the normalized x-axis intercept, G(J j (x)) = 0, are drawn randomly from a uniform distribution on [200, 400] Hz and [-1, 1], respectively. From these two specifications, α j and J bias are computed using Equation 2 and Equation 3. The decoding weights j x φ j are computed by minimizing the mean square error (Equation 6). For efficient implementation, we used two 1D integrators (i.e., two recurrent neuron pools, with each pool representing a scalar) rather than a single 3D integrator (i.e., one recurrent neuron pool, with the pool representing a 3D vector by itself) [13]. The constant 1 is fed to the 1D integrators as an input, rather than continuously integrated as part of the state vector. We also replaced the bk (t) units’ spike rates (Figure 1, middle) with the 192 neural measurements (spike counts in 50ms bins), y which is equivalent to choosing φk from a standard basis (i.e., a unit vector with 1 at the kth position and 0 everywhere else) [7]. Network simulation: This step runs the simulation to update the soma current for every neuron, based on input spikes. The soma voltage is then updated following RC circuit dynamics. Gaussian noise is normally added at this step, the rest of the simulation being noiseless. Neurons with soma voltage above threshold generate a spike and enter their refractory period. The neuron firing rates are decoded using the linear decoding weights to get the updated states values, x and y-velocity. These values are smoothed with a filter identical to h(t), but with τ set to 5ms instead of 20ms to avoid introducing significant delay. Then the simulation step starts over again. In order to ensure rapid execution of the simulation step, neuron interactions are not updated dix rectly using the connection matrix (Equation 12), but rather indirectly with the decoding matrix φ j , ˜x dynamics matrix A , and preferred direction matrix φ j (Equation 11). To see why this is more efficient, suppose we have 1000 neurons in the a population for each of the state vector’s two scalars. Computing the recurrent connections using connection weights requires multiplying a 1000 × 1000 matrix by a 1000-dimensional vector (Figure 3a). This requires 106 multiplications and about 106 sums. Decoding each scalar (i.e., ∑i ai (t)φix ), however, requires only 1000 multiplications and 1000 sums. The decoded state vector is then updated by multiplying it by the (diagonal) A matrix, another 2 products and 1 sum. The updated state vector is then encoded by multiplying it with the neurons’ preferred direction vectors, another 1000 multiplications per scalar (Figure 3b). The resulting total of about 3000 operations is nearly three orders of magnitude fewer than using the connection weights to compute the identical transformation. To measure the speedup, we simulated a 2,000-neuron network on a computer running Matlab 2011a (Intel Core i7, 2.7-GHz, Mac OS X Lion). Although the exact run-times depend on the computing hardware and software, the run-time reduction factor should remain approximately constant across platforms. For each reported result, we ran the simulation 10 times to obtain a reliable estimate of the execution time. The run-time for neuron interactions using the recurrent connection weights was 9.9ms and dropped to 2.7µs in the lower-dimensional space, approximately a 3,500-fold speedup. Only the recurrent interactions benefit from the speedup, the execution time for the rest of the operations remaining constant. The run-time for a 50ms network simulation using the recurrent connec6 Table 1: Model parameters Symbol max G(J j (x)) G(J j (x)) = 0 J bias j αj ˜x φj Range 200-400 Hz −1 to 1 Satisfies first two Satisfies first two ˜x φj = 1 Description Maximum firing rate Normalized x-axis intercept Bias current Gain factor Preferred-direction vector σ2 τ RC j τ ref j τ PSC j 0.1 20 ms 1 ms 20 ms Gaussian noise variance RC time constant Refractory period PSC time constant tion weights was 0.94s and dropped to 0.0198s in the lower-dimensional space, a 47-fold speedup. These results demonstrate the efficiency the lower-dimensional space offers, which made the closedloop application of SNNs possible. 6 Closed-loop implementation An adult male rhesus macaque (monkey J) was trained to perform a center-out-and-back reaching task for juice rewards to one of eight targets, with a 500ms hold time (Figure 4a) [1]. All animal protocols and procedures were approved by the Stanford Institutional Animal Care and Use Committee. Hand position was measured using a Polaris optical tracking system at 60Hz (Northern Digital Inc.). Neural data were recorded from two 96-electrode silicon arrays (Blackrock Microsystems) implanted in the dorsal pre-motor and motor cortex. These recordings (-4.5 RMS threshold crossing applied to each electrode’s signal) yielded tuned activity for the direction and speed of arm movements. As detailed in [1], a standard Kalman filter model was fit by correlating the observed hand kinematics with the simultaneously measured neural signals, while the monkey moved his arm to acquire virtual targets (Figure 2). The resulting model was used in a closed-loop system to control an on-screen cursor in real-time (Figure 4a, Decoder block). A steady-state version of this model serves as the standard against which the SNN implementation’s performance is compared. We built a SNN using the NEF methodology based on derived Kalman filter parameters mentioned above. This SNN was then simulated on an xPC Target (Mathworks) x86 system (Dell T3400, Intel Core 2 Duo E8600, 3.33GHz). It ran in closed-loop, replacing the standard Kalman filter as the decoder block in Figure 4a. The parameter values listed in Table 1 were used for the SNN implementation. We ensured that the time constants τiRC ,τiref , and τiPSC were smaller than the implementation’s time step (50ms). Noise was not explicitly added. It arose naturally from the fluctuations produced by representing a scalar with filtered spike trains, which has been shown to have effects similar to Gaussian noise [11]. For the purpose of computing the linear decoding weights (i.e., Γ), we modeled the resulting noise as Gaussian with a variance of 0.1. A 2,000-neuron version of the SNN-based decoder was tested in a closed-loop system, the largest network our embedded MatLab implementation could run in real-time. There were 1206 trials total among which 301 (center-outs only) were performed with the SNN and 302 with the standard (steady-state) Kalman filter. The block structure was randomized and interleaved, so that there is no behavioral bias present in the findings. 100 trials under hand control are used as a baseline comparison. Success corresponds to a target acquisition under 1500ms, with 500ms hold time. Success rates were higher than 99% on all blocks for the SNN implementation and 100% for the standard Kalman filter. The average time to acquire the target was slightly slower for the SNN (Figure 5b)—711ms vs. 661ms, respectively—we believe this could be improved by using more neurons in the SNN.1 The average distance to target (Figure 5a) and the average velocity of the cursor (Figure 5c) are very similar. 1 Off-line, the SNN performed better as we increased the number of neurons [7]. 7 a Neural Spikes b c BMI: Kalman decoder BMI: SNN decoder Decoder Cursor Velocity 1cm 1cm Trials: 2056-2071 Trials: 1748-1763 5 0 0 400 Time after Target Onset (ms) 800 Target acquisition time histogram 40 Mean cursor velocity 50 Standard Kalman filter 40 20 Hand 30 30 Spiking Neural Network 20 10 0 c Cursor Velocity (cm/s) b Mean distance to target 10 Percent of Trials (%) a Distance to Target (cm) Figure 4: Experimental setup and results. a. Data are recorded from two 96-channel silicon electrode arrays implanted in dorsal pre-motor and motor cortex of an adult male monkey performing a centerout-and-back reach task for juice rewards to one of eight targets with a 500ms hold time. b. BMI position kinematics of 16 continuous trials for the standard Kalman filter implementation. c. BMI position kinematics of 16 continuous trials for the SNN implementation. 10 0 500 1000 Target Acquire Time (ms) 1500 0 0 200 400 600 800 Time after Target Onset (ms) 1000 Figure 5: SNN (red) performance compared to standard Kalman filter (blue) (hand trials are shown for reference (yellow)). The SNN achieves similar results—success rates are higher than 99% on all blocks—as the standard Kalman filter implementation. a. Plot of distance to target vs. time both after target onset for different control modalities. The thicker traces represent the average time when the cursor first enters the acceptance window until successfully entering for the 500ms hold time. b. Histogram of target acquisition time. c. Plot of mean cursor velocity vs. time. 7 Conclusions and future work The SNN’s performance was quite comparable to that produced by a standard Kalman filter implementation. The 2,000-neuron network had success rates higher than 99% on all blocks, with mean distance to target, target acquisition time, and mean cursor velocity curves very similar to the ones obtained with the standard implementation. Future work will explore whether these results extend to additional animals. As the Kalman filter and its variants are the state-of-the-art in cortically-controlled motor prostheses [1]-[5], these simulations provide confidence that similar levels of performance can be attained with a neuromorphic system, which can potentially overcome the power constraints set by clinical applications. Our ultimate goal is to develop an ultra-low-power neuromorphic chip for prosthetic applications on to which control theory algorithms can be mapped using the NEF. As our next step in this direction, we will begin exploring this mapping with Neurogrid, a hardware platform with sixteen programmable neuromorphic chips that can simulate up to a million spiking neurons in real-time [9]. However, bandwidth limitations prevent Neurogrid from realizing random connectivity patterns. It can only connect each neuron to thousands of others if neighboring neurons share common inputs — just as they do in the cortex. Such columnar organization may be possible with NEF-generated networks if preferred directions vectors are assigned topographically rather than randomly. Implementing this constraint effectively is a subject of ongoing research. Acknowledgment This work was supported in part by the Belgian American Education Foundation(J. Dethier), Stanford NIH Medical Scientist Training Program (MSTP) and Soros Fellowship (P. Nuyujukian), DARPA Revolutionizing Prosthetics program (N66001-06-C-8005, K. V. Shenoy), and two NIH Director’s Pioneer Awards (DP1-OD006409, K. V. Shenoy; DPI-OD000965, K. Boahen). 8 References [1] V. Gilja, Towards clinically viable neural prosthetic systems, Ph.D. Thesis, Department of Computer Science, Stanford University, 2010, pp 19–22 and pp 57–73. [2] V. Gilja, P. Nuyujukian, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, A high-performance continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [3] P. Nuyujukian, V. Gilja, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, Generalization and robustness of a continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [4] V. Gilja, C.A. Chestek, I. Diester, J.M. Henderson, K. Deisseroth, and K.V. Shenoy, Challenges and opportunities for next-generation intra-cortically based neural prostheses, IEEE Transactions on Biomedical Engineering, 2011, in press. [5] S.P. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia, Journal of Neural Engineering, vol. 5, 2008, pp 455–476. [6] S. Kim, P. Tathireddy, R.A. Normann, and F. Solzbacher, Thermal impact of an active 3-D microelectrode array implanted in the brain, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 15, 2007, pp 493–501. [7] J. Dethier, V. Gilja, P. Nuyujukian, S.A. Elassaad, K.V. Shenoy, and K. Boahen, Spiking neural network decoder for brain-machine interfaces, IEEE Engineering in Medicine & Biology Society Conference on Neural Engineering, Cancun, Mexico, 2011, pp 396–399. [8] K. Boahen, Neuromorphic microchips, Scientific American, vol. 292(5), 2005, pp 56–63. [9] R. Silver, K. Boahen, S. Grillner, N. Kopell, and K.L. Olsen, Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools, Journal of Neuroscience, vol. 27(44), 2007, pp 11807– 11819. [10] J.V. Arthur and K. Boahen, Silicon neuron design: the dynamical systems approach, IEEE Transactions on Circuits and Systems, vol. 58(5), 2011, pp 1034-1043. [11] C. Eliasmith and C.H. Anderson, Neural engineering: computation, representation, and dynamics in neurobiological systems, MIT Press, Cambridge, MA; 2003. [12] C. Eliasmith, A unified approach to building and controlling spiking attractor networks, Neural Computation, vol. 17, 2005, pp 1276–1314. [13] R. Singh and C. Eliasmith, Higher-dimensional neurons explain the tuning and dynamics of working memory cells, The Journal of Neuroscience, vol. 26(14), 2006, pp 3667–3678. [14] C. Eliasmith, How to build a brain: from function to implementation, Synthese, vol. 159(3), 2007, pp 373–388. [15] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME– Journal of Basic Engineering, vol. 82(Series D), 1960, pp 35–45. [16] G. Welsh and G. Bishop, An introduction to the Kalman Filter, University of North Carolina at Chapel Hill Chapel Hill NC, vol. 95(TR 95-041), 1995, pp 1–16. [17] W.Q. Malik, W. Truccolo, E.N. Brown, and L.R. Hochberg, Efficient decoding with steady-state Kalman filter in neural interface systems, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19(1), 2011, pp 25–34. 9
4 0.077033907 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
5 0.065528512 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
6 0.062610954 67 nips-2011-Data Skeletonization via Reeb Graphs
7 0.061237551 286 nips-2011-The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning
8 0.049685419 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems
9 0.048003621 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
10 0.046148561 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
11 0.046111133 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
12 0.041995559 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
13 0.04033101 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
14 0.039074305 285 nips-2011-The Kernel Beta Process
15 0.037329566 229 nips-2011-Query-Aware MCMC
16 0.037060648 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
17 0.035501789 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
18 0.034864537 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
19 0.034749921 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
20 0.034636397 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
topicId topicWeight
[(0, 0.091), (1, 0.019), (2, 0.014), (3, -0.018), (4, -0.019), (5, -0.029), (6, -0.054), (7, -0.032), (8, 0.026), (9, -0.089), (10, -0.017), (11, 0.005), (12, -0.053), (13, -0.002), (14, -0.031), (15, 0.042), (16, 0.076), (17, -0.03), (18, 0.01), (19, 0.023), (20, -0.018), (21, -0.036), (22, 0.081), (23, -0.047), (24, 0.101), (25, -0.043), (26, -0.037), (27, -0.019), (28, -0.017), (29, 0.038), (30, -0.049), (31, -0.034), (32, -0.044), (33, 0.071), (34, -0.033), (35, 0.022), (36, -0.041), (37, 0.034), (38, -0.056), (39, 0.026), (40, 0.113), (41, 0.041), (42, 0.06), (43, -0.071), (44, -0.033), (45, -0.031), (46, -0.005), (47, -0.016), (48, 0.003), (49, -0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.92557698 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
2 0.75459284 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
3 0.72109473 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
Author: Yusuke Watanabe
Abstract: While loopy Belief Propagation (LBP) has been utilized in a wide variety of applications with empirical success, it comes with few theoretical guarantees. Especially, if the interactions of random variables in a graphical model are strong, the behaviors of the algorithm can be difficult to analyze due to underlying phase transitions. In this paper, we develop a novel approach to the uniqueness problem of the LBP fixed point; our new “necessary and sufficient” condition is stated in terms of graphs and signs, where the sign denotes the types (attractive/repulsive) of the interaction (i.e., compatibility function) on the edge. In all previous works, uniqueness is guaranteed only in the situations where the strength of the interactions are “sufficiently” small in certain senses. In contrast, our condition covers arbitrary strong interactions on the specified class of signed graphs. The result of this paper is based on the recent theoretical advance in the LBP algorithm; the connection with the graph zeta function.
4 0.56872159 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
5 0.5588299 67 nips-2011-Data Skeletonization via Reeb Graphs
Author: Xiaoyin Ge, Issam I. Safa, Mikhail Belkin, Yusu Wang
Abstract: Recovering hidden structure from complex and noisy non-linear data is one of the most fundamental problems in machine learning and statistical inference. While such data is often high-dimensional, it is of interest to approximate it with a lowdimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional. Furthermore, there are many scenarios where the underlying structure is graph-like, e.g, river/road networks or various trajectories. In this paper, we develop a framework to extract, as well as to simplify, a one-dimensional ”skeleton” from unorganized data using the Reeb graph. Our algorithm is very simple, does not require complex optimizations and can be easily applied to unorganized high-dimensional data such as point clouds or proximity graphs. It can also represent arbitrary graph structures in the data. We also give theoretical results to justify our method. We provide a number of experiments to demonstrate the effectiveness and generality of our algorithm, including comparisons to existing methods, such as principal curves. We believe that the simplicity and practicality of our algorithm will help to promote skeleton graphs as a data analysis tool for a broad range of applications.
6 0.47672793 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
7 0.4741897 274 nips-2011-Structure Learning for Optimization
8 0.47250259 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
9 0.46060795 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
10 0.45986789 213 nips-2011-Phase transition in the family of p-resistances
11 0.39777777 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems
12 0.39585051 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
13 0.388679 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
14 0.36472252 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
15 0.35603505 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
16 0.35383099 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs
17 0.34425545 306 nips-2011-t-divergence Based Approximate Inference
18 0.34055114 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
19 0.33887428 150 nips-2011-Learning a Distance Metric from a Network
20 0.33450463 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
topicId topicWeight
[(0, 0.024), (4, 0.045), (20, 0.018), (21, 0.321), (26, 0.022), (31, 0.111), (33, 0.018), (39, 0.012), (43, 0.043), (45, 0.071), (57, 0.075), (65, 0.017), (74, 0.039), (83, 0.039), (99, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.74605519 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graph, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
2 0.63315582 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.56988299 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.49728489 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
Author: Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett
Abstract: We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bagof-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. 1
5 0.48083648 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
6 0.47260964 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
7 0.46716204 75 nips-2011-Dynamical segmentation of single trials from population neural data
8 0.4618808 219 nips-2011-Predicting response time and error rates in visual search
9 0.46023947 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
10 0.45855233 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
11 0.45739776 229 nips-2011-Query-Aware MCMC
12 0.45711651 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
13 0.45030543 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
14 0.44982442 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search
15 0.44946584 241 nips-2011-Scalable Training of Mixture Models via Coresets
16 0.44854349 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
17 0.4481484 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
18 0.44782925 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
19 0.44759101 249 nips-2011-Sequence learning with hidden units in spiking neural networks
20 0.44756478 301 nips-2011-Variational Gaussian Process Dynamical Systems