nips nips2013 nips2013-341 knowledge-graph by maker-knowledge-mining

341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes


Source: pdf

Author: Il M. Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow

Abstract: Probabilistic models for binary spike patterns provide a powerful tool for understanding the statistical dependencies in large-scale neural recordings. Maximum entropy (or “maxent”) models, which seek to explain dependencies in terms of low-order interactions between neurons, have enjoyed remarkable success in modeling such patterns, particularly for small groups of neurons. However, these models are computationally intractable for large populations, and low-order maxent models have been shown to be inadequate for some datasets. To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all 2^m binary patterns. We construct universal models using a Dirichlet process centered on a well-behaved parametric base measure, which naturally combines the flexibility of a histogram and the parsimony of a parametric model. We derive computationally efficient inference methods using Bernoulli and cascaded logistic base measures, which scale tractably to large populations. We also establish a condition for equivalence between the cascaded logistic and the 2nd-order maxent or “Ising” model, making cascaded logistic a reasonable choice for base measure in a universal model. We illustrate the performance of these models using neural data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Universal models for binary spike patterns using centered Dirichlet processes Il Memming Park123 , Evan Archer24 , Kenneth Latimer12 , Jonathan W. [sent-1, score-0.343]

2 edu Abstract Probabilistic models for binary spike patterns provide a powerful tool for understanding the statistical dependencies in large-scale neural recordings. [sent-9, score-0.333]

3 However, these models are computationally intractable for large populations, and low-order maxent models have been shown to be inadequate for some datasets. [sent-11, score-0.24]

4 To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all 2^m binary patterns. [sent-12, score-0.335]

5 We construct universal models using a Dirichlet process centered on a well-behaved parametric base measure, which naturally combines the flexibility of a histogram and the parsimony of a parametric model. [sent-13, score-0.841]

6 We derive computationally efficient inference methods using Bernoulli and cascaded logistic base measures, which scale tractably to large populations. [sent-14, score-1.091]

7 We also establish a condition for equivalence between the cascaded logistic and the 2nd-order maxent or “Ising” model, making cascaded logistic a reasonable choice for base measure in a universal model. [sent-15, score-2.195]

8 1 Introduction Probability distributions over spike words form the fundamental building blocks of the neural code. [sent-17, score-0.271]

9 These difficulties, both computational and statistical, arise fundamentally from the exponential scaling (in population size) of the number of possible words a given population is capable of expressing. [sent-19, score-0.162]

10 One strategy for combating this combinatorial explosion is to introduce a parametric model which seeks to make trade-offs between flexibility, computational expense [1, 2], or mathematical completeness [3] in order to be applicable to large-scale neural recordings. [sent-20, score-0.193]

11 A variety of parametric models have been proposed in the literature, including the 2nd-order maxent or Ising model [4, 5], the reliable interaction model [3], restricted Boltzmann machine [6], deep learning [7], mixture of Bernoulli model [8], and the dichotomized Gaussian model [9]. [sent-21, score-0.483]

12 However, while the number of parameters in a model chosen from a given parametric family may increase with the number of neurons, it cannot increase exponentially with the number of words. [sent-22, score-0.16]

13 Thus, as the size of a population increases, a parametric model rapidly loses flexibility in describing the full spike distribution. [sent-23, score-0.387]

14 In contrast, nonparametric models allow flexibility to grow with the amount of data [10, 11, 12, 13, 14]. [sent-24, score-0.126]

15 A naive nonparametric model, such as the histogram of spike words, theoretically preserves representational power and computational simplicity. [sent-25, score-0.389]

16 Yet in practice, the empirical histogram may be extremely slow to converge, especially for the high-dimensional data we are primarily interested in. Figure 1: (A) Binary representation of neural population activity. [sent-26, score-1.153]

17 (B) Hierarchical Dirichlet process prior for the universal binary model (UBM) over spike words. [sent-28, score-0.366]

18 The π’s are drawn from a Dirichlet with parameters given by α and a base distribution over spike words with parameter θ. [sent-30, score-0.483]

19 (C, D) Graphical models of two base measures over spike words: independent Bernoulli model and cascaded logistic model. [sent-31, score-1.343]

20 The base measure is also a distribution over each spike word x = (x_1, . . . , x_m). [sent-32, score-0.6]

21 In most cases, we expect never to have enough data for the empirical histogram to converge. [sent-37, score-0.105]

22 Perhaps even more concerning is that a naive histogram model fails to smooth over the space of words: unobserved words are not accounted for in the model. [sent-38, score-0.212]

23 We propose a framework which combines the parsimony of parametric models with the flexibility of nonparametric models. [sent-39, score-0.287]

24 We model the spike word distribution as a Dirichlet process centered on a parametric base measure. [sent-40, score-0.745]

25 An appropriately chosen base measure smooths the observations, while the Dirichlet process allows for data that depart systematically from the base measure. [sent-41, score-0.603]

26 These models are universal in the sense that they can converge to any distribution supported on the (2^m − 1)-dimensional simplex. [sent-42, score-0.14]

27 The influence of any base measure diminishes with increasing sample size, and the model ultimately converges to the empirical distribution function. [sent-43, score-0.371]

28 The choice of base measure influences the small-sample behavior and computational tractability of universal models, both of which are crucial for neural applications. [sent-44, score-0.45]

29 We consider two base measures that exploit a priori knowledge about neural data while remaining computationally tractable for large populations: the independent Bernoulli spiking model, and the cascaded logistic model [15, 16]. [sent-45, score-1.246]

30 Both the Bernoulli and cascaded logistic models show better performance when used as a base measure for a universal model than when used alone. [sent-46, score-1.316]

31 2 Universal binary model Consider a (random) binary spike word of length m, x ∈ {0, 1}^m, where m denotes the number of distinct neurons (and/or time bins; Fig. [sent-54, score-0.472]
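
As a concrete illustration of this binary-word representation, the sketch below bins per-neuron spike times into words x ∈ {0, 1}^m. It is a minimal NumPy sketch; the bin size, the helper name, and the toy spike times are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def binarize_spikes(spike_times, t_start, t_stop, bin_size):
    """Bin per-neuron spike times into binary words x in {0,1}^m.

    spike_times : list of 1-D arrays, one array of spike times per neuron.
    Returns an array of shape (n_bins, m); each row is one spike word.
    (Illustrative only; binning conventions vary across studies.)
    """
    n_bins = int(round((t_stop - t_start) / bin_size))
    words = np.zeros((n_bins, len(spike_times)), dtype=int)
    for i, st in enumerate(spike_times):
        counts, _ = np.histogram(st, bins=n_bins, range=(t_start, t_stop))
        words[:, i] = (counts > 0).astype(int)  # 1 if the neuron fired in that bin
    return words

# Toy example: m = 3 neurons, 0.1 s bins over 0.4 s
spikes = [np.array([0.01, 0.12, 0.30]), np.array([0.05]), np.array([0.11, 0.31])]
print(binarize_spikes(spikes, 0.0, 0.4, 0.1))
```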

32 The universal binary model is a hierarchical probabilistic model where on the bottom level (Fig. [sent-54, score-0.197]

33 1B), x is drawn from a multinomial (categorical) distribution with the probability of observing each word given by the vector π (spike word distribution). [sent-55, score-0.194]

34 We choose a discrete probability measure for G_θ such that it has positive measure only over {1, . . . , K}. [sent-57, score-0.152]

35 Thus, the Dirichlet process has probability mass only on the K spike words, and is described by a (finite dimensional) Dirichlet distribution, π ∼ Dir(αg_1, . . . , αg_K). (2) [sent-61, score-0.2]

36 In the absence of data, the parametric base measure controls the mean of this nonparametric model, E[π | α] = G_θ, (3) regardless of α. [sent-65, score-0.165]
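
A minimal sketch of this centering property: spike-word distributions drawn as π ∼ Dir(α·g) have mean equal to the base measure g regardless of α, as in eq. (3). The uniform base measure, population size, and α below are toy assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 4                        # number of neurons (toy size)
K = 2 ** m                   # number of possible spike words
g = np.full(K, 1.0 / K)      # toy base measure (uniform; in the UBM this would be
                             # an independent Bernoulli or cascaded logistic model)
alpha = 5.0                  # concentration parameter

# pi ~ Dir(alpha * g); averaging many draws recovers g, illustrating E[pi | alpha] = G_theta.
pis = rng.dirichlet(alpha * g, size=10000)
print(np.allclose(pis.mean(axis=0), g, atol=1e-2))   # True up to Monte Carlo error
```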

37 We can start with good parametric models of neural populations, and extend them into a nonparametric model by using them as the base measure [17]. [sent-67, score-0.64]

38 Under this scheme, the base measure quickly learns much of the basic structure of the data while the Dirichlet extension takes into account any deviations in the data which are not predicted by the parametric component. [sent-68, score-0.476]

39 We call such an extension a universal binary model (UBM) with base measure G_θ. [sent-69, score-0.197]

40 The observed spike words then follow a Dirichlet-Multinomial distribution: P(X | α, G_θ) = Γ(α)/Γ(N + α) ∏_{k=1}^{K} Γ(n_k + αg_k)/Γ(αg_k), (4) where n_k is the number of observations of the word k. [sent-73, score-0.19]

41 This leads to a simple formula for sampling from the predictive distribution over words: Pr(x_{N+1} = k | X_N, α, G_θ) = (n_k + αg_k)/(N + α). (5) [sent-74, score-0.118]

42 As α → 0, the predictive distribution converges to the histogram estimate n_k/N, and as α → ∞, it converges to the base measure probability g_k.
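
A small sketch of the predictive rule in eq. (5) and of the two limits just described; the counts and base-measure probabilities are made up for illustration.

```python
import numpy as np

def ubm_predictive(counts, alpha, g):
    """Predictive word probabilities (eq. 5): (n_k + alpha * g_k) / (N + alpha)."""
    n = np.asarray(counts, dtype=float)
    return (n + alpha * np.asarray(g)) / (n.sum() + alpha)

n = np.array([7.0, 2.0, 1.0, 0.0])     # toy word counts, N = 10
g = np.array([0.4, 0.3, 0.2, 0.1])     # toy base-measure probabilities
print(ubm_predictive(n, 1e-6, g))      # ~ n / N : the histogram estimate
print(ubm_predictive(n, 1e6, g))       # ~ g     : the base measure
```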

43 The marginal log-likelihood from (4) is given by L = log P(X_N | α, θ) = Σ_k [log Γ(n_k + αg_k) − log Γ(αg_k)] + log Γ(α) − log Γ(N + α). (6) [sent-82, score-0.1]

44 Derivatives with respect to θ and α are ∂L/∂θ = α Σ_k (ψ(n_k + αg_k) − ψ(αg_k)) ∂g_k/∂θ, (7) and ∂L/∂α = Σ_k g_k (ψ(n_k + αg_k) − ψ(αg_k)) + ψ(α) − ψ(N + α), (8) where ψ denotes the digamma function. [sent-83, score-0.414]
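
Because (6) and (8) involve only log-gamma and digamma terms over the observed word counts, they are cheap to evaluate. Below is a minimal SciPy-based sketch (function names and toy inputs are our own, not the authors' code); the finite-difference check at the end confirms the α-gradient.

```python
import numpy as np
from scipy.special import gammaln, digamma

def ubm_log_marginal(counts, alpha, g):
    """Marginal log-likelihood of the word counts (eq. 6)."""
    n, g = np.asarray(counts, float), np.asarray(g, float)
    return (np.sum(gammaln(n + alpha * g) - gammaln(alpha * g))
            + gammaln(alpha) - gammaln(n.sum() + alpha))

def ubm_dL_dalpha(counts, alpha, g):
    """Derivative of the log marginal likelihood with respect to alpha (eq. 8)."""
    n, g = np.asarray(counts, float), np.asarray(g, float)
    return (np.sum(g * (digamma(n + alpha * g) - digamma(alpha * g)))
            + digamma(alpha) - digamma(n.sum() + alpha))

# Sanity check against a central finite difference (toy counts and base measure)
n, g, a = np.array([7., 2., 1., 0.]), np.array([.4, .3, .2, .1]), 3.0
fd = (ubm_log_marginal(n, a + 1e-5, g) - ubm_log_marginal(n, a - 1e-5, g)) / 2e-5
print(np.isclose(ubm_dL_dalpha(n, a, g), fd))   # True
```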

45 As α → ∞, dL/dθ converges to Σ_k (n_k/g_k) ∂g_k/∂θ, the derivative of the logarithm of the base measure likelihood with respect to θ. [sent-86, score-0.754]

46 As α → 0, the derivative goes to Σ_k (1/g_k) ∂g_k/∂θ (summed over observed words), reflecting the fact that the number of observations n_k is ignored: the likelihood effectively reflects only a single draw from the base distribution with probability g_k. [sent-88, score-0.982]

47 Even when the likelihood defined by the base measure is convex or log-convex in θ, the UBM likelihood is not guaranteed to be convex. [sent-89, score-0.321]

48 2 Hyper-prior When modeling large populations of neurons, the number of parameters θ of the base measure grows and over-fitting becomes a concern. [sent-92, score-0.378]

49 Since the UBM relies on the base measure to provide smoothing over words, it is critical to properly regularize our estimate of θ. [sent-93, score-0.347]

50 3 Base measures The scalability of UBM hinges on the scalability of its base measure. [sent-100, score-0.296]

51 1 Independent Bernoulli model We consider the independent Bernoulli model which assumes (statistically) independent spiking neurons. [sent-103, score-0.114]

52 The Bernoulli base measure takes the form G_θ(k) = p(x_1, . . . , x_m | θ) = ∏_{i=1}^{m} p_i^{x_i} (1 − p_i)^{1 − x_i}, (9) [sent-105, score-0.321]

53 where p_i ≥ 0 and θ = (p_1, . . . , p_m). [sent-108, score-0.127]

54 The distribution has full support on the K spike words as long as all p_i’s are non-zero. [sent-112, score-0.268]
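
A minimal sketch of eq. (9), evaluating the Bernoulli base-measure probability of a single spike word; the firing probabilities below are toy values.

```python
import numpy as np

def bernoulli_base_measure(word, p):
    """G_theta(x) = prod_i p_i^{x_i} (1 - p_i)^{1 - x_i}  (eq. 9)."""
    x, p = np.asarray(word), np.asarray(p)
    return np.prod(np.where(x == 1, p, 1.0 - p))

p = np.array([0.1, 0.05, 0.2, 0.02])             # toy per-neuron firing probabilities
print(bernoulli_base_measure([1, 0, 0, 1], p))   # 0.1 * 0.95 * 0.8 * 0.02
```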

55 Although the Bernoulli model cannot capture the higher-order correlation structure of the spike word distribution with only m parameters, inference is fast and memory-efficient. [sent-113, score-0.31]

56 2 Cascaded logistic model To introduce a rich dependence structure among the neurons, we assume the joint firing probability of each neuron factors with a cascaded structure (see Fig. [sent-115, score-0.871]

57 Along with a parametric form of the conditional distribution p(x_i | x_1, . . . , x_{i−1}). [sent-122, score-0.177]

58 p(x_i = 1 | x_{1:i−1}, θ) = logistic(h_i + Σ_{j<i} w_ij x_j). (11) An Ising model whose pairwise couplings J_ij vanish for j < i−2 or j > i+2 (a pentadiagonal coupling matrix) is also a cascaded logistic model. [sent-128, score-0.824]
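
A sketch of the cascaded logistic word probability built by chaining the conditionals in eq. (11); the parameter names (h, strictly lower-triangular W) and the toy values are illustrative assumptions, and the paper's exact parameterization may differ.

```python
import numpy as np

def cascaded_logistic_word_prob(word, h, W):
    """p(x) = prod_i p(x_i | x_{1:i-1}), with p(x_i = 1 | .) = logistic(h_i + sum_{j<i} W[i, j] x_j)."""
    x = np.asarray(word)
    logistic = lambda z: 1.0 / (1.0 + np.exp(-z))
    prob = 1.0
    for i in range(len(x)):
        p_fire = logistic(h[i] + W[i, :i] @ x[:i])   # conditional firing probability
        prob *= p_fire if x[i] == 1 else 1.0 - p_fire
    return prob

# Toy 3-neuron example; probabilities over all 2^3 words sum to one by construction
h = np.array([-1.0, -0.5, -2.0])
W = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.3, 1.2, 0.0]])
total = sum(cascaded_logistic_word_prob([(k >> i) & 1 for i in range(3)], h, W)
            for k in range(8))
print(np.isclose(total, 1.0))   # True
```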

59 Figure 3: 3rd-order maxent distribution experiment. [sent-136, score-0.13]

60 Shaded area represents frequentist 95% confidence interval for histogram estimator assuming the same amount of data. [sent-143, score-0.122]

61 Unlike the Ising model, the order of the neurons plays a role in the formulation of the cascaded logistic model. [sent-145, score-0.908]

62 This theorem can be generalized to sparse, structured cascaded logistic models. [sent-150, score-0.824]

63 Theorem 2 (Intersection between cascaded logistic model and Ising model). [sent-151, score-0.855]

64 A cascaded logistic model with at most two interactions with other neurons is also an Ising model. [sent-152, score-0.939]

65 For example, cascaded logistic with a sparse cascade p(x1 )p(x2 |x1 )p(x3 |x1 )p(x4 |x1 , x3 )p(x5 |x2 , x4 ) is an Ising model (Fig. [sent-153, score-0.872]

66 We remark that although the cascaded logistic model can be written in exponential family form, the cascaded logistic model does not correspond to a simple family of maximum entropy models in general. [sent-155, score-1.755]

67 The theorems show that only a subset of Ising models are equivalent to cascaded logistic models. [sent-156, score-0.868]

68 However, cascaded logistic models generally provide good approximations to the Ising model. [sent-157, score-0.868]

69 We demonstrate this by drawing random Ising models (both with sparse and dense pairwise coupling J), and then fitting with a cascaded logistic model (Fig. [sent-158, score-0.916]

70 Since Ising models are widely accepted as effective models of neural populations, the cascaded logistic model presents a computationally tractable alternative. [sent-160, score-1.02]

71 4 Simulations We compare two parametric models (independent Bernoulli and cascaded logistic model) with three nonparametric models (two universal binary models centered on the parametric models, and a naive histogram estimator) on simulated data with 15 neurons. [sent-161, score-1.599]

72 We use an l1 regularization to fit the cascaded logistic model and the corresponding UBM. [sent-163, score-0.855]
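
Because the cascaded factorization turns the joint into a chain of conditionals, fitting the model reduces to m separate logistic regressions of neuron i on neurons 1, . . . , i−1. A minimal scikit-learn sketch of an ℓ1-penalized fit is given below; the function name, the regularization setting, and the random toy data are our own assumptions, and the authors' fitting procedure may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascaded_logistic(words, C=1.0):
    """Fit h and strictly lower-triangular W by m separate l1-penalized logistic regressions."""
    X = np.asarray(words)
    n_samples, m = X.shape
    h, W = np.zeros(m), np.zeros((m, m))
    for i in range(m):
        y = X[:, i]
        if i == 0 or y.min() == y.max():     # no predecessors, or neuron never/always fires
            p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
            h[i] = np.log(p / (1 - p))
            continue
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, :i], y)                 # regress neuron i on neurons 1..i-1
        h[i] = clf.intercept_[0]
        W[i, :i] = clf.coef_[0]
    return h, W

# Usage with random toy data (15 neurons, as in the simulations)
rng = np.random.default_rng(0)
words = (rng.random((2000, 15)) < 0.2).astype(int)
h_hat, W_hat = fit_cascaded_logistic(words)
```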

73 As the number of samples increases, Jensen-Shannon (JS) divergence between the estimated model and true maxent model decreases exponentially for the nonparametric models. [sent-167, score-0.3]
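
For reference, the JS divergence used in these comparisons can be computed directly from two distributions over spike words; a small sketch follows (toy probabilities, natural-log units assumed, since the base of the logarithm is not stated here).

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two spike-word distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mix = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * (np.log(a[mask]) - np.log(b[mask])))
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Example: divergence between a true word distribution and an estimate (toy values)
p_true = np.array([0.50, 0.25, 0.15, 0.10])
p_hat  = np.array([0.48, 0.27, 0.14, 0.11])
print(js_divergence(p_true, p_hat))
```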

74 We provide MATLAB code to convert back and forth between a subset of Ising models and the corresponding subset of cascaded logistic models (see online supplemental material). [sent-168, score-0.932]

75 Figure 4: Synchrony histogram model. [sent-169, score-0.105]

76 Each word with the same number of total spikes regardless of neuron identity has the same probability. [sent-170, score-0.169]

77 Neither the Bernoulli nor the cascaded logistic model provides a good approximation in this case, and both saturate in terms of JS divergence. [sent-171, score-0.868]

78 Note that the cascaded logistic model and the UBM with a cascaded logistic base measure perform almost identically, and their convergence does not saturate (as expected from Theorem 1). [sent-177, score-0.118]

79 parametric models saturates since the actual distribution does not lie within the same parametric family. [sent-178, score-0.302]

80 The cascaded logistic model and the UBM centered on it show the best performance for the small sample regime, but eventually other nonparametric models catch up with the cascaded logistic model. [sent-179, score-1.865]

81 Where significant deviations from the base measure model can be observed in Fig. [sent-182, score-0.378]

82 4, we draw samples from a distribution with higher-order dependences; Each word with the same number of total spikes are assigned the same probability. [sent-185, score-0.176]

83 For example, words with exactly 10 neurons spiking (and 5 not spiking, out of 15 neurons) occur with high probability as can be seen from the histogram of the total spikes (Fig. [sent-186, score-0.353]

84 Neither the Bernoulli model nor the cascaded logistic model can capture this structure accurately, indicated by a plateau in the convergence plots (Fig. [sent-188, score-0.905]

85 In addition, we see that if the data comes from the model class assumed by the base measure, then UBM is just as good as the base measure alone (Fig. [sent-191, score-0.597]

86 0349 0 10 10 0 10 4 x 10 4 4 10 D 14 10 10 8 10 0 10 0 6 8 10 4 6 10 4 10 0 0 1 3 4 5 6 7 8 9 10 0 10 10 0 10 3 10 4 10 5 10 Figure 6: Various models fit to a population of ten retinal ganglion neurons’ response to naturalistic movie [3]. [sent-195, score-0.131]

87 (A) JS divergence between the estimated model, and histogram constructed from the test data. [sent-198, score-0.131]

88 Ising model is included, and its trace is closely followed by the cascaded logistic model. [sent-199, score-0.855]

89 supplements the base measure to model flexibly the observed firing patterns, and performs at least as well as the histogram in the worst case. [sent-203, score-0.457]

90 In panel C, we confirm that the cascaded logistic UBM gives the best fit. [sent-208, score-0.824]

91 The decrease in corresponding ↵, shown in panel D, indicates that the cascaded logistic UBM is becoming less confident that the data is from an actual cascaded logistic model as we obtain more data. [sent-209, score-1.679]

92 6 Conclusion We proposed universal binary models (UBMs), a nonparametric framework that extends parametric models of neural recordings. [sent-210, score-0.467]

93 UBMs flexibly trade off between smoothing from the base measure and “histogram-like” behavior. [sent-211, score-0.347]

94 The Dirichlet process can incorporate deviations from the base measure when supported by the data, even as the base measure buttresses the nonparametric approach with desirable properties of parametric models, such as fast convergence and interpretability. [sent-212, score-0.897]

95 Since the main source of smoothing is the base measure, UBM’s ability to extrapolate is limited to repeatedly observed words. [sent-214, score-0.29]

96 We proposed the cascaded logistic model for use as a powerful, but still computationally tractable, base measure. [sent-216, score-1.122]

97 We showed, both theoretically and empirically, that the cascaded logistic model is an effective, scalable alternative to the Ising model, which is usually limited to smaller populations. [sent-217, score-0.855]

98 The UBM model class has the potential to reveal complex structure in large-scale recordings without the limitations of a priori parametric assumptions. [sent-218, score-0.16]

99 Near-maximum entropy models for binary neural representations of natural images. [sent-292, score-0.148]

100 Bayesian entropy estimation for binary spike train data using parametric prior knowledge. [sent-354, score-0.382]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cascaded', 0.594), ('ubm', 0.42), ('base', 0.245), ('logistic', 0.23), ('gk', 0.207), ('ising', 0.193), ('spike', 0.182), ('maxent', 0.13), ('parametric', 0.129), ('js', 0.107), ('histogram', 0.105), ('word', 0.097), ('universal', 0.096), ('hi', 0.096), ('bernoulli', 0.095), ('nk', 0.093), ('neurons', 0.084), ('ubms', 0.084), ('nonparametric', 0.082), ('measure', 0.076), ('dirichlet', 0.076), ('pentadiagonal', 0.063), ('populations', 0.057), ('words', 0.056), ('spikes', 0.056), ('spiking', 0.052), ('segev', 0.051), ('wi', 0.05), ('xm', 0.048), ('hm', 0.047), ('population', 0.045), ('models', 0.044), ('centered', 0.043), ('retinal', 0.042), ('ganmor', 0.042), ('pmodel', 0.042), ('binary', 0.039), ('scatter', 0.036), ('patterns', 0.035), ('memming', 0.034), ('exibility', 0.033), ('neural', 0.033), ('parsimony', 0.032), ('entropy', 0.032), ('model', 0.031), ('format', 0.031), ('pi', 0.03), ('saturate', 0.029), ('jm', 0.029), ('exp', 0.028), ('ring', 0.028), ('exibly', 0.026), ('deviations', 0.026), ('divergence', 0.026), ('smoothing', 0.026), ('predictive', 0.025), ('draw', 0.023), ('tting', 0.023), ('tractable', 0.022), ('computationally', 0.022), ('log', 0.02), ('naive', 0.02), ('bandwidth', 0.02), ('supplemental', 0.02), ('indicated', 0.019), ('interaction', 0.019), ('converges', 0.019), ('pxi', 0.019), ('moorman', 0.019), ('dichotomized', 0.019), ('bethge', 0.019), ('pachitariu', 0.019), ('extrapolate', 0.019), ('smooths', 0.019), ('microstructure', 0.019), ('ohiorhenuan', 0.019), ('reserved', 0.019), ('chapter', 0.018), ('reliable', 0.018), ('process', 0.018), ('measures', 0.017), ('connectivity', 0.017), ('frequentist', 0.017), ('schneidman', 0.017), ('truccolo', 0.017), ('dependences', 0.017), ('earcher', 0.017), ('evan', 0.017), ('synchrony', 0.017), ('catch', 0.017), ('dl', 0.017), ('mclachlan', 0.017), ('grating', 0.017), ('petreska', 0.017), ('usa', 0.017), ('scalability', 0.017), ('sparse', 0.017), ('quantify', 0.016), ('capable', 0.016), ('neuron', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes

Author: Il M. Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow

Abstract: Probabilistic models for binary spike patterns provide a powerful tool for understanding the statistical dependencies in large-scale neural recordings. Maximum entropy (or “maxent”) models, which seek to explain dependencies in terms of low-order interactions between neurons, have enjoyed remarkable success in modeling such patterns, particularly for small groups of neurons. However, these models are computationally intractable for large populations, and low-order maxent models have been shown to be inadequate for some datasets. To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all 2^m binary patterns. We construct universal models using a Dirichlet process centered on a well-behaved parametric base measure, which naturally combines the flexibility of a histogram and the parsimony of a parametric model. We derive computationally efficient inference methods using Bernoulli and cascaded logistic base measures, which scale tractably to large populations. We also establish a condition for equivalence between the cascaded logistic and the 2nd-order maxent or “Ising” model, making cascaded logistic a reasonable choice for base measure in a universal model. We illustrate the performance of these models using neural data. 1

2 0.22634578 51 nips-2013-Bayesian entropy estimation for binary spike train data using parametric prior knowledge

Author: Evan W. Archer, Il M. Park, Jonathan W. Pillow

Abstract: Shannon’s entropy is a basic quantity in information theory, and a fundamental building block for the analysis of neural codes. Estimating the entropy of a discrete distribution from samples is an important and difficult problem that has received considerable attention in statistics and theoretical neuroscience. However, neural responses have characteristic statistical structure that generic entropy estimators fail to exploit. For example, existing Bayesian entropy estimators make the naive assumption that all spike words are equally likely a priori, which makes for an inefficient allocation of prior probability mass in cases where spikes are sparse. Here we develop Bayesian estimators for the entropy of binary spike trains using priors designed to flexibly exploit the statistical structure of simultaneouslyrecorded spike responses. We define two prior distributions over spike words using mixtures of Dirichlet distributions centered on simple parametric models. The parametric model captures high-level statistical features of the data, such as the average spike count in a spike word, which allows the posterior over entropy to concentrate more rapidly than with standard estimators (e.g., in cases where the probability of spiking differs strongly from 0.5). Conversely, the Dirichlet distributions assign prior mass to distributions far from the parametric model, ensuring consistent estimates for arbitrary distributions. We devise a compact representation of the data and prior that allow for computationally efficient implementations of Bayesian least squares and empirical Bayes entropy estimators with large numbers of neurons. We apply these estimators to simulated and real neural data and show that they substantially outperform traditional methods.

3 0.15619273 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin

Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1

4 0.11690907 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

Author: Jasper Snoek, Richard Zemel, Ryan P. Adams

Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1

5 0.10072222 257 nips-2013-Projected Natural Actor-Critic

Author: Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan

Abstract: Natural actor-critics form a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability—their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural ActorCritics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent. 1

6 0.093580455 121 nips-2013-Firing rate predictions in optimal balanced networks

7 0.092171304 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations

8 0.088194706 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

9 0.086781755 246 nips-2013-Perfect Associative Learning with Spike-Timing-Dependent Plasticity

10 0.080887645 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit

11 0.077341467 173 nips-2013-Least Informative Dimensions

12 0.073475495 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors

13 0.073289998 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

14 0.070608325 172 nips-2013-Learning word embeddings efficiently with noise-contrastive estimation

15 0.069173746 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing

16 0.066702507 318 nips-2013-Structured Learning via Logistic Regression

17 0.066208988 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

18 0.063946672 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

19 0.058670212 96 nips-2013-Distributed Representations of Words and Phrases and their Compositionality

20 0.058004625 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, 0.066), (2, -0.069), (3, -0.042), (4, -0.167), (5, 0.019), (6, 0.014), (7, -0.027), (8, 0.064), (9, 0.044), (10, -0.014), (11, 0.033), (12, -0.047), (13, -0.023), (14, 0.034), (15, -0.054), (16, 0.128), (17, 0.11), (18, -0.022), (19, 0.154), (20, -0.055), (21, 0.074), (22, -0.052), (23, -0.099), (24, 0.021), (25, 0.011), (26, 0.038), (27, -0.096), (28, 0.014), (29, 0.076), (30, -0.053), (31, 0.053), (32, 0.021), (33, -0.012), (34, 0.0), (35, 0.094), (36, -0.141), (37, -0.016), (38, 0.025), (39, -0.018), (40, 0.007), (41, 0.063), (42, 0.022), (43, -0.014), (44, -0.014), (45, -0.046), (46, 0.045), (47, -0.001), (48, 0.055), (49, 0.113)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94350159 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes

Author: Il M. Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow

Abstract: Probabilistic models for binary spike patterns provide a powerful tool for understanding the statistical dependencies in large-scale neural recordings. Maximum entropy (or “maxent”) models, which seek to explain dependencies in terms of low-order interactions between neurons, have enjoyed remarkable success in modeling such patterns, particularly for small groups of neurons. However, these models are computationally intractable for large populations, and low-order maxent models have been shown to be inadequate for some datasets. To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all 2m binary patterns. We construct universal models using a Dirichlet process centered on a well-behaved parametric base measure, which naturally combines the flexibility of a histogram and the parsimony of a parametric model. We derive computationally efficient inference methods using Bernoulli and cascaded logistic base measures, which scale tractably to large populations. We also establish a condition for equivalence between the cascaded logistic and the 2nd-order maxent or “Ising” model, making cascaded logistic a reasonable choice for base measure in a universal model. We illustrate the performance of these models using neural data. 1

2 0.85707778 51 nips-2013-Bayesian entropy estimation for binary spike train data using parametric prior knowledge

Author: Evan W. Archer, Il M. Park, Jonathan W. Pillow

Abstract: Shannon’s entropy is a basic quantity in information theory, and a fundamental building block for the analysis of neural codes. Estimating the entropy of a discrete distribution from samples is an important and difficult problem that has received considerable attention in statistics and theoretical neuroscience. However, neural responses have characteristic statistical structure that generic entropy estimators fail to exploit. For example, existing Bayesian entropy estimators make the naive assumption that all spike words are equally likely a priori, which makes for an inefficient allocation of prior probability mass in cases where spikes are sparse. Here we develop Bayesian estimators for the entropy of binary spike trains using priors designed to flexibly exploit the statistical structure of simultaneouslyrecorded spike responses. We define two prior distributions over spike words using mixtures of Dirichlet distributions centered on simple parametric models. The parametric model captures high-level statistical features of the data, such as the average spike count in a spike word, which allows the posterior over entropy to concentrate more rapidly than with standard estimators (e.g., in cases where the probability of spiking differs strongly from 0.5). Conversely, the Dirichlet distributions assign prior mass to distributions far from the parametric model, ensuring consistent estimates for arbitrary distributions. We devise a compact representation of the data and prior that allow for computationally efficient implementations of Bayesian least squares and empirical Bayes entropy estimators with large numbers of neurons. We apply these estimators to simulated and real neural data and show that they substantially outperform traditional methods.

3 0.71619165 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin

Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1

4 0.63147891 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors

Author: Karin C. Knudson, Jonathan W. Pillow

Abstract: Entropy rate quantifies the amount of disorder in a stochastic process. For spiking neurons, the entropy rate places an upper bound on the rate at which the spike train can convey stimulus information, and a large literature has focused on the problem of estimating entropy rate from spike train data. Here we present Bayes least squares and empirical Bayesian entropy rate estimators for binary spike trains using hierarchical Dirichlet process (HDP) priors. Our estimator leverages the fact that the entropy rate of an ergodic Markov Chain with known transition probabilities can be calculated analytically, and many stochastic processes that are non-Markovian can still be well approximated by Markov processes of sufficient depth. Choosing an appropriate depth of Markov model presents challenges due to possibly long time dependencies and short data sequences: a deeper model can better account for long time dependencies, but is more difficult to infer from limited data. Our approach mitigates this difficulty by using a hierarchical prior to share statistical power across Markov chains of different depths. We present both a fully Bayesian and empirical Bayes entropy rate estimator based on this model, and demonstrate their performance on simulated and real neural spike train data. 1

5 0.52138317 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

Author: Jasper Snoek, Richard Zemel, Ryan P. Adams

Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1

6 0.51290101 110 nips-2013-Estimating the Unseen: Improved Estimators for Entropy and other Properties

7 0.51105833 246 nips-2013-Perfect Associative Learning with Spike-Timing-Dependent Plasticity

8 0.48033145 320 nips-2013-Summary Statistics for Partitionings and Feature Allocations

9 0.46857524 121 nips-2013-Firing rate predictions in optimal balanced networks

10 0.46259513 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

11 0.46245456 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

12 0.43192694 172 nips-2013-Learning word embeddings efficiently with noise-contrastive estimation

13 0.4229852 173 nips-2013-Least Informative Dimensions

14 0.4194895 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit

15 0.40546778 96 nips-2013-Distributed Representations of Words and Phrases and their Compositionality

16 0.40040618 205 nips-2013-Multisensory Encoding, Decoding, and Identification

17 0.38271856 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

18 0.3814868 86 nips-2013-Demixing odors - fast inference in olfaction

19 0.36348331 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents

20 0.36318231 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.036), (16, 0.054), (33, 0.155), (34, 0.103), (36, 0.014), (39, 0.175), (41, 0.056), (49, 0.065), (56, 0.066), (70, 0.04), (85, 0.028), (89, 0.032), (93, 0.07), (95, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8843841 12 nips-2013-A Novel Two-Step Method for Cross Language Representation Learning

Author: Min Xiao, Yuhong Guo

Abstract: Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning arises from the fact that words of different languages are in disjoint feature spaces. In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents. Specifically, we first formulate a matrix completion problem to produce a complete parallel document-term matrix for all documents in two languages, and then induce a low dimensional cross-lingual document representation by applying latent semantic indexing on the obtained matrix. We use a projected gradient descent algorithm to solve the formulated matrix completion problem with convergence guarantees. The proposed method is evaluated by conducting a set of experiments with cross language sentiment classification tasks on Amazon product reviews. The experimental results demonstrate that the proposed learning method outperforms a number of other cross language representation learning methods, especially when the number of parallel bilingual documents is small. 1

2 0.84267837 76 nips-2013-Correlated random features for fast semi-supervised learning

Author: Brian McWilliams, David Balduzzi, Joachim Buhmann

Abstract: This paper presents Correlated Nystr¨ m Views (XNV), a fast semi-supervised alo gorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, multiview regression, using Canonical Correlation Analysis (CCA) on unlabeled data, biases the regression towards useful features. It has been shown that CCA regression can substantially reduce variance with a minimal increase in bias if the views contains accurate estimators. Recent theoretical and empirical work shows that regression with random features closely approximates kernel regression, implying that the accuracy requirement holds for random views. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. 1

same-paper 3 0.84123576 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes

Author: Il M. Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow

Abstract: Probabilistic models for binary spike patterns provide a powerful tool for understanding the statistical dependencies in large-scale neural recordings. Maximum entropy (or “maxent”) models, which seek to explain dependencies in terms of low-order interactions between neurons, have enjoyed remarkable success in modeling such patterns, particularly for small groups of neurons. However, these models are computationally intractable for large populations, and low-order maxent models have been shown to be inadequate for some datasets. To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all 2m binary patterns. We construct universal models using a Dirichlet process centered on a well-behaved parametric base measure, which naturally combines the flexibility of a histogram and the parsimony of a parametric model. We derive computationally efficient inference methods using Bernoulli and cascaded logistic base measures, which scale tractably to large populations. We also establish a condition for equivalence between the cascaded logistic and the 2nd-order maxent or “Ising” model, making cascaded logistic a reasonable choice for base measure in a universal model. We illustrate the performance of these models using neural data. 1

4 0.83092195 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering

Author: Byungkon Kang

Abstract: Determinantal Point Process (DPP) has gained much popularity for modeling sets of diverse items. The gist of DPP is that the probability of choosing a particular set of items is proportional to the determinant of a positive definite matrix that defines the similarity of those items. However, computing the determinant requires time cubic in the number of items, and is hence impractical for large sets. In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. In addition, we show that this framework can be extended to sampling from cardinalityconstrained DPPs. As an application, we show how our sampling algorithm can be used to provide a fast heuristic for determining the number of clusters, resulting in better clustering.

5 0.7621001 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

Author: Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori

Abstract: We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths. 1

6 0.7599349 64 nips-2013-Compete to Compute

7 0.75942665 201 nips-2013-Multi-Task Bayesian Optimization

8 0.7591269 301 nips-2013-Sparse Additive Text Models with Low Rank Background

9 0.7554239 331 nips-2013-Top-Down Regularization of Deep Belief Networks

10 0.75514066 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

11 0.75488132 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

12 0.75463736 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

13 0.7518205 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

14 0.75122732 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

15 0.75078017 121 nips-2013-Firing rate predictions in optimal balanced networks

16 0.75069374 350 nips-2013-Wavelets on Graphs via Deep Learning

17 0.75031388 183 nips-2013-Mapping paradigm ontologies to and from the brain

18 0.75021249 173 nips-2013-Least Informative Dimensions

19 0.74977374 40 nips-2013-Approximate Inference in Continuous Determinantal Processes

20 0.74902779 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables