nips nips2012 nips2012-138 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: James Scott, Jonathan W. Pillow
Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Fully Bayesian inference for neural models with negative-binomial spiking. Jonathan W. Pillow. [sent-1, score-0.133]
2 Abstract Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. [sent-8, score-0.338]
3 The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. [sent-9, score-0.333]
4 Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. [sent-10, score-0.146]
5 Our approach relies on a recently described latent-variable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. [sent-11, score-0.09]
6 This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. [sent-12, score-0.476]
7 We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. [sent-13, score-1.014]
8 1 Introduction A central problem in systems neuroscience is to understand the probabilistic representation of information by neurons and neural populations. [sent-14, score-0.239]
9 Statistical models play a critical role in this endeavor, as they provide essential tools for quantifying the stochasticity of neural responses and the information they carry about various sensory and behavioral quantities of interest. [sent-15, score-0.091]
10 Poisson and conditionally Poisson models feature prominently in systems neuroscience, as they provide a convenient and tractable description of spike counts governed by an underlying spike rate. [sent-16, score-0.796]
11 However, Poisson models are limited by the fact that they constrain the ratio between the spike count mean and variance to one. [sent-17, score-0.374]
12 A second limitation of Poisson models in regression analyses (for relating spike responses to stimuli) or latent factor analyses (for finding common sources of underlying variability) is the difficulty of performing fully Bayesian inference. [sent-19, score-0.743]
13 The posterior formed under Poisson likelihood and Gaussian prior has no tractable representation, so most theorists resort to either fast, approximate methods based on Gaussians, [2–9] or slower, sampling-based methods that may scale poorly with data or dimensionality [10–15]. [sent-20, score-0.17]
14 The negative-binomial (NB) distribution generalizes the Poisson with a shape parameter that controls the tradeoff between mean and variance, providing an attractive alternative for over-dispersed spike count data. [sent-21, score-0.394]
15 Here we describe fully Bayesian inference methods for the neural spike count data based on a recently developed representation of the NB as a Gaussian mixture model [19]. [sent-23, score-0.523]
16 [Figure 1, panels A–C; panel C plots spike-count response variance (0–300) against mean (0–100), with the Poisson case shown for reference.] Figure 1: Representations of the negative-binomial (NB) regression model. [sent-24, score-0.327]
17 The linearly projected stimulus ψt = βᵀxt defines the scale parameter for a gamma r.v. [sent-26, score-0.219]
18 λt, with shape parameter ξ, giving λt ∼ Ga(ξ, exp(ψt)), which is in turn the rate for a Poisson spike count: yt ∼ Poiss(λt). [sent-28, score-0.626]
19 (B) Graphical model illustrating novel representation as a Polya-Gamma (PG) mixture of normals. [sent-29, score-0.069]
20 Spike counts are represented as NB distributed with shape ξ and rate pt = 1/(1 + exp(−ψt)). [sent-30, score-0.304]
21 The latent variable ωt is conditionally PG, while ψt (and β | x) are normal given (ωt, yt). [sent-32, score-0.08]
22 (C) Relationship between spike-count mean and variance for different settings of shape parameter ξ, illustrating super-Poisson variability of the NB model. [sent-34, score-0.118]
23 In the following, we review the conditionally Gaussian representation for the negative-binomial (Sec. [sent-35, score-0.115]
24 2), describe batch-EM, online-EM and Gibbs-sampling based inference methods for NB regression (Sec. [sent-36, score-0.127]
25 3), sampling-based methods for dynamic latent factor models (Sec. [sent-37, score-0.249]
26 4), and show applications to spiking data from primate retina. [sent-38, score-0.164]
27 2 The negative-binomial model. Begin with the single-variable case where the data Y = {yt} are scalar counts observed at times t = 1, . . . , N. [sent-39, score-0.168]
28 A standard Poisson generalized linear model (GLM) assumes that yt ∼ Pois(exp(ψt)), where the log rate parameter ψt may depend upon the stimulus. [sent-43, score-0.338]
29 To relax this assumption, we can consider the negative binomial model, which can be described as a doubly-stochastic or hierarchical Poisson model [18]. [sent-45, score-0.079]
30 Suppose that yt arises according to: (yt | λt) ∼ Pois(λt), (λt | ξ, ψt) ∼ Ga(ξ, exp(ψt)), where we have parametrized the Gamma distribution in terms of its shape and scale parameters. [sent-46, score-0.352]
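To make this doubly-stochastic construction concrete, here is a minimal simulation sketch (illustrative only; the variable names and the values of ξ and ψ are arbitrary choices, not taken from the paper) that draws counts from the gamma-Poisson hierarchy and checks that they are over-dispersed relative to a Poisson with the same mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the paper): shape xi and a fixed log-scale psi.
xi, psi, T = 4.0, 1.2, 100_000

# Hierarchy: lambda_t ~ Ga(shape=xi, scale=exp(psi)); y_t | lambda_t ~ Pois(lambda_t)
lam = rng.gamma(shape=xi, scale=np.exp(psi), size=T)
y = rng.poisson(lam)

# Marginally y_t is negative-binomial: mean xi*exp(psi),
# variance xi*exp(psi)*(1 + exp(psi)) > mean (super-Poisson).
print("empirical mean, var  :", y.mean(), y.var())
print("theoretical mean, var:", xi * np.exp(psi), xi * np.exp(psi) * (1 + np.exp(psi)))
```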
31 By marginalizing over the top-level model for λt, we recover a negative-binomial distribution for yt: p(yt | ξ, ψt) ∝ (1 − pt)^ξ pt^yt, where pt is related to ψt via the logistic transformation pt = exp(ψt)/(1 + exp(ψt)). [sent-47, score-0.507]
32 The extra parameter ξ therefore allows for over-dispersion compared to the Poisson, with the count yt having expected value ξ exp(ψt) and variance ξ exp(ψt)(1 + exp(ψt)). [sent-48, score-0.4]
33 Bayesian inference for models of this form has long been recognized as a challenging problem, due to the analytically inconvenient form of the likelihood function. [sent-51, score-0.19]
34 To see the difficulty, suppose that ψt = xtᵀβ is a linear function of known inputs xt = (xt1, . . . , xtp)ᵀ. [sent-52, score-0.139]
35 Then the conditional posterior distribution for β, up to a multiplicative constant, is p(β | ξ, Y) ∝ p(β) · ∏_{t=1}^{N} {exp(xtᵀβ)}^yt / {1 + exp(xtᵀβ)}^(ξ+yt)  (1), where p(β) is the prior distribution, and where we have assumed for the moment that ξ is fixed. [sent-56, score-0.304]
36 The two major issues are the same as those that arise in Bayesian logistic regression: the response depends non-linearly upon the parameters, and there is no natural conjugate prior p(β) to facilitate posterior computation. [sent-57, score-0.253]
37 One traditional approach for Bayesian inference in logistic models is to work directly with the discrete-data likelihood. [sent-58, score-0.096]
38 A variety of tactics along these lines have been proposed, including numerical integration [23], analytic approximations to the likelihood [24–26], or Metropolis-Hastings [27]. [sent-59, score-0.081]
39 A second approach is to assume that the discrete outcome is some function of an unobserved continuous quantity or latent variable. [sent-60, score-0.133]
40 This is most familiar in the case of Bayesian inference for the probit or dichotomized-Gaussian model [28, 29], where binary outcomes yi are assumed to be thresholded versions of a latent Gaussian quantity zi . [sent-61, score-0.19]
41 The same approach has also been applied to logistic and Poisson regression [30, e.g.]. [sent-62, score-0.109]
42 To proceed with Bayesian inference in the negative-binomial model, we appeal to a recent latent-variable construction (depicted in Fig. 1B). [sent-66, score-0.112]
43 The basic result we exploit is that the negative binomial likelihood can be represented as a mixture of normals with Polya-Gamma mixing distribution. [sent-68, score-0.128]
44 A random variable X has a Polya-Gamma distribution with parameters b > 0 and c ∈ R, denoted X ∼ PG(b, c), if X =D (1/(2π²)) Σ_{k=1}^{∞} gk / [(k − 1/2)² + c²/(4π²)]  (2), where each gk ∼ Ga(b, 1) is an independent gamma random variable, and where =D denotes equality in distribution. [sent-71, score-0.112]
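The infinite sum in (2) suggests a simple, if crude, way to draw approximate PG variates by truncation. The sketch below is only an illustration (the truncation level is an arbitrary assumption, and exact PG samplers exist); the sanity check uses the known mean E[PG(b, c)] = (b/(2c)) tanh(c/2), the same quantity that appears in the EM updates later on.

```python
import numpy as np

def sample_pg_truncated(b, c, n_terms=200, rng=None):
    """Approximate PG(b, c) draw by truncating the gamma-sum representation (2).
    n_terms is an arbitrary truncation level; larger is more accurate."""
    rng = rng or np.random.default_rng()
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=n_terms)          # g_k ~ Ga(b, 1)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

# Sanity check against the known mean E[PG(b, c)] = (b / (2c)) * tanh(c / 2).
b, c = 3.0, 1.5
draws = np.array([sample_pg_truncated(b, c) for _ in range(20_000)])
print(draws.mean(), (b / (2 * c)) * np.tanh(c / 2))
```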
45 This integral identity allows us to rewrite each term in the negative binomial likelihood (eq. [sent-80, score-0.049]
46 1) as (1 − pt)^ξ pt^yt = {exp(ψt)}^yt / {1 + exp(ψt)}^(ξ+yt) ∝ exp(κt ψt) ∫_0^∞ exp(−ωt ψt²/2) p(ωt | ξ + yt, 0) dωt, where κt = (yt − ξ)/2 and p(ωt | ξ + yt, 0) denotes the PG(ξ + yt, 0) density. [sent-81, score-0.468]
47 Conditional on ωt, we have a likelihood proportional to exp(−Q(ψt)) for some quadratic form Q(ψt), which will be conditionally conjugate to any Gaussian or mixture-of-Gaussians prior for ψt. [sent-86, score-0.219]
48 In this sense, the Polya-Gamma distribution is conditionally conjugate to the NB likelihood, which is very useful for Gibbs sampling. [sent-96, score-0.135]
49 The latent variables ωt form a set of sufficient statistics for the complete-data log posterior distribution in β. [sent-100, score-0.086]
50 As we now describe, these four facts are sufficient to allow straightforward Bayesian inference for negative-binomial models. [sent-106, score-0.09]
51 We focus first on regression models, for which we derive simple Gibbs sampling and EM algorithms. [sent-107, score-0.102]
52 We then turn to negative-binomial dynamic factor models, which can be fit using a variant of the forward-filter, backwards-sample (FFBS) algorithm [32]. [sent-108, score-0.116]
53 3 Negative-binomial regression: Fully Bayes inference via MCMC. Suppose that ψt = xtᵀβ for some p-vector of regressors xt. [sent-110, score-0.309]
54 It is usually reasonable to assume a conditionally Gaussian prior, β ∼ N(c, C). [sent-124, score-0.102]
55 Returning to the likelihood in (4) and ignoring constants of proportionality, we may write the complete-data log posterior distribution given ω1, . . . , ωN. [sent-131, score-0.135]
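Putting these pieces together, one possible Gibbs sweep for NB regression under a Gaussian prior β ∼ N(c, C) looks like the sketch below. This follows the generic Polya-Gamma augmentation recipe rather than the paper's exact pseudocode; the PG draw is delegated to a user-supplied sampler (for example, an adaptation of the truncated sampler sketched above), and ξ is treated as fixed.

```python
import numpy as np

def gibbs_nb_regression(y, X, xi, c0, C0, sample_pg, n_iter=500, rng=None):
    """Sketch of a PG-augmented Gibbs sampler for NB regression.
    y: (T,) counts; X: (T, p) regressors; prior beta ~ N(c0, C0); xi fixed.
    sample_pg(b, c, rng) is assumed to return a single PG(b, c) draw."""
    rng = rng or np.random.default_rng()
    T, p = X.shape
    beta = np.zeros(p)
    C0_inv = np.linalg.inv(C0)
    kappa = (y - xi) / 2.0                       # kappa_t = (y_t - xi) / 2
    samples = []
    for _ in range(n_iter):
        psi = X @ beta
        # omega_t | beta, y_t  ~  PG(y_t + xi, psi_t)
        omega = np.array([sample_pg(y[t] + xi, psi[t], rng) for t in range(T)])
        # beta | omega, y is Gaussian: precision X' diag(omega) X + C0^-1
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + C0_inv)
        m = V @ (X.T @ kappa + C0_inv @ c0)
        beta = rng.multivariate_normal(m, V)
        samples.append(beta.copy())
    return np.array(samples)
```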
56 Suppose that our current estimate of the parameter is β(t−1), and that the current estimate of the complete-data log posterior is Q(β) = −(1/2) βᵀS(t−1)β + βᵀd(t−1) + log p(β),  (6) where S(t−1) and d(t−1) are running estimates of the sufficient statistics Σs ωs xs xsᵀ and Σs κs xs. [sent-168, score-0.086]
57 After observing new data (yt, xt), we first compute the expected value of the latent variable, [sent-171, score-0.139]
58 E(ωt | yt, β̂) = ((yt + ξ)/(2ψ̂t)) tanh(ψ̂t/2), with ψ̂t = xtᵀβ(t−1) denoting the linear predictor evaluated at the current estimate. [sent-173, score-0.3]
59 We then update the running statistics as S(t) = (1 − γt)S(t−1) + γt ω̂t xt xtᵀ and d(t) = (1 − γt)d(t−1) + γt κt xt, where γt is the learning rate. [sent-175, score-0.139]
60 In high-dimensional problems, the usual practice is to impose sparsity via an ℓ1 penalty on the regression coefficients, leading to a lasso-type prior. [sent-178, score-0.07]
61 This online EM is guaranteed to converge to a stationary point of the log posterior distribution if the learning rate decays in time such that Σ_{t=1}^{∞} γt = ∞ and Σ_{t=1}^{∞} γt² < ∞. [sent-180, score-0.12]
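A sketch of the online EM recursion just described is given below, using the conditional mean E(ωt | yt, β) = ((yt + ξ)/(2ψt)) tanh(ψt/2); the step-size exponent and the simple ridge-type prior used in the M-step are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def online_em_nb(stream, xi, p, prior_prec=1e-2):
    """Online EM sketch for NB regression; `stream` yields (y_t, x_t) pairs.
    E-step uses E[omega_t | y_t, beta] = ((y_t + xi) / (2 psi_t)) * tanh(psi_t / 2)."""
    S = np.zeros((p, p))
    d = np.zeros(p)
    beta = np.zeros(p)
    for t, (y, x) in enumerate(stream, start=1):
        gamma_t = t ** -0.6                  # sum(gamma) diverges, sum(gamma^2) converges
        psi = x @ beta
        w = np.tanh(psi / 2.0) / psi if psi != 0.0 else 0.5   # limit is 1/2 at psi = 0
        omega_hat = (y + xi) / 2.0 * w
        kappa = (y - xi) / 2.0
        S = (1 - gamma_t) * S + gamma_t * omega_hat * np.outer(x, x)
        d = (1 - gamma_t) * d + gamma_t * kappa * x
        # M-step: maximize the quadratic surrogate under a Gaussian (ridge) prior
        beta = np.linalg.solve(S + prior_prec * np.eye(p), d)
    return beta
```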
62 Let ψt = (ψt1, . . . , ψtK) denote a vector of K linear predictors at time t, corresponding to K different neurons with observed counts Yt = (yt1, . . . , ytK). [sent-188, score-0.279]
63 We propose a dynamic negative-binomial factor model for Yt, with a vector autoregressive (VAR) structure for the latent factors: ytk ∼ NB(ξ, exp(ψtk)) for k = 1, . . . , K, [sent-192, score-0.412]
64 with ψt = α + B ft and ft = Φ ft−1 + εt, εt ∼ N(0, τ²I). Here ft denotes an L-vector of latent factors, with L typically much smaller than the number of neurons K. [sent-196, score-0.632]
65 These restrictions are traditional in Bayesian factor analysis [41], and ensure that B is formally identified. [sent-198, score-0.072]
66 We also assume that Φ is a diagonal matrix, and impose conjugate inverse-gamma priors on τ² to ensure that, marginally over the latent factors ft, the entries of ψt have approximately unit variance. [sent-199, score-0.44]
67 By exploiting the Polya-Gamma data-augmentation scheme, posterior inference in this model may proceed via straightforward Gibbs sampling—something not previously possible for count-data factor models. [sent-201, score-0.215]
68 Prior work on latent variable modeling of spike data has relied on either Gaussian approximations [2–6, 8] or variants of particle filtering [10–13]. [sent-202, score-0.439]
69 Conditional upon B and ft, we update the latent variables as [sent-204, score-0.366]
70 ωtk ∼ PG(ytk + ξ, Bk ft), where Bk denotes the kth row of the loadings matrix. [sent-205, score-0.279]
71 The mean vector α and factor-loadings matrix B can both be updated in closed-form via a Gaussian draw using the full conditional distributions given in, for example, [42] or [43]. [sent-206, score-0.076]
72 Given all latent variables and other parameters of the model, the factors ft can be updated in a single block using the forward-filter, backwards-sample (FFBS) algorithm from [32]. [sent-207, score-0.385]
73 First, pass forwards through the data from y1 to yN, recursively computing the filtered moments of ft as Mt = (Vt⁻¹ + BᵀΩtB)⁻¹ and mt = Mt(BᵀΩt zt + Vt⁻¹ mt−1), where Vt = Φ Mt−1 Φᵀ + τ²I, Ωt = diag(ωt1, . . . , ωtK), and zt = (zt1, . . . , ztK) [sent-208, score-0.593]
74 with ztk = (ytk − ξ)/(2ωtk) − αk. Then draw fN ∼ N(mN, MN) from its conditional distribution. [sent-218, score-0.16]
75 Finally, pass backwards through the data, sampling ft as (ft | mt, Mt, ft+1) ∼ N(at, At), where At = (Mt⁻¹ + τ⁻²I)⁻¹ and at = At(Mt⁻¹ mt + τ⁻² ft+1). [sent-219, score-0.523]
76 This will result in a block draw of all N × L factors from their joint conditional distribution. [sent-220, score-0.133]
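For concreteness, a rough FFBS sketch for the factor block is given below, under the simplifying assumption Φ = φI and treating α, B, τ², and the PG draws as already available from the other Gibbs steps; the pseudo-observations zt and the diffuse initial prior on the factors are assumptions of this sketch, not quantities copied from the paper.

```python
import numpy as np

def ffbs_factors(Z, Omega, B, phi, tau2, L, rng=None):
    """FFBS sketch for latent factors f_t, t = 1..N, with f_t = phi*f_{t-1} + N(0, tau2*I).
    Z[t]: pseudo-observations z_t (K-vector); Omega[t]: PG draws omega_t (K-vector)."""
    rng = rng or np.random.default_rng()
    N = len(Z)
    I = np.eye(L)
    m, M = np.zeros(L), I                        # diffuse-ish prior on the initial factor (assumption)
    ms, Ms = [], []
    for t in range(N):                           # forward filter
        V = phi ** 2 * M + tau2 * I              # one-step predictive covariance
        M = np.linalg.inv(np.linalg.inv(V) + B.T @ (Omega[t][:, None] * B))
        m = M @ (B.T @ (Omega[t] * Z[t]) + np.linalg.solve(V, phi * m))
        ms.append(m)
        Ms.append(M)
    f = np.empty((N, L))
    f[-1] = rng.multivariate_normal(ms[-1], Ms[-1])
    for t in range(N - 2, -1, -1):               # backward sampling
        A = np.linalg.inv(np.linalg.inv(Ms[t]) + (phi ** 2 / tau2) * I)
        a = A @ (np.linalg.solve(Ms[t], ms[t]) + (phi / tau2) * f[t + 1])
        f[t] = rng.multivariate_normal(a, A)
    return f
```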
77 5 Experiments To demonstrate our methods, we performed regression and dynamic factor analyses on a dataset of 27 neurons recorded from primate retina (published in [44] and re-used with authors’ permission). [sent-221, score-0.499]
78 Briefly, these data consist of spike responses from a simultaneously-recorded population of ON and OFF parasol retinal ganglion cells, stimulated with a flickering, 120-Hz binary white noise stimulus. [sent-222, score-0.438]
79 1 Regression Figure 2 shows a comparison of a Poisson model versus a negative-binomial model for each of the 27 neurons in the retinal dataset. [sent-224, score-0.216]
80 We binned spike counts in 8 ms bins, and regressed against a temporally lagged stimulus, resulting in a 100-element (10 × 10 pixel) spatial receptive field for each neuron. [sent-225, score-0.442]
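As an illustration of this preprocessing step, the sketch below bins spike times and builds a lagged-stimulus design matrix; the bin width, number of lags, and array layout are illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np

def binned_counts_and_lagged_design(spike_times, stimulus, dt=0.008, n_lags=10):
    """Bin spike times at dt seconds and build a design matrix whose row t stacks
    the current and previous (n_lags - 1) stimulus frames, zero-padded at the start.
    stimulus: (T, n_pix) array with one (flattened) frame per time bin."""
    T, n_pix = stimulus.shape
    edges = np.arange(T + 1) * dt
    y = np.histogram(spike_times, bins=edges)[0]     # spike count per bin
    X = np.zeros((T, n_lags * n_pix))
    for lag in range(n_lags):
        X[lag:, lag * n_pix:(lag + 1) * n_pix] = stimulus[: T - lag]
    return y, X
```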
81 In some cases it is dozens of orders of magnitude better (as in neurons 12–14 and 22–27), suggesting that there is substantial over-dispersion in the data that is not faithfully captured by the Poisson model. [sent-230, score-0.111]
82 Yet these results suggest, at the very least, that many of these neurons have marginal distributions that are quite far from Poisson. [sent-232, score-0.111]
83 Moreover, regardless of the underlying signal strength, the regression problem can be handled quite straightforwardly using our online method, even in high dimensions, without settling for the restrictive Poisson assumption. [sent-233, score-0.142]
84 2 Dynamic factor analysis To study the factor-modeling framework, we conducted parallel experiments on both simulated and real data. [sent-235, score-0.119]
85 First, we simulated two different data sets comprising 1000 time points and 11 neurons, each from a two-factor model: one with high factor autocorrelation (φ = 0.98), [sent-236, score-0.209]
86 and one with low factor autocorrelation. [sent-237, score-0.162]
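A toy version of this simulation can be generated as follows; the loadings, offsets, innovation variance, and ξ below are arbitrary illustrative values, not the ones used in the paper.

```python
import numpy as np

def simulate_dynamic_nb(T=1000, K=11, L=2, phi=0.98, tau2=0.02, xi=4.0, rng=None):
    """Simulate y_tk from the dynamic NB factor model via the gamma-Poisson hierarchy,
    with psi_t = alpha + B f_t and AR(1) factors f_t = phi*f_{t-1} + noise."""
    rng = rng or np.random.default_rng()
    B = rng.normal(scale=0.5, size=(K, L))       # illustrative loadings
    alpha = rng.normal(scale=0.3, size=K)        # illustrative mean offsets
    f = np.zeros(L)
    Y = np.empty((T, K), dtype=int)
    for t in range(T):
        f = phi * f + rng.normal(scale=np.sqrt(tau2), size=L)
        psi = alpha + B @ f
        lam = rng.gamma(shape=xi, scale=np.exp(psi))   # lambda_tk ~ Ga(xi, exp(psi_tk))
        Y[t] = rng.poisson(lam)
    return Y
```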
87 It is especially interesting to compare the left-most column of Figure 3 with the actual cross-sectional correlation of ψt, the systematic component of variation, in the second column. [sent-241, score-0.115]
88 The correlation of the raw counts yt shows a dramatic attenuation effect, compared to the real latent states. [sent-242, score-0.753]
89 Yet this structure is uncovered easily by the model, together with a full assessment of posterior uncertainty. [sent-243, score-0.086]
90 Finally, Figure 4 shows the results of fitting a two-factor model to the primate retinal data. [sent-245, score-0.225]
91 We are able to uncover latent structure in the data in a completely unsupervised fashion. [sent-246, score-0.133]
92 As with the simulated data, it is interesting to compare the correlation of the raw counts yt with the estimated correlation structure of the latent states. [sent-247, score-0.915]
93 There is also strong support for a low-autocorrelation regime in the factors, in light of the posterior mean factor scores depicted in the right-most pane. [sent-248, score-0.196]
94 Likewise, Bayesian inference for the negative binomial model has traditionally been a difficult problem, with the existence of a fully automatic Gibbs sampler only recently discovered [19]. [sent-250, score-0.193]
95 The three left-most columns show the raw correlation among the counts yt; the actual correlation, E(ψt ψtᵀ), of the latent states; and the posterior mean estimator for the correlation of the latent states. [sent-253, score-1.087]
96 The right-most column shows the simulated spike trains for the 11 neurons, along with the factors ft in blue (with 75% credible intervals), plotted over time. [sent-254, score-0.573]
97 [Figure 4 panels, left to right: correlation among spike counts; estimated correlation of latent states; spike counts for neurons 1–11; posterior mean factor scores.] Figure 4: Results for factor analysis of the primate retinal data. [sent-255, score-1.261]
98 Such models can be fit straightforwardly via MCMC for a wide class of prior distributions over model parameters (including sparsity-inducing choices, such as the lasso). [sent-257, score-0.073]
99 Finally, we have embedded a dynamic factor model inside a negative-binomial likelihood. [sent-260, score-0.116]
100 This latter approach can be extended quite easily to spatial interactions, more general state-space models, or mixed models incorporating both regressors and latent variables. [sent-261, score-0.176]
wordName wordTfidf (topN-words)
[('yt', 0.3), ('spike', 0.274), ('nb', 0.259), ('poisson', 0.244), ('pg', 0.208), ('ft', 0.195), ('index', 0.183), ('counts', 0.168), ('mt', 0.148), ('xt', 0.139), ('latent', 0.133), ('primate', 0.12), ('pillow', 0.117), ('correlation', 0.115), ('neurons', 0.111), ('ytk', 0.109), ('retinal', 0.105), ('ahmadian', 0.101), ('em', 0.092), ('comput', 0.092), ('autocorrelation', 0.09), ('posterior', 0.086), ('tk', 0.084), ('pt', 0.084), ('polson', 0.082), ('conditionally', 0.08), ('binomial', 0.079), ('gibbs', 0.076), ('james', 0.076), ('litke', 0.076), ('shlens', 0.076), ('factor', 0.072), ('chichilnisky', 0.071), ('regression', 0.07), ('bayesian', 0.069), ('austin', 0.069), ('count', 0.068), ('brockwell', 0.062), ('ffbs', 0.062), ('ztk', 0.062), ('neuroscience', 0.061), ('paninski', 0.06), ('responses', 0.059), ('texas', 0.058), ('inference', 0.057), ('fully', 0.057), ('factors', 0.057), ('tanh', 0.057), ('ga', 0.055), ('conjugate', 0.055), ('pois', 0.055), ('eden', 0.055), ('poiss', 0.055), ('hatsopoulos', 0.055), ('latentvariable', 0.055), ('vidne', 0.055), ('shape', 0.052), ('zt', 0.051), ('neuron', 0.05), ('cunningham', 0.05), ('kulkarni', 0.05), ('likelihood', 0.049), ('simulated', 0.047), ('carvalho', 0.047), ('shenoy', 0.047), ('spiking', 0.044), ('dynamic', 0.044), ('conditional', 0.044), ('retina', 0.043), ('regressors', 0.043), ('sher', 0.043), ('vt', 0.043), ('stimulus', 0.04), ('scott', 0.04), ('gamma', 0.04), ('nicholas', 0.039), ('analyses', 0.039), ('logistic', 0.039), ('regime', 0.038), ('upon', 0.038), ('straightforwardly', 0.038), ('biometrika', 0.037), ('raw', 0.037), ('gk', 0.036), ('prior', 0.035), ('chris', 0.035), ('carlos', 0.035), ('representation', 0.035), ('online', 0.034), ('illustrating', 0.034), ('simoncelli', 0.034), ('states', 0.034), ('facts', 0.033), ('neural', 0.032), ('draw', 0.032), ('sampling', 0.032), ('neurophysiology', 0.032), ('medicine', 0.032), ('approximations', 0.032), ('variance', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
Author: James Scott, Jonathan W. Pillow
Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1
2 0.30606306 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions
Author: Mathieu Sinn, Bei Chen
Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1
3 0.26958129 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
Author: Claudio Gentile, Francesco Orabona
Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T 1/2 log T ) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance. 1
4 0.25884008 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
Author: Mingyuan Zhou, Lawrence Carin
Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1
5 0.22405356 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
Author: Pedro Ortega, Jordi Grau-moya, Tim Genewein, David Balduzzi, Daniel Braun
Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. Given t observations of the function, the posterior can be evaluated efficiently in time O(t2 ) up to a multiplicative constant. Finally, we show how to apply our model to optimize a noisy, non-convex, high-dimensional objective function.
6 0.19568549 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter
7 0.19452012 190 nips-2012-Learning optimal spike-based representations
8 0.1797028 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization
9 0.17011862 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks
10 0.16874535 293 nips-2012-Relax and Randomize : From Value to Algorithms
11 0.15599549 41 nips-2012-Ancestor Sampling for Particle Gibbs
13 0.14926581 195 nips-2012-Learning visual motion in recurrent neural networks
14 0.14653514 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
15 0.12812893 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems
16 0.11767083 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
17 0.11690348 292 nips-2012-Regularized Off-Policy TD-Learning
18 0.11225751 324 nips-2012-Stochastic Gradient Descent with Only One Projection
19 0.10757679 80 nips-2012-Confusion-Based Online Learning and a Passive-Aggressive Scheme
20 0.10439776 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss
topicId topicWeight
[(0, 0.287), (1, 0.052), (2, 0.072), (3, 0.435), (4, -0.112), (5, 0.032), (6, -0.001), (7, -0.024), (8, 0.043), (9, 0.055), (10, 0.014), (11, -0.032), (12, 0.095), (13, -0.008), (14, 0.062), (15, 0.014), (16, 0.107), (17, 0.003), (18, 0.036), (19, -0.032), (20, -0.006), (21, 0.018), (22, 0.018), (23, 0.015), (24, -0.023), (25, 0.032), (26, 0.034), (27, -0.106), (28, -0.109), (29, -0.017), (30, -0.043), (31, 0.011), (32, 0.035), (33, 0.076), (34, 0.02), (35, 0.013), (36, -0.046), (37, 0.123), (38, 0.024), (39, 0.037), (40, -0.033), (41, 0.049), (42, 0.052), (43, 0.016), (44, -0.001), (45, 0.02), (46, -0.034), (47, 0.009), (48, 0.055), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.94380295 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
Author: James Scott, Jonathan W. Pillow
Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1
2 0.69882053 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions
Author: Mathieu Sinn, Bei Chen
Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1
3 0.68559003 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
Author: Pedro Ortega, Jordi Grau-moya, Tim Genewein, David Balduzzi, Daniel Braun
Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. Given t observations of the function, the posterior can be evaluated efficiently in time O(t2 ) up to a multiplicative constant. Finally, we show how to apply our model to optimize a noisy, non-convex, high-dimensional objective function.
4 0.67044747 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies
Author: Zhitang Chen, Kun Zhang, Laiwan Chan
Abstract: In conventional causal discovery, structural equation models (SEM) are directly applied to the observed variables, meaning that the causal effect can be represented as a function of the direct causes themselves. However, in many real world problems, there are significant dependencies in the variances or energies, which indicates that causality may possibly take place at the level of variances or energies. In this paper, we propose a probabilistic causal scale-mixture model with spatiotemporal variance dependencies to represent a specific type of generating mechanism of the observations. In particular, the causal mechanism including contemporaneous and temporal causal relations in variances or energies is represented by a Structural Vector AutoRegressive model (SVAR). We prove the identifiability of this model under the non-Gaussian assumption on the innovation processes. We also propose algorithms to estimate the involved parameters and discover the contemporaneous causal structure. Experiments on synthetic and real world data are conducted to show the applicability of the proposed model and algorithms.
Author: Lars Buesing, Maneesh Sahani, Jakob H. Macke
Abstract: Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for instance when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods (usually called subspace identification in this context) for linear systems with linear-Gaussian observations can be extended to estimate the parameters of a generalised-linear dynamical system model despite a non-linear and non-Gaussian observation process. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended subspace identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with a single calculation, avoiding the costly iterative computation of approximate expectation-maximisation (EM). Even on smaller data sets, it provides an effective initialisation for EM, avoiding local optima and speeding convergence. These benefits are shown to extend to real neural data.
6 0.65474474 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
7 0.64430779 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks
8 0.62370867 283 nips-2012-Putting Bayes to sleep
9 0.59177274 80 nips-2012-Confusion-Based Online Learning and a Passive-Aggressive Scheme
10 0.56968606 11 nips-2012-A Marginalized Particle Gaussian Process Regression
11 0.56933761 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization
12 0.56092924 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter
13 0.55546916 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
14 0.53635204 195 nips-2012-Learning visual motion in recurrent neural networks
15 0.52624959 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
16 0.52377594 293 nips-2012-Relax and Randomize : From Value to Algorithms
17 0.51207733 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
18 0.49703899 190 nips-2012-Learning optimal spike-based representations
19 0.47957489 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA
20 0.47732562 41 nips-2012-Ancestor Sampling for Particle Gibbs
topicId topicWeight
[(0, 0.02), (21, 0.021), (38, 0.091), (42, 0.011), (54, 0.011), (55, 0.011), (74, 0.026), (76, 0.091), (80, 0.603), (92, 0.04)]
simIndex simValue paperId paperTitle
1 0.95070225 204 nips-2012-MAP Inference in Chains using Column Generation
Author: David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum
Abstract: Linear chains and trees are basic building blocks in many applications of graphical models, and they admit simple exact maximum a-posteriori (MAP) inference algorithms based on message passing. However, in many cases this computation is prohibitively expensive, due to quadratic dependence on variables’ domain sizes. The standard algorithms are inefficient because they compute scores for hypotheses for which there is strong negative local evidence. For this reason there has been significant previous interest in beam search and its variants; however, these methods provide only approximate results. This paper presents new exact inference algorithms based on the combination of column generation and pre-computed bounds on terms of the model’s scoring function. While we do not improve worst-case performance, our method substantially speeds real-world, typical-case inference in chains and trees. Experiments show our method to be twice as fast as exact Viterbi for Wall Street Journal part-of-speech tagging and over thirteen times faster for a joint part-of-speed and named-entity-recognition task. Our algorithm is also extendable to new techniques for approximate inference, to faster 0/1 loss oracles, and new opportunities for connections between inference and learning. We encourage further exploration of high-level reasoning about the optimization problem implicit in dynamic programs. 1
same-paper 2 0.9498899 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
Author: James Scott, Jonathan W. Pillow
Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1
3 0.94884145 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks
Author: Tivadar Papai, Henry Kautz, Daniel Stefankovic
Abstract: Markov logic is a widely used tool in statistical relational learning, which uses a weighted first-order logic knowledge base to specify a Markov random field (MRF) or a conditional random field (CRF). In many applications, a Markov logic network (MLN) is trained in one domain, but used in a different one. This paper focuses on dynamic Markov logic networks, where the size of the discretized time-domain typically varies between training and testing. It has been previously pointed out that the marginal probabilities of truth assignments to ground atoms can change if one extends or reduces the domains of predicates in an MLN. We show that in addition to this problem, the standard way of unrolling a Markov logic theory into a MRF may result in time-inhomogeneity of the underlying Markov chain. Furthermore, even if these representational problems are not significant for a given domain, we show that the more practical problem of generating samples in a sequential conditional random field for the next slice relying on the samples from the previous slice has high computational cost in the general case, due to the need to estimate a normalization factor for each sample. We propose a new discriminative model, slice normalized dynamic Markov logic networks (SN-DMLN), that suffers from none of these issues. It supports efficient online inference, and can directly model influences between variables within a time slice that do not have a causal direction, in contrast with fully directed models (e.g., DBNs). Experimental results show an improvement in accuracy over previous approaches to online inference in dynamic Markov logic networks. 1
4 0.94682628 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing
Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain
Abstract: In this paper, we present a Bayesian framework for multilabel classiďŹ cation using compressed sensing. The key idea in compressed sensing for multilabel classiďŹ cation is to ďŹ rst project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efďŹ cient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key beneďŹ ts of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show signiďŹ cant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1
5 0.91713995 100 nips-2012-Discriminative Learning of Sum-Product Networks
Author: Robert Gens, Pedro Domingos
Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1
6 0.90415597 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses
7 0.79211301 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems
8 0.77102154 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions
9 0.73517781 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
10 0.73364079 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
11 0.72075015 293 nips-2012-Relax and Randomize : From Value to Algorithms
12 0.71659917 200 nips-2012-Local Supervised Learning through Space Partitioning
13 0.71541107 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
14 0.69742513 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
15 0.68801671 197 nips-2012-Learning with Recursive Perceptual Representations
16 0.6823771 279 nips-2012-Projection Retrieval for Classification
17 0.68122214 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization
18 0.66960299 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
19 0.66582215 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
20 0.6642099 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification