nips nips2013 nips2013-173 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fabian Sinz, Anna Stöckl, Jan Grewe, Jan Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. [sent-10, score-0.311]
2 Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. [sent-11, score-1.216]
3 Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. [sent-12, score-1.259]
4 Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. [sent-13, score-0.307]
5 By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. [sent-14, score-0.813]
6 1 Introduction An important aspect of deciphering the neural code is to determine those stimulus features that populations of sensory neurons are most sensitive to. [sent-15, score-0.185]
7 Approaches to that problem include white noise analysis [2, 14], in particular spike-triggered average [4] or spike-triggered covariance [3, 19], canonical correlation analysis or population receptive fields [12], generalized linear models [18, 15], or maximally informative dimensions [22]. [sent-16, score-0.679]
8 All these techniques have in common that they optimize a statistical dependency measure between stimuli and spike responses over the choice of a linear subspace. [sent-17, score-0.444]
9 The particular algorithms differ in the dimensionality of the subspace they extract (one- vs. [sent-18, score-0.225]
10 multi-dimensional), the statistical measure they use (correlation, likelihood, relative entropy), and whether an extension to population responses is feasible or not. [sent-19, score-0.164]
11 While spike-triggered average uses correlation and is restricted to a single subspace, spike-triggered covariance and canonical correlation analysis can already extract multi-dimensional subspaces but are still restricted to second-order statistics. [sent-20, score-0.167]
12 Maximally informative dimensions is the only technique of the above that can extract multiple dimensions that are informative also with respect to higher-order statistics. [sent-21, score-0.871]
13 However, an extension to spike patterns or population responses is not straightforward because of the curse of dimensionality. [sent-22, score-0.423]
14 Here we approach the problem from a different perspective and propose an algorithm that can extract a multi-dimensional subspace containing all relevant information about the neural responses Y in terms of Shannon’s mutual information (if such a subspace exists). [sent-23, score-0.688]
15 Our method does not commit to a particular parametric model, and can easily be extended to spike patterns or population responses. [sent-24, score-0.336]
16 1 In general, the problem of finding the most informative subspace of the stimuli X about the responses Y can be described as finding an orthogonal matrix Q (a basis for Rn ) that separates X into informative and non-informative features (U , V ) = QX. [sent-25, score-1.073]
17 Since Q is orthogonal, the mutual information I[X : Y] between X and Y can be decomposed as [5] I[Y : X] = E_{Y,U,V}[log p(U,V,Y) / (p(U,V) p(Y))] = I[Y : U] + E_{Y,U,V}[log p(Y,V | U) / (p(Y | U) p(V | U))] = I[Y : U] + E_U[I[Y | U : V | U]]. [sent-26, score-0.248]
18 The first possibility is along the lines of maximally informative dimensions [22] and involves direct estimation of the mutual information. [sent-28, score-0.813]
19 The idea is to obtain maximally informative features U by making V as independent as possible from the combination of U and Y . [sent-31, score-0.558]
20 For this reason, we name our approach least informative dimensions (LID). [sent-32, score-0.419]
21 Formally, least informative dimensions tries to minimize the mutual information between the pair Y , U and V . [sent-33, score-0.667]
22 Since each new choice of Q requires the estimation of the mutual information between (potentially high-dimensional) variables, direct optimization is hard or infeasible. [sent-38, score-0.285]
23 For this reason, we resort to another dependency measure which is easier to estimate but shares its minimum with mutual information, that is, it is zero if and only if the mutual information is zero. [sent-39, score-0.54]
24 If we can find such a Q, then we know that I [Y , U : V ] is zero as well, which means that U are the most informative features in terms of the Shannon mutual information. [sent-41, score-0.66]
25 This will allow us to obtain maximally informative features without ever having to estimate a mutual information. [sent-42, score-0.806]
26 The easier estimation procedure comes at the cost of only being able to link the alternative dependency measure to the mutual information if both of them are zero. [sent-43, score-0.292]
27 If there is no Q that achieves this, we will still get informative features in the alternative measure, but it is not clear how informative they are in terms of mutual information. [sent-44, score-0.984]
28 2 Least informative dimensions This section describes how to efficiently find a Q such that I [Y , U : V ] = 0 (if such a Q exists). [sent-45, score-0.419]
29 Unless noted otherwise, (U , V ) = QX where U denotes the informative and V the uninformative features. [sent-46, score-0.389]
30 The mutual information is a special case of the relative entropy D_KL[p || q] = E_{X∼p}[log p(X) − log q(X)] between two distributions p and q. [sent-47, score-0.346]
31 Alternatives to relative entropy of increasing practical interest are the integral probability metrics (IPM), defined as [25, 17] γ_F(X : Z) = sup_{f ∈ F} |E_X[f(X)] − E_Z[f(Z)]|. [sent-49, score-0.189]
32 If F is chosen to be a sufficiently rich reproducing kernel Hilbert space H [21], then the supremum in equation (3) can be computed explicitly and the divergence can be computed in closed form [7]. [sent-52, score-0.238]
33 In that case, the functions k (·, x) are elements of a reproducing kernel Hilbert space (RKHS) of functions H. [sent-58, score-0.203]
34 This means that for characteristic kernels, MMD is zero exactly if the relative entropy D_KL[p || q] is zero as well. [sent-66, score-0.194]
35 Since the mutual information is the relative entropy between the joint distribution and the product of the marginals, we can use MMD to search for a Q such that γ_H(P_{Y,U,V} : P_{Y,U} × P_V) is zero, which then implies that I[Y, U : V] = 0 as well. [sent-67, score-0.346]
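To make this concrete, here is a minimal numerical sketch of the biased (V-statistic) squared-MMD estimate with a Gaussian RBF kernel; the function names, the choice of the biased statistic, and the bandwidth handling are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian RBF kernel matrix between the rows of A and B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / sigma**2)

def mmd2(X, Z, sigma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between samples X and Z
    Kxx = rbf_kernel(X, X, sigma)
    Kzz = rbf_kernel(Z, Z, sigma)
    Kxz = rbf_kernel(X, Z, sigma)
    return Kxx.mean() + Kzz.mean() - 2 * Kxz.mean()
```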
36 Objective function The objective function for our optimization problem now has the following form: We transform input examples x_i into features u_i and v_i via (u_i, v_i) = Q x_i. [sent-73, score-0.322]
37 Then we use a kernel k((u_i, v_i, y_i), (u_j, v_j, y_j)) to compute and minimize MMD with respect to the choice of Q. [sent-74, score-0.24]
38 Second, in order to get samples from P_{Y,U} × P_V, we assume that our kernel takes the form k((u_i, v_i, y_i), (u_j, v_j, y_j)) = k_1((u_i, y_i), (u_j, y_j)) · k_2(v_i, v_j). [sent-77, score-0.331]
39 We can also use shuffling to assess whether the optimal value γ̂²_hs found during the optimization is significantly different from zero by comparing the value to a null distribution over γ̂²_hs obtained from datasets where the (u_i, y_i) and the v_i have been permuted across examples. [sent-82, score-0.262]
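The following sketch shows how such an HSIC value and its shuffle-based null distribution could be computed from the two kernel matrices; the biased estimator with a centering matrix and the function names are assumptions for illustration, not the paper's code. Here K would hold k_1((u_i, y_i), (u_j, y_j)) and L would hold k_2(v_i, v_j), and comparing hsic(K, L) to the quantiles of hsic_null(K, L) mirrors the one-sided shuffling test described in the text.

```python
import numpy as np

def hsic(K, L):
    # Biased HSIC estimate from two m x m kernel matrices
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2

def hsic_null(K, L, n_perm=1000, seed=0):
    # Null distribution obtained by permuting the v-block across examples
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for t in range(n_perm):
        p = rng.permutation(L.shape[0])
        null[t] = hsic(K, L[np.ix_(p, p)])
    return null
```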
40 The optimization can be carried out by computing the unconstrained gradient ∇_Q γ̂ of the objective function with respect to Q (treating Q as an ordinary matrix), projecting that gradient onto the tangent space of SO(n), and performing a line search along the gradient direction. [sent-84, score-0.197]
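A minimal sketch of one such update step on SO(n) is given below, assuming the objective is available as a callable; the projection formula, the geodesic step via the matrix exponential, and the crude backtracking schedule are standard manifold-optimization choices, not necessarily the exact procedure used by the authors.

```python
import numpy as np
from scipy.linalg import expm

def tangent_projection(Q, G):
    # Project an unconstrained gradient G onto the tangent space of SO(n) at Q
    A = Q.T @ G
    return Q @ (A - A.T) / 2.0            # Q times a skew-symmetric matrix

def manifold_step(Q, G, objective, step_sizes=(1.0, 0.5, 0.25, 0.1, 0.05)):
    # Line search along the geodesic Q @ expm(-t * S); Q @ S is the projected gradient
    S = (Q.T @ G - G.T @ Q) / 2.0
    best_Q, best_val = Q, objective(Q)
    for t in step_sizes:
        Q_new = Q @ expm(-t * S)
        val = objective(Q_new)
        if val < best_val:
            best_Q, best_val = Q_new, val
    return best_Q, best_val
```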
41 Efficient implementation with incomplete Cholesky decomposition of the kernel matrix So far, the evaluation of HSIC requires the computation of two m × m kernel matrices in each step. [sent-95, score-0.298]
42 Also in this case, the matrix J can be computed efficiently in terms of derivatives of sub-matrices of the kernel matrix (see supplementary material for the exact formulae). [sent-99, score-0.186]
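For reference, an incomplete Cholesky factorisation with greedy pivoting can be sketched as follows; kernel_fn, the tolerance, and the rank cap are illustrative parameters and this is not the authors' implementation.

```python
import numpy as np

def incomplete_cholesky(kernel_fn, X, tol=1e-6, max_rank=100):
    # Greedy pivoted factorisation K ~= G @ G.T without forming the full kernel matrix
    m = X.shape[0]
    diag = np.array([kernel_fn(X[i], X[i]) for i in range(m)])
    G = np.zeros((m, 0))
    pivots = []
    while diag.sum() > tol and G.shape[1] < max_rank:
        i = int(np.argmax(diag))                      # pivot with the largest residual
        pivots.append(i)
        col = np.array([kernel_fn(X[j], X[i]) for j in range(m)])
        if G.shape[1] > 0:
            col = col - G @ G[i, :]
        col = col / np.sqrt(diag[i])
        G = np.hstack([G, col[:, None]])
        diag = np.clip(diag - col**2, 0.0, None)      # guard against numerical negatives
    return G, pivots
```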
43 3 Related work Kernel dimension reduction in regression [5, 6] Fukumizu and colleagues find maximally informative features U by minimizing E_U[I[V | U : Y | U]] in equation (1) via conditional kernel covariance operators. [sent-100, score-0.849]
44 However, this will not make a difference in many practical cases, since many stimulus distributions are Gaussian for which the dependencies between U and V can be removed by prewhitening the stimulus data before training LID. [sent-105, score-0.176]
45 In that case I [U : V ] = 0 for every choice of Q and equation (2) becomes equivalent to maximizing the mutual information between U and Y . [sent-106, score-0.314]
46 The advantage of our formulation of the problem is that it allows us to detect and quantify independence by comparing the current γ̂²_hs to its null distribution, obtained by shuffling the (y_i, u_i) against the v_i across examples. [sent-107, score-0.496]
47 However, a residual redundancy remains which would show up when comparing γ̂²_hs to its null distribution. [sent-112, score-0.266]
48 Finally, the use of kernel covariance operators is bound to kernels that factorize. [sent-113, score-0.254]
49 Maximally informative dimensions [22] Sharpee and colleagues maximize the relative entropy I_spike = D_KL[p(v · s | spike) || p(v · s)] between the distribution of stimuli projected onto the informative dimensions given a spike and the marginal distribution of the projection. [sent-115, score-1.063]
50 This relative entropy is the part of the mutual information which is carried by the arrival of a single spike, since I[v · s : {spike, no spike}] = p(spike) · I_spike + p(no spike) · I_{no spike}. [sent-116, score-0.6]
51 However, by focusing on single spikes and the spike triggered density only, it neglects the dependencies between spikes and the information carried by the silence of the neuron [28]. [sent-118, score-0.579]
52 Additionally, the generalization to spike patterns or population responses is non-trivial because the information between the projected stimuli and spike patterns 1 , . [sent-119, score-0.775]
53 We use an RBF kernel on the v_i and a tensor RBF kernel on the (u_i, y_i): k(v_i, v_j) = exp(−‖v_i − v_j‖² / σ²) and k((u_i, y_i), (u_j, y_j)) = exp(−‖u_i y_i − u_j y_j‖² / σ²). [sent-125, score-0.706]
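In code, these two kernels might look as follows; the outer-product form u_i y_i^T for the tensor kernel is our reading of the formula above and should be treated as an assumption.

```python
import numpy as np

def k_v(vi, vj, sigma):
    # RBF kernel on the uninformative features v
    return np.exp(-np.sum((vi - vj) ** 2) / sigma**2)

def k_uy(ui, yi, uj, yj, sigma):
    # Tensor RBF kernel on (u, y), using the outer product u y^T (assumed form)
    diff = np.outer(ui, yi) - np.outer(uj, yj)
    return np.exp(-np.sum(diff**2) / sigma**2)
```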
54 The offset was chosen such that approximately 35% of the spike counts y_i were non-zero. [sent-131, score-0.254]
55 We used one informative and 19 non-informative dimensions, and set σ = 1 for the tensor kernel. [sent-132, score-0.353]
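A toy generator for such LNP data could look like the sketch below; the filter time constant, the exponential nonlinearity, and the particular offset value are assumed for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 20                                # examples and stimulus dimensions
X = rng.standard_normal((m, n))                # Gaussian white-noise stimuli
w = np.exp(-np.arange(n) / 5.0)                # exponentially decaying filter
w /= np.linalg.norm(w)

offset = -1.0                                  # would be tuned so ~35% of counts are non-zero
rate = np.exp(X @ w + offset)                  # LNP: linear filter + exponential nonlinearity
y = rng.poisson(rate)                          # Poisson spike counts
print("fraction of non-zero counts:", np.mean(y > 0))
```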
56 We compared the HSIC values γ̂²_hs, computed on {(y_i, u_i)}_{i=1,...,m} and {v_i}_{i=1,...,m}, before and after the optimization to their null distribution obtained by shuffling. [sent-134, score-0.197]
58 The informative dimension (gray during optimization, black after optimization) converges to the true filter of an LNP model (blue line). [sent-142, score-0.324]
59 Before optimization, (Y, U) and V are dependent, as shown by the left inset (null distribution obtained via shuffling in gray, dashed line shows actual HSIC value). [sent-143, score-0.197]
60 After the optimization (right inset) the HSIC value is even below the null distribution. [sent-144, score-0.233]
61 LID correctly identifies the subspace (blue dashed) in which the two true filters (solid black) reside since projections of the filters on the subspace (red dashed) closely resemble the original filters. [sent-146, score-0.354]
62 After convergence, the actual HSIC value lies to the left of the null distribution's domain. [sent-148, score-0.227]
63 Since the appropriate test for independence would be one-sided, the null hypothesis “(Y , U ) is independent of V ” would not be rejected in this case. [sent-149, score-0.233]
64 Two state neuron In this experiment, we simulated a neuron with two states, each attained in 50% of the trials (see Figure 1, right). [sent-150, score-0.164]
65 In the first—steady rate—state, the four bins contained spike counts drawn from an LNP neuron with exponentially decaying filter as above. [sent-152, score-0.384]
66 We use two informative dimensions and set σ of the tensor kernel to two times the median of the pairwise distances. [sent-156, score-0.597]
67 LID correctly identified the subspace associated with the two filters also in this case (Figure 1, right). [sent-157, score-0.194]
68 When using two informative subspaces, LID was able to identify the subspace correctly (Figure 2, left). [sent-163, score-0.518]
69 When comparing the HSIC value against the null distribution found via shuffling, the final value indicated no further dependencies. [sent-164, score-0.225]
70 Importantly, the HSIC value after optimization was clearly outside the support of the null distribution, thereby correctly indicating residual dependencies. [sent-166, score-0.308]
71 P-Unit recordings from weakly electric fish Finally, we applied our method to P-unit recordings from the weakly electric fish Eigenmannia virescens. [sent-167, score-0.374]
72 These weakly electric fish generate a dipole-like electric field which changes polarity at a frequency of about 300 Hz. [sent-168, score-0.242]
73 We selected m = 8400 random time points in the spike response and the corresponding preceding 20ms of the input (20 dimensions). [sent-173, score-0.254]
74 After optimization, the two informative dimensions of LID (first two rows of Q) converge to that subspace and also form a pair of 90° phase shifted filters (note that even if the filters are not the same, they span the same subspace). [sent-176, score-0.579]
75 Comparing the HSIC values before and after optimization shows that this subspace contains the relevant information (left and right inset). [sent-177, score-0.197]
76 Right: If only a one-dimensional informative subspace is used, the filter only slightly converges to the subspace. [sent-178, score-0.484]
77 After optimization, a comparison of the HSIC value to the null distribution obtained via shuffling indicates residual dependencies which are not explained by the one-dimensional subspace (left and right inset). [sent-179, score-0.447]
78 Figure 3: Most informative feature for a weakly electric fish P-Unit: A random filter (blue trace) exhibits HSIC values that are clearly outside the domain of the null distribution (left inset). [sent-180, score-0.662]
79 Using the spike triggered average (red trace) moves the HSIC values of the first feature of Q already inside the null distribution (middle inset). [sent-181, score-0.569]
80 After optimization, the informative feature U is independent of the features V because the first row and column of the covariance matrix of the transformed Gaussian input show no correlations. [sent-183, score-0.451]
81 The fact that one informative feature is sufficient to bring the HSIC values inside the null distribution indicates that a single subspace captures all information conveyed by these sensory neurons. [sent-184, score-0.714]
82 We initialized the first row in Q with the normalized spike triggered average (STA; Figure 3, left, red trace). [sent-186, score-0.373]
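A sketch of such an initialisation is given below: the STA is computed from the stimuli and spike counts, normalised, and completed to an orthonormal basis; the QR-based completion is an assumption of ours, not necessarily how the authors built Q.

```python
import numpy as np

def sta_init(X, y, seed=0):
    # Build an orthogonal Q whose first row is the normalised spike-triggered average
    rng = np.random.default_rng(seed)
    sta = X.T @ y / y.sum()                    # spike-triggered average
    sta = sta / np.linalg.norm(sta)
    n = X.shape[1]
    M = np.column_stack([sta, rng.standard_normal((n, n - 1))])
    Q, _ = np.linalg.qr(M)                     # first column spans the STA direction
    Q[:, 0] = sta                              # fix a possible sign flip from QR
    return Q.T                                 # rows are the feature directions
```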
83 Unlike a random feature (Figure 3, left, blue trace), the spike triggered average already achieves HSIC values within the null distribution (Figure 3, left and middle inset). [sent-188, score-0.6]
84 The most informative feature corresponding to U looks very similar to the STA but shifts the HSIC value deeper into the domain of the null distribution (Figure 3, right inset). [sent-189, score-0.52]
85 5 Discussion Here we presented a non-parametric method to estimate a subspace of the stimulus space that contains all information about a response variable Y . [sent-191, score-0.223]
86 The advantage of the generic approach is that Y can in principle be anything from spike counts, to spike patterns or population responses. [sent-193, score-0.59]
87 Since our method finds the most informative dimensions by making the complement of those dimensions as independent from the data as possible, we termed it least informative dimensions (LID). [sent-194, score-0.933]
88 We use the Hilbert-Schmidt independence criterion to minimize the dependencies between the uninformative features and the combination of informative features and outputs. [sent-195, score-0.652]
89 This measure is easy to implement, avoids the need to estimate mutual information, and its estimator has good convergence properties independent of the dimensionality of the data. [sent-196, score-0.28]
90 Even though our approach only estimates the informative features and not mutual information itself, it can help to estimate mutual information by reducing the number of dimensions. [sent-197, score-0.908]
91 In that situation, the price to pay for an easier measure is that it is hard to make definite statements about the informativeness of the features U in terms of the Shannon information, since γH = I [Y , U : V ] = 0 is the point that connects γH to the mutual information. [sent-199, score-0.383]
92 As demonstrated in the experiments, we can detect this case by comparing the actual value of γ̂_H to an empirical null distribution of γ̂_H values obtained by shuffling the v_i against the (u_i, y_i) pairs. [sent-200, score-0.459]
93 However, if γ̂_H ≠ 0, theoretical upper bounds on the mutual information are unfortunately not available. [sent-201, score-0.248]
94 In fact, using results from [25] and Pinsker's inequality, one can show that γ²_H bounds the mutual information from below. [sent-202, score-0.248]
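One way to see such a bound (a sketch, assuming a kernel with sup_x k(x, x) ≤ 1 so that unit-norm RKHS functions are bounded by 1 in sup-norm) is:

```latex
\gamma_{\mathcal H}(P, Q)
  = \sup_{\|f\|_{\mathcal H}\le 1}\bigl|\mathbb{E}_P f - \mathbb{E}_Q f\bigr|
  \le \sup_{\|f\|_\infty\le 1}\bigl|\mathbb{E}_P f - \mathbb{E}_Q f\bigr|
  = 2\,\mathrm{TV}(P,Q)
  \le \sqrt{2\, D_{\mathrm{KL}}[P\,\|\,Q]},
\quad\text{hence}\quad
I[Y,U:V] = D_{\mathrm{KL}}[P_{Y,U,V}\,\|\,P_{Y,U}\times P_V] \ge \tfrac{1}{2}\,\gamma_{\mathcal H}^2 .
```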
95 One might now be tempted to think that maximizing γ_H[Y, U] would be a better way to find informative features. [sent-203, score-0.355]
96 While this might be a way to get some informative features [24], it is not possible to link the features to informativeness in terms of Shannon mutual information, because the point that builds the bridge between the two dependency measures is where both of them are zero. [sent-204, score-0.839]
97 Anywhere else the bound may not be tight, so the maximally informative features in terms of γ_H and in terms of mutual information can be different. [sent-205, score-0.806]
98 For instance, while the subspaces of the LNP or the two-state neuron were detected reliably, the two-dimensional subspace of the artificial complex cell seems to pose a harder problem. [sent-208, score-0.273]
99 Beyond that, however, integral probability metric approaches to maximally informative dimensions offer a great chance to avoid many problems associated with direct estimation of mutual information, and to extend it to much more interesting output structures than single spikes. [sent-212, score-0.851]
100 Analyzing neural responses to natural signals: maximally informative dimensions. [sent-363, score-0.557]
wordName wordTfidf (topN-words)
[('informative', 0.324), ('hsic', 0.29), ('spike', 0.254), ('mutual', 0.248), ('ui', 0.197), ('null', 0.196), ('mmd', 0.186), ('lid', 0.186), ('hs', 0.182), ('subspace', 0.16), ('kernel', 0.149), ('fukumizu', 0.147), ('maximally', 0.146), ('lnp', 0.143), ('shuf', 0.132), ('inset', 0.129), ('gretton', 0.122), ('triggered', 0.119), ('electric', 0.1), ('sh', 0.097), ('dimensions', 0.095), ('uj', 0.091), ('features', 0.088), ('responses', 0.087), ('neuron', 0.082), ('py', 0.079), ('lters', 0.078), ('ipm', 0.07), ('neuroethology', 0.07), ('shannon', 0.069), ('ing', 0.069), ('colleagues', 0.068), ('sch', 0.068), ('hilbert', 0.067), ('kernels', 0.066), ('uninformative', 0.065), ('entropy', 0.064), ('stimulus', 0.063), ('ez', 0.063), ('ex', 0.06), ('stimuli', 0.059), ('pv', 0.059), ('sta', 0.057), ('bingen', 0.057), ('lter', 0.056), ('reproducing', 0.054), ('formulae', 0.054), ('eberhard', 0.054), ('karls', 0.054), ('metrics', 0.053), ('lkopf', 0.052), ('dependencies', 0.05), ('trace', 0.05), ('bins', 0.048), ('sriperumbudur', 0.047), ('informativeness', 0.047), ('ispike', 0.047), ('poisson', 0.045), ('recordings', 0.045), ('dependency', 0.044), ('population', 0.043), ('dkl', 0.043), ('weakly', 0.042), ('cholesky', 0.042), ('sinz', 0.041), ('sharpee', 0.041), ('residual', 0.041), ('eu', 0.04), ('patterns', 0.039), ('covariance', 0.039), ('fabian', 0.038), ('integral', 0.038), ('optimization', 0.037), ('spikes', 0.037), ('detect', 0.037), ('derivatives', 0.037), ('independence', 0.037), ('equation', 0.035), ('relative', 0.034), ('pz', 0.034), ('correctly', 0.034), ('gradient', 0.034), ('sensory', 0.034), ('extract', 0.033), ('correlation', 0.032), ('dimensionality', 0.032), ('injective', 0.031), ('rasch', 0.031), ('left', 0.031), ('maximizing', 0.031), ('subspaces', 0.031), ('orthogonal', 0.031), ('characteristic', 0.03), ('smola', 0.03), ('borgwardt', 0.029), ('tangent', 0.029), ('unconstrained', 0.029), ('tensor', 0.029), ('comparing', 0.029), ('hd', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stöckl, Jan Grewe, Jan Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
2 0.21857095 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Author: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. 1
3 0.20990098 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
4 0.1941435 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin
Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1
5 0.17990135 51 nips-2013-Bayesian entropy estimation for binary spike train data using parametric prior knowledge
Author: Evan W. Archer, Il M. Park, Jonathan W. Pillow
Abstract: Shannon’s entropy is a basic quantity in information theory, and a fundamental building block for the analysis of neural codes. Estimating the entropy of a discrete distribution from samples is an important and difficult problem that has received considerable attention in statistics and theoretical neuroscience. However, neural responses have characteristic statistical structure that generic entropy estimators fail to exploit. For example, existing Bayesian entropy estimators make the naive assumption that all spike words are equally likely a priori, which makes for an inefficient allocation of prior probability mass in cases where spikes are sparse. Here we develop Bayesian estimators for the entropy of binary spike trains using priors designed to flexibly exploit the statistical structure of simultaneouslyrecorded spike responses. We define two prior distributions over spike words using mixtures of Dirichlet distributions centered on simple parametric models. The parametric model captures high-level statistical features of the data, such as the average spike count in a spike word, which allows the posterior over entropy to concentrate more rapidly than with standard estimators (e.g., in cases where the probability of spiking differs strongly from 0.5). Conversely, the Dirichlet distributions assign prior mass to distributions far from the parametric model, ensuring consistent estimates for arbitrary distributions. We devise a compact representation of the data and prior that allow for computationally efficient implementations of Bayesian least squares and empirical Bayes entropy estimators with large numbers of neurons. We apply these estimators to simulated and real neural data and show that they substantially outperform traditional methods.
6 0.16836326 9 nips-2013-A Kernel Test for Three-Variable Interactions
7 0.15679045 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
8 0.14042293 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
9 0.13198471 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
10 0.12755914 224 nips-2013-On the Sample Complexity of Subspace Learning
11 0.12736295 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
12 0.1271726 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
13 0.11997119 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
14 0.11748625 88 nips-2013-Designed Measurements for Vector Count Data
15 0.1172552 137 nips-2013-High-Dimensional Gaussian Process Bandits
16 0.11225586 246 nips-2013-Perfect Associative Learning with Spike-Timing-Dependent Plasticity
17 0.096042924 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
18 0.093390644 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
19 0.093119107 156 nips-2013-Learning Kernels Using Local Rademacher Complexity
20 0.088981837 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
topicId topicWeight
[(0, 0.221), (1, 0.121), (2, -0.01), (3, 0.026), (4, -0.219), (5, -0.005), (6, -0.007), (7, -0.025), (8, -0.096), (9, 0.061), (10, -0.096), (11, -0.014), (12, -0.113), (13, -0.109), (14, 0.267), (15, 0.103), (16, 0.064), (17, 0.131), (18, -0.166), (19, -0.073), (20, -0.062), (21, 0.013), (22, -0.045), (23, -0.137), (24, 0.024), (25, -0.028), (26, 0.035), (27, 0.023), (28, 0.101), (29, 0.005), (30, 0.113), (31, 0.041), (32, 0.069), (33, -0.083), (34, 0.0), (35, 0.045), (36, -0.049), (37, 0.096), (38, -0.027), (39, 0.006), (40, 0.06), (41, 0.077), (42, -0.053), (43, 0.003), (44, 0.046), (45, 0.066), (46, 0.036), (47, 0.05), (48, -0.056), (49, -0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.95591062 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stöckl, Jan Grewe, Jan Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
2 0.72368139 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
3 0.71719354 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
4 0.67706698 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Author: Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
Abstract: A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice. 1
5 0.64890027 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow
Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximumlikelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1
6 0.56830627 205 nips-2013-Multisensory Encoding, Decoding, and Identification
7 0.55711335 51 nips-2013-Bayesian entropy estimation for binary spike train data using parametric prior knowledge
8 0.54656553 156 nips-2013-Learning Kernels Using Local Rademacher Complexity
9 0.54065174 327 nips-2013-The Randomized Dependence Coefficient
10 0.5355978 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
11 0.52671713 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
12 0.52547503 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
13 0.50923133 224 nips-2013-On the Sample Complexity of Subspace Learning
14 0.47012523 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
15 0.46096244 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors
16 0.44995636 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
17 0.4432286 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes
18 0.43325627 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
19 0.42623556 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
20 0.42106804 88 nips-2013-Designed Measurements for Vector Count Data
topicId topicWeight
[(2, 0.054), (16, 0.035), (19, 0.048), (33, 0.142), (34, 0.145), (36, 0.023), (41, 0.039), (49, 0.056), (56, 0.083), (60, 0.11), (70, 0.037), (85, 0.044), (89, 0.07), (93, 0.031), (95, 0.015)]
simIndex simValue paperId paperTitle
1 0.90378487 289 nips-2013-Scalable kernels for graphs with continuous attributes
Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt
Abstract: While graphs with continuous node attributes arise in many applications, stateof-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n4 ), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n2 (m + log n + δ 2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. 1
same-paper 2 0.90188533 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
3 0.89204371 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
Author: Jianfei Chen, June Zhu, Zi Wang, Xun Zheng, Bo Zhang
Abstract: Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale applications. This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation. To improve time efficiency, we further present a parallel implementation that can deal with large-scale applications and learn the correlation structures of thousands of topics from millions of documents. Extensive empirical results demonstrate the promise. 1
4 0.86634082 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
5 0.85352981 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
6 0.8527261 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
7 0.85265988 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
8 0.85012031 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
9 0.84779817 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
11 0.84441543 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
12 0.8432346 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors
13 0.8401835 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
14 0.83998698 350 nips-2013-Wavelets on Graphs via Deep Learning
15 0.83917421 51 nips-2013-Bayesian entropy estimation for binary spike train data using parametric prior knowledge
16 0.83777392 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
17 0.83502245 86 nips-2013-Demixing odors - fast inference in olfaction
18 0.83408219 119 nips-2013-Fast Template Evaluation with Vector Quantization
19 0.83403373 294 nips-2013-Similarity Component Analysis
20 0.83364606 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables