nips nips2011 nips2011-173 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yee W. Teh, Charles Blundell, Lloyd Elliott
Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 uk Abstract We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). [sent-6, score-0.083]
2 FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. [sent-7, score-0.222]
3 An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. [sent-8, score-0.158]
4 As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. [sent-9, score-0.179]
5 Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. [sent-11, score-0.025]
6 1 Introduction We are interested in probablistic models for sequences arising from the study of genetic variations in a population of organisms (particularly humans). [sent-12, score-0.263]
7 The most commonly studied class of genetic variations in humans are single nucleotide polymorphisms (SNPs), with large quantities of data now available (e. [sent-13, score-0.256]
8 SNPs play an important role in our understanding of genetic processes, human historical migratory patterns, and in genome-wide association studies for discovering the genetic basis of diseases, which in turn are useful in clinical settings for diagnoses and treatment recommendations. [sent-16, score-0.319]
9 A SNP is a specific location in the genome where a mutation has occurred to a single nucleotide at some time during the evolutionary history of a species. [sent-17, score-0.109]
10 Because the rate of such mutations is low in human populations the chances of two mutations occurring in the same location is small and so most SNPs have only two variants (wild type and mutant) in the population. [sent-18, score-0.173]
11 The SNP variants on a chromosome of an individual form a sequence, called a haplotype, with each entry being binary valued coding for the two possible variants at that SNP. [sent-19, score-0.081]
12 Due to the effects of gene conversion and recombination, the haplotypes of a set of individuals often has a “mosaic” structure where contiguous subsequences recur across multiple individuals [3]. [sent-20, score-0.21]
13 Hidden Markov Models (HMMs) [4] are often used as the basis of existing models of genetic variations that exploit this mosaic structure (e. [sent-21, score-0.332]
14 However, HMMs, as dynamic generalisations of finite mixture models, cannot flexibly model the number of states needed for a particular dataset, and suffer from the same label switching non-identifiability problems of finite mixture models [6] (see Section 3. [sent-24, score-0.201]
15 While nonparametric generalisations of HMMs [7, 8, 9] allow for flexible modelling of the number of states, they still suffer from label switching problems. [sent-26, score-0.286]
16 In this paper we propose alternative Bayesian nonparametric models for genetic variations called fragmentation-coagulation processes (FCPs). [sent-27, score-0.281]
17 An FCP defines a Markov process on the space of partitions of haplotypes, such that the random partition at each time is marginally a Chinese restaurant 1 process (CRP). [sent-28, score-0.332]
18 The clusters of the FCP are used in the place of HMM states. [sent-29, score-0.112]
19 FCPs do not require the number of clusters in each partition to be specified, and do not have explicit labels for clusters thus avoid label switching problems. [sent-30, score-0.417]
20 The partitions of FCPs evolve via a series of events, each of which involves either two clusters merging into one, or one cluster splitting into two. [sent-31, score-0.39]
21 We will see that FCPs are natural models for the mosaic structure of SNP data since they can flexibly accommodate varying numbers of subsequences and they do not have the label switching problems inherent in HMMs. [sent-32, score-0.276]
22 There is a rich literature on modelling genetic variations. [sent-34, score-0.202]
23 The standard coalescent with recombination (also known as the ancestral recombination graph) model describes the genealogical history of a set of haplotypes using coalescent, recombination and mutation events [10]. [sent-35, score-0.74]
24 Though an accurate model of the genetic process, inference is unfortunately highly intractable. [sent-36, score-0.147]
25 PHASE [11, 12] and IMPUTE [13] are a class of HMM based models, where each HMM state corresponds to a haplotype in a reference panel (training set). [sent-37, score-0.125]
26 This alleviates the label switching problem, but incurs higher computational costs than the normal HMMs or our FCP since there are now as many HMM states as reference haplotypes. [sent-38, score-0.157]
27 BEAGLE [14] introduces computational improvements by collapsing the multiple occurrences of the same mosaic subsequence across the reference haplotypes into a single node of a graph, with the graph constructed in a very efficient but somewhat ad hoc manner. [sent-39, score-0.235]
28 Section 2 introduces preliminary notation and describes random partitions and the CRP. [sent-40, score-0.127]
29 Section 4 describes an auxiliary variables Gibbs sampler for our model. [sent-42, score-0.034]
30 A partition γ of S is a set of disjoint non-empty subsets of S (called clusters) whose union is S. [sent-52, score-0.069]
31 If a ⊂ S, define the projection γ|a of γ onto a to be the partition of a obtained by removing the elements of S\a as well as any resulting empty subsets from γ. [sent-54, score-0.069]
32 The canonical distribution over ΠS is the Chinese restaurant process (CRP) [15, 16]. [sent-55, score-0.128]
33 It can be described using an iterative generative process: n customers enter a Chinese restaurant one at a time. [sent-56, score-0.333]
34 The first customer sits at some table and each subsequent customer sit at a table with m current customers with probability proportional to m, or at a new table with probability proportional to α, where α is a parameter of the CRP. [sent-57, score-1.608]
35 The seating arrangement of customers around tables forms a partition γ of S, with occupied tables corresponding to the clusters in γ. [sent-58, score-0.718]
36 We write γ ∼ CRP(α, S) if γ ∈ ΠS is a CRP distributed random partition over S. [sent-59, score-0.069]
37 Multiplying the conditional probabilities together gives the probability mass function of the CRP: fα,S (γ) = α|γ| Γ(α) Γ(|a|) Γ(n + α) a∈γ (1) where Γ is the gamma function. [sent-60, score-0.088]
38 The CRP is exchangeable (invariant to permutations of S), and projective (the probability of the projection γ|a is simply fα,a (γ|a )), so can be extended in a natural manner to partitions of N and is related via de Finetti’s theorem to the Dirichlet process [17]. [sent-61, score-0.309]
39 3 Fragmentation-Coagulation Processes A fragmentation-coagulation process (FCP) is a continuous-time Markov process π ≡ (π(t), t ∈ [0, T ]) over a time interval [0, T ] where each π(t) is a random partition in ΠS . [sent-62, score-0.153]
40 Since the space of partitions for a finite S is finite, the FCP is a Markov jump process (MJP) [18] : it evolves according to a discrete series of random events (or jumps) at which it changes state and at all other times the state remains unchanged. [sent-63, score-0.419]
41 In particular, the jump events in an FCP are either fragmentations or coagulations. [sent-64, score-0.269]
42 Note that fragmentations and coagulations are converses of each other; as we will see later, this will lead to some important properties of the FCP. [sent-66, score-0.201]
43 Fractions are, for the orange sequence, from left to right: probability of joining cluster c at time 0, probability of following cluster a at a fragmentation event, rate of starting a new table (creating a fragmentation), and rate of joining with an existing table (creating a coagulation). [sent-71, score-0.923]
44 Following the various popular culinary processes in Bayesian nonparametrics, we will start by describing the law of π in terms of the conditional distribution of the cluster membership of each sequence i given those of 1, . [sent-72, score-0.186]
45 π|[i−1] is piecewise constant, with π|[i−1] (t) ∈ Π[i−1] describing the partitioning of the sequences 1, . [sent-78, score-0.04]
46 , i − 1 (the seating arrangement of customers 1, . [sent-81, score-0.381]
47 Let ai (t) = c\{i}, where c is the unique cluster in π|[i] (t) containing i. [sent-85, score-0.227]
48 Note that either ai (t) ∈ π|[i−1] (t), meaning customer i sits at an existing table in π|[i−1] (t), or ai (t) = ∅, which will mean that customer i sits at a new table. [sent-86, score-1.296]
49 Thus the function ai describes customer i’s choice of table to sit at through times [0, T ]. [sent-87, score-0.712]
50 We define the conditional distribution of ai given π|[i−1] as a Markov jump process evolving from time 0 to T with two parameters µ > 0 and R > 0 (see Figure 1): i = 1: The first customer sits at a table for the duration of the process, i. [sent-88, score-0.823]
51 t = 0: Each subsequent customer i starts at time t = 0 by sitting at a table according to CRP probabilities with parameter µ. [sent-91, score-0.662]
52 So, ai (0) = c ∈ π|[i−1] (0) with probability proportional to |c|, and ai (0) = ∅ with probability proportional to µ. [sent-92, score-0.422]
53 F1: At time t > 0, if customer i is sitting at table ai (t−) = c ∈ π|[i−1] (t−), and the table c fragments into two tables a, b ∈ π|[i−1] (t), customer i will move to table a with probability |a|/|c|, and to table b with probability |b|/|c|. [sent-93, score-1.551]
54 C1: If the table c merges with another table at time t, the customer simply follows the other customers to the resulting merged table. [sent-94, score-0.869]
55 F2: At all other times t, if customer i is sitting at some existing table ai (t−) = c ∈ π|[i−1] (t), then the customer will move to a new empty table (ai (t) = ∅) with rate R/|c|. [sent-95, score-1.293]
56 C2: Finally, if i is sitting by himself (ai (t−) = ∅), then he will join an existing table ai (t) = c ∈ π|[i−1] (t) with rate R/µ. [sent-96, score-0.467]
57 The total rate of joining any existing table is |π|[i−1] (t)|R/µ. [sent-97, score-0.27]
58 Note that when customer i moves to a new table in step F2, a fragmentation event is created, and all subsequent customers who end up in the same table will have to decide at step F1 whether to move to the original table or to the table newly created by i. [sent-98, score-1.403]
59 The probabilities in steps F1 and F2 are exactly the same as those for a Dirichlet diffusion tree [19] with constant divergence function R. [sent-99, score-0.149]
60 Similarly step C2 creates a coagulation event in which subsequent customers seated at the two merging tables will move to the merged table in step C1, and the probabilities are exactly the same as those for Kingman’s coalescent [20, 21]. [sent-100, score-1.078]
61 Thus our FCP is a combination of the Dirichlet diffusion tree and Kingman’s coalescent. [sent-101, score-0.089]
62 Theorem 3 below shows that this combination results in FCPs being stationary Markov processes with CRP equilibrium distributions. [sent-102, score-0.125]
63 Further, FCPs are reversible, so in a sense the Dirichlet diffusion tree and Kingman’s coalescent are duals of each other. [sent-103, score-0.233]
64 Given π|[i−1] , π|[i] is uniquely determined by ai and vice versa, so that the seating of all n customers through times [0, T ], a1 , . [sent-104, score-0.552]
65 , an , uniquely determines the sequential partition structure π. [sent-107, score-0.097]
66 The first is an alternative characterisation of π as an MJP whose transitions are fragmentations or coagulations, an unsurprising observation since both the Dirichlet diffusion tree and Kingman’s coalescent, as partition-valued processes, are Markov. [sent-109, score-0.204]
67 The total rate of transition out of γ is: q(γ, ·) = R H|c|−1 + c∈γ R |γ|(|γ| − 1) µ 2 (3) where H|c|−1 is the |c| − 1st harmonic number. [sent-112, score-0.105]
68 The initial distribution follows from the CRP probabilities of step t = 0. [sent-114, score-0.06]
69 For every i, ai is Markov and ai (t) depends only on ai (t−) and π|[i−1] (t), thus (ai (s), s ∈ [0, t]) depends only on (aj (s), s ∈ [0, t], j < i) and the Markovian structure of π follows by induction. [sent-115, score-0.444]
70 Further, the probabilities and rates in steps F1, F2, C1 and C2 do not depend explicitly on t so π has stationary transit rates. [sent-117, score-0.175]
71 By construction, q(γ, ρ) is only non-zero if γ and ρ are related by a complimentary pair of fragmentation and coagulation events, as in the theorem. [sent-118, score-0.375]
72 To derive the transition rates (2), recall that a transition rate r from state s to state s means that if the MJP is in state s at time t then it will transit to state s by an infinitesimal time later t + δ with probability δr. [sent-119, score-0.393]
73 For the fragmentation rate q(γ, ρ), the probability of transiting from γ to ρ in an infinitesimal time δ is δ times the rate at which a customer starts his own table in step F2, times the probabilities of subsequent customers choosing either table in step F1 to form the two tables a and b. [sent-120, score-1.483]
74 Dividing this product by δ forms the rate q(γ, ρ). [sent-121, score-0.057]
75 Without loss of generality suppose that the table started by the customer eventually becomes a and that there were j other customers at the existing table which eventually becomes b. [sent-122, score-0.852]
76 Thus, the rate of the customer starting his own table is R/j and the product of probabilities of subsequent customer choices in step F1 is then 1·2···(|a|−1)×j···(|b|−1) . [sent-123, score-0.966]
77 Similarly, the coagulation rate q(ρ, γ) is a product of the rate R at which a customer moves from his own table to an existing table in step C2 and the µ probability of all subsequent customers in either table moving to the merged table (which is just 1). [sent-125, score-1.527]
78 Finally, the total transition rate q(γ, ·) is a sum over all possible fragmentations and coagulations of γ. [sent-126, score-0.306]
79 There are |γ|(|γ|−1) possible pairs of clusters to coagulate, giving the second term. [sent-127, score-0.112]
80 The first term 2 is obtained by summing over all c ∈ γ, and over all unordered pairs a, b resulting from fragmenting c, and using the identity {a,b} Γ(|a|)Γ(|b|) = H|c|−1 . [sent-128, score-0.058]
81 Thus it can be extended naturally to a Markov process over partitions of N. [sent-131, score-0.135]
82 Both properties follow from the fact that both the initial distribution CRP(µ, S) and the transition rates (2) are projective and exchangeable. [sent-133, score-0.127]
83 Projectivity is a direct consequence of the iterative construction, showing that the law of π|[i] does not depend on the clustering trajectories aj of subsequent customers j > i. [sent-135, score-0.373]
84 We can show exchangeability of π by deriving the joint probability density of a sample path of π (the density exists since both ΠS and T are finite so π has a finite number of events on [0, T ]), and seeing that it is invariant to permutations of S. [sent-136, score-0.127]
wordName wordTfidf (topN-words)
[('fcps', 0.403), ('customer', 0.33), ('fcp', 0.259), ('customers', 0.247), ('crp', 0.228), ('fragmentation', 0.202), ('coagulation', 0.173), ('snp', 0.152), ('ai', 0.148), ('genetic', 0.147), ('coalescent', 0.144), ('mjp', 0.144), ('table', 0.121), ('fragmentations', 0.115), ('recombination', 0.115), ('clusters', 0.112), ('mosaic', 0.101), ('haplotypes', 0.101), ('kingman', 0.101), ('seating', 0.101), ('partitions', 0.093), ('sits', 0.093), ('jump', 0.089), ('coagulations', 0.086), ('restaurant', 0.086), ('chinese', 0.084), ('sitting', 0.083), ('switching', 0.082), ('projective', 0.079), ('cluster', 0.079), ('tables', 0.078), ('transit', 0.076), ('hmms', 0.073), ('partition', 0.069), ('subsequent', 0.068), ('snps', 0.065), ('merging', 0.065), ('events', 0.065), ('diffusion', 0.064), ('hmm', 0.062), ('probabilities', 0.06), ('joining', 0.059), ('dirichlet', 0.058), ('fragmenting', 0.058), ('haplotype', 0.058), ('nucleotide', 0.058), ('transiting', 0.058), ('rate', 0.057), ('markov', 0.056), ('modelling', 0.055), ('processes', 0.053), ('nitesimal', 0.053), ('variations', 0.051), ('mutation', 0.051), ('generalisations', 0.051), ('sit', 0.051), ('subsequences', 0.051), ('merged', 0.05), ('transition', 0.048), ('mutations', 0.044), ('label', 0.042), ('move', 0.042), ('process', 0.042), ('splitting', 0.041), ('sequences', 0.04), ('exibly', 0.04), ('stationary', 0.039), ('multiplying', 0.039), ('reversible', 0.035), ('proportional', 0.035), ('permutations', 0.034), ('evolves', 0.034), ('describes', 0.034), ('state', 0.034), ('exchangeable', 0.033), ('reference', 0.033), ('existing', 0.033), ('equilibrium', 0.033), ('arrangement', 0.033), ('event', 0.03), ('nonparametric', 0.03), ('aj', 0.029), ('individuals', 0.029), ('law', 0.029), ('uniquely', 0.028), ('variants', 0.028), ('probability', 0.028), ('times', 0.028), ('creating', 0.026), ('suffer', 0.026), ('tree', 0.025), ('chromosome', 0.025), ('arrangements', 0.025), ('join', 0.025), ('lloyd', 0.025), ('genomes', 0.025), ('organisms', 0.025), ('diagnoses', 0.025), ('genotype', 0.025), ('culinary', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
Author: Yee W. Teh, Charles Blundell, Lloyd Elliott
Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1
2 0.19984104 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei
Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
3 0.095399335 131 nips-2011-Inference in continuous-time change-point models
Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor
Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights. 1
4 0.089124314 221 nips-2011-Priors over Recurrent Continuous Time Processes
Author: Ardavan Saeedi, Alexandre Bouchard-côté
Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1
5 0.070800684 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
Author: Dmitry Pidan, Ran El-Yaniv
Abstract: Focusing on short term trend prediction in a Ä?Ĺš nancial context, we consider the problem of selective prediction whereby the predictor can abstain from prediction in order to improve performance. We examine two types of selective mechanisms for HMM predictors. The Ä?Ĺš rst is a rejection in the spirit of Chow’s well-known ambiguity principle. The second is a specialized mechanism for HMMs that identiÄ?Ĺš es low quality HMM states and abstain from prediction in those states. We call this model selective HMM (sHMM). In both approaches we can trade-off prediction coverage to gain better accuracy in a controlled manner. We compare performance of the ambiguity-based rejection technique with that of the sHMM approach. Our results indicate that both methods are effective, and that the sHMM model is superior. 1
6 0.070477709 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
7 0.063219719 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time
8 0.062793605 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
9 0.060034502 156 nips-2011-Learning to Learn with Compound HD Models
10 0.060001772 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
11 0.057722718 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
12 0.057486244 55 nips-2011-Collective Graphical Models
13 0.056709908 186 nips-2011-Noise Thresholds for Spectral Clustering
14 0.05480513 285 nips-2011-The Kernel Beta Process
15 0.049575381 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
16 0.048691429 101 nips-2011-Gaussian process modulated renewal processes
17 0.046963044 215 nips-2011-Policy Gradient Coagent Networks
18 0.045910694 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
19 0.043690994 66 nips-2011-Crowdclustering
20 0.042836264 198 nips-2011-On U-processes and clustering performance
topicId topicWeight
[(0, 0.115), (1, 0.01), (2, 0.018), (3, 0.022), (4, -0.033), (5, -0.103), (6, -0.015), (7, -0.071), (8, 0.029), (9, -0.018), (10, 0.032), (11, -0.009), (12, 0.081), (13, -0.116), (14, -0.046), (15, -0.0), (16, 0.023), (17, -0.05), (18, 0.023), (19, -0.039), (20, 0.144), (21, -0.023), (22, -0.067), (23, -0.02), (24, -0.17), (25, 0.009), (26, -0.058), (27, 0.042), (28, 0.067), (29, 0.025), (30, -0.002), (31, 0.102), (32, -0.122), (33, 0.074), (34, -0.051), (35, -0.053), (36, 0.001), (37, -0.013), (38, -0.022), (39, 0.016), (40, 0.129), (41, -0.085), (42, -0.048), (43, 0.025), (44, 0.135), (45, 0.088), (46, -0.155), (47, 0.029), (48, 0.07), (49, -0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.95697176 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
Author: Yee W. Teh, Charles Blundell, Lloyd Elliott
Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1
2 0.64528018 221 nips-2011-Priors over Recurrent Continuous Time Processes
Author: Ardavan Saeedi, Alexandre Bouchard-côté
Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1
3 0.63014591 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
Author: Hai-son P. Le, Ziv Bar-joseph
Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1
4 0.57746255 131 nips-2011-Inference in continuous-time change-point models
Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor
Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights. 1
5 0.54910946 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei
Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
6 0.53265345 101 nips-2011-Gaussian process modulated renewal processes
7 0.4869377 285 nips-2011-The Kernel Beta Process
8 0.48456651 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
9 0.4840076 55 nips-2011-Collective Graphical Models
10 0.44842988 8 nips-2011-A Model for Temporal Dependencies in Event Streams
11 0.41125798 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
12 0.39396307 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
13 0.3786234 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery
14 0.36963621 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
15 0.36826751 127 nips-2011-Image Parsing with Stochastic Scene Grammar
16 0.35844222 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
17 0.35804909 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
18 0.35070947 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
19 0.3452296 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
20 0.3282761 104 nips-2011-Generalized Beta Mixtures of Gaussians
topicId topicWeight
[(4, 0.04), (20, 0.019), (26, 0.014), (31, 0.075), (33, 0.561), (43, 0.028), (45, 0.025), (57, 0.023), (74, 0.028), (83, 0.019), (84, 0.031), (99, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.91850358 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
Author: Yee W. Teh, Charles Blundell, Lloyd Elliott
Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1
2 0.89269525 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
3 0.71527493 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
4 0.67368847 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models
Author: Skander Mensi, Richard Naud, Wulfram Gerstner
Abstract: Variability in single neuron models is typically implemented either by a stochastic Leaky-Integrate-and-Fire model or by a model of the Generalized Linear Model (GLM) family. We use analytical and numerical methods to relate state-of-theart models from both schools of thought. First we find the analytical expressions relating the subthreshold voltage from the Adaptive Exponential Integrate-andFire model (AdEx) to the Spike-Response Model with escape noise (SRM as an example of a GLM). Then we calculate numerically the link-function that provides the firing probability given a deterministic membrane potential. We find a mathematical expression for this link-function and test the ability of the GLM to predict the firing probability of a neuron receiving complex stimulation. Comparing the prediction performance of various link-functions, we find that a GLM with an exponential link-function provides an excellent approximation to the Adaptive Exponential Integrate-and-Fire with colored-noise input. These results help to understand the relationship between the different approaches to stochastic neuron models. 1 Motivation When it comes to modeling the intrinsic variability in simple neuron models, we can distinguish two traditional approaches. One approach is inspired by the stochastic Leaky Integrate-and-Fire (LIF) hypothesis of Stein (1967) [1], where a noise term is added to the system of differential equations implementing the leaky integration to a threshold. There are multiple versions of such a stochastic LIF [2]. How the noise affects the firing probability is also a function of the parameters of the neuron model. Therefore, it is important to take into account the refinements of simple neuron models in terms of subthreshold resonance [3, 4], spike-triggered adaptation [5, 6] and non-linear spike 1 initiation [7, 5]. All these improvements are encompassed by the Adaptive Exponential Integrateand-Fire model (AdEx [8, 9]). The other approach is to start with some deterministic dynamics for the the state of the neuron (for instance the instantaneous distance from the membrane potential to the threshold) and link the probability intensity of emitting a spike with a non-linear function of the state variable. Under some conditions, this type of model is part of a greater class of statistical models called Generalized Linear Models (GLM [10]). As a single neuron model, the Spike Response Model (SRM) with escape noise is a GLM in which the state variable is explicitly the distance between a deterministic voltage and the threshold. The original SRM could account for subthreshold resonance, refractory effects and spike-frequency adaptation [11]. Mathematically similar models were developed independently in the study of the visual system [12] where spike-frequency adaptation has also been modeled [13]. Recently, this approach has retained increased attention since the probabilistic framework can be linked with the Bayesian theory of neural systems [14] and because Bayesian inference can be applied to the population of neurons [15]. In this paper, we investigate the similarity and differences between the state-of-the-art GLM and the stochastic AdEx. The motivation behind this work is to relate the traditional threshold neuron models to Bayesian theory. Our results extend the work of Plesser and Gerstner (2000) [16] since we include the non-linearity for spike initiation and spike-frequency adaptation. We also provide relationships between the parameters of the AdEx and the equivalent GLM. These precise relationships can be used to relate analog implementations of threshold models [17] to the probabilistic models used in the Bayesian approach. The paper is organized as follows: We first describe the expressions relating the SRM state-variable to the parameters of the AdEx (Sect. 3.1) in the subthreshold regime. Then, we use numerical methods to find the non-linear link-function that models the firing probability (Sect. 3.2). We find a functional form for the SRM link-function that best describes the firing probability of a stochastic AdEx. We then compare the performance of this link-function with the often used exponential or linear-rectifier link-functions (also called half-wave linear rectifier) in terms of predicting the firing probability of an AdEx under complex stimulus (Sect. 3.3). We find that the exponential linkfunction yields almost perfect prediction. Finally, we explore the relations between the statistic of the noise and the sharpness of the non-linearity for spike initiation with the parameters of the SRM. 2 Presentation of the Models In this section we present the general formula for the stochastic AdEx model (Sect. 2.1) and the SRM (Sect 2.2). 2.1 The Stochastic Adaptive Exponential Integrate-and-Fire Model The voltage dynamics of the stochastic AdEx is given by: V −Θ ˙ τm V = El − V + ∆T exp − Rw + RI + R (1) ∆T τw w = a(V − El ) − w ˙ (2) where τm is the membrane time constant, El the reverse potential, R the membrane resistance, Θ is the threshold, ∆T is the shape factor and I(t) the input current which is chosen to be an Ornstein−Θ Uhlenbeck process with correlation time-constant of 5 ms. The exponential term ∆T exp( V∆T ) is a non-linear function responsible for the emission of spikes and is a diffusive white noise with standard deviation σ (i.e. ∼ N (0, σ)). Note that the diffusive white-noise does not imply white noise fluctuations of the voltage V (t), the probability distribution of V (t) will depend on ∆T and Θ. The second variable, w, describes the subthreshold as well as the spike-triggered adaptation both ˆ parametrized by the coupling strength a and the time constant τw . Each time tj the voltage goes to infinity, we assumed that a spike is emitted. Then the voltage is reset to a fixed value Vr and w is increased by a constant value b. 2.2 The Generalized Linear Model In the SRM, The voltage V (t) is given by the convolution of the injected current I(t) with the membrane filter κ(t) plus the additional kernel η(t) that acts after each spikes (here we split the 2 spike-triggered kernel in two η(t) = ηv (t) + ηw (t) for reasons that will become clear later): V (t) = ˆ ˆ ηv (t − tj ) + ηw (t − tj ) El + [κ ∗ I](t) + (3) ˆ {tj } ˆ Then at each time tj a spike is emitted which results in a change of voltage described by η(t) = ηv (t) + ηw (t). Given the deterministic voltage, (Eq. 3) a spike is emitted according to the firing intensity λ(V ): λ(t) = f (V (t)) (4) where f (·) is an arbitrary function called the link-function. Then the firing behavior of the SRM depends on the choice of the link-function and its parameters. The most common link-function used to model single neuron activities are the linear-rectifier and the exponential function. 3 Mapping In order to map the stochastic AdEx to the SRM we follow a two-step procedure. First we derive the filter κ(t) and the kernels ηv (t) and ηw (t) analytically as a function of AdEx parameters. Second, we derive the link-function of the SRM from the stochastic spike emission of the AdEx. Figure 1: Mapping of the subthreshold dynamics of an AdEx to an equivalent SRM. A. Membrane filter κ(t) for three different sets of parameters of the AdEx leading to over-damped, critically damped and under-damped cases (upper, middle and lower panel, respectively). B. Spike-Triggered η(t) (black), ηv (t) (light gray) and ηw (gray) for the three cases. C. Example of voltage trace produced when an AdEx is stimulated with a step of colored noise (black). The corresponding voltage from a SRM stimulated with the same current and where we forced the spikes to match those of the AdEx (red). D. Error in the subthreshold voltage (VAdEx − VGLM ) as a function of the mean voltage of the AdEx, for the three different cases: over-, critically and under-damped (light gray, gray and black, respectively) with ∆T = 1 mV. Red line represents the voltage threshold Θ. E. Root Mean Square Error (RMSE) ratio for the three cases with ∆T = 1 mV. The RMSE ratio is the RMSE between the deterministic VSRM and the stochastic VAdEx divided by the RMSE between repetitions of the stochastic AdEx voltage. The error bar shows a single standard deviation as the RMSE ratio is averaged accross multiple value of σ. 3.1 Subthreshold voltage dynamics We start by assuming that the non-linearity for spike initiation does not affect the mean subthreshold voltage of the stochastic AdEx (see Figure 1 D). This assumption is motivated by the small ∆T 3 observed in in-vitro recordings (from 0.5 to 2 mV [8, 9]) which suggest that the subthreshold dynamics are mainly linear except very close to Θ. Also, we expect that the non-linear link-function will capture some of the dynamics due to the non-linearity for spike initiation. Thus it is possible to rewrite the deterministic subthreshold part of the AdEx (Eq. 1-2 without and without ∆T exp((V − Θ)/∆T )) using matrices: ˙ x = Ax (5) with x = V w and A = − τ1 m a τw − gl1m τ − τ1 w (6) In this form, the dynamics of the deterministic AdEx voltage is a damped oscillator with a driving force. Depending on the eigenvalues of A the system could be over-damped, critically damped or under-damped. The filter κ(t) of the GLM is given by the impulse response of the system of coupled differential equations of the AdEx, described by Eq. 5 and 6. In other words, one has to derive the response of the system when stimulating with a Dirac-delta function. The type of damping gives three different qualitative shapes of the kernel κ(t), which are summarized in Table 3.1 and Figure 1 A. Since the three different filters also affect the nature of the stochastic voltage fluctuations, we will keep the distinction between over-damped, critically damped and under-damped scenarios throughout the paper. This means that our approach is valid for at least 3 types of diffusive voltage-noise (i.e. the white noise in Eq. 1 filtered by 3 different membrane filters κ(t)). To complete the description of the deterministic voltage, we need an expression for the spiketriggered kernels. The voltage reset at each spike brings a spike-triggered jump in voltage of magˆ nitude ∆ = Vr − V (t). This perturbation is superposed to the current fluctuations due to I(t) and can be mediated by a Delta-diract pulse of current. Thus we can write the voltage reset kernel by: ηv (t) = ∆ ∆ [δ ∗ κ] (t) = κ(t) κ(0) κ(0) (7) where δ(t) is the Dirac-delta function. The shape of this kernel depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). Finally, the AdEx mediates spike-frequency adaptation by the jump of the second variables w. From Eq. 2 we can see that this produces a current wspike (t) = b exp (−t/τw ) that can cumulate over subsequent spikes. The effect of this current on voltage is then given by the convolution of wspike (t) with the membrane filter κ(t). Thus in the SRM framework the spike-frequency adaptation is taken into account by: ηw (t) = [wspike ∗ κ](t) (8) Again the precise form of ηw (t) depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). At this point, we would like to verify our assumption that the non-linearity for spike emission can be neglected. Fig. 1 C and D shows that the error between the voltage from Eq. 3 and the voltage from the stochastic AdEx is generally small. Moreover, we see that the main contribution to the voltage prediction error is due to the mismatch close to the spikes. However the non-linearity for spike initiation may change the probability distribution of the voltage fluctuations, which in turn influences the probability of spiking. This will influence the choice of the link-function, as we will see in the next section. 3.2 Spike Generation Using κ(t), ηv (t) and ηw (t), we must relate the spiking probability of the stochastic AdEx as a function of its deterministic voltage. According to [2] the probability of spiking in time bin dt given the deterministic voltage V (t) is given by: p(V ) = prob{spike in [t, t + dt]} = 1 − exp (−f (V (t))dt) (9) where f (·) gives the firing intensity as a function of the deterministic V (t) (Eq. 3). Thus to extract the link-function f we have to compute the probability of spiking given V (t) for our SRM. To do so we apply the method proposed by Jolivet et al. (2004) [18], where the probability of spiking is simply given by the distribution of the deterministic voltage estimated at the spike times divided by the distribution of the SRM voltage when there is no spike (see figure 2 A). One can numerically compute these two quantities for our models using N repetitions of the same stimulus. 4 Table 1: Analytical expressions for the membrane filter κ(t) in terms of the parameters of the AdEx for over-, critically-, and under-damped cases. Membrane Filter: κ(t) over-damped if: (τm + τw )2 > 4τm τw (gl +a) gl κ(t) = k1 eλ1 t + k2 eλ2 t λ1 = 1 2τm τw (−(τm + τw ) + critically-damped if: (τm + τw )2 = 4τm τw (gl +a) gl κ(t) = (αt + β)eλt λ= under-damped if: (τm + τw )2 < 4τm τw (gl +a) gl κ(t) = (k1 cos (ωt) + k2 sin (ωt)) eλt −(τm +τw ) 2τm τw λ= −(τm +τw ) 2τm τw (τm + τw )2 − 4 τm τw (gl + a) gl λ2 = 1 2τm τw (−(τm + τw ) − α= τm −τw 2Cτm τw ω= τw −τm 2τm τw 2 − a g l τm τw (τm + τw )2 − 4 τm τw (gl + a) gl k1 = −(1+(τm λ2 )) Cτm (λ1 −λ2 ) k2 = 1+(τm λ1 ) Cτm (λ1 −λ2 ) β= 1 C k1 = k2 = 1 C −(1+τm λ) Cωτm The standard deviation σ of the noise and the parameter ∆T of the AdEx non-linearity may affect the shape of the link-function. We thus extract p(V ) for different σ and ∆T (Fig. 2 B). Then using visual heuristics and previous knowledge about the potential analytical expression of the link-funtion, we try to find a simple analytical function that captures p(V ) for a large range of combinations of σ and ∆T . We observed that the log(− log(p)) is close to linear in most studied conditions Fig. 2 B suggesting the following two distributions of p(V ): V − VT (10) p(V ) = 1 − exp − exp ∆V V − VT p(V ) = exp − exp − (11) ∆V Once we have p(V ), we can use Eq. 4 to obtain the equivalent SRM link-function, which leads to: −1 f (V ) = log (1 − p(V )) (12) dt Then the two potential link-functions of the SRM can be derived from Eq. 10 and Eq. 11 (respectively): V − VT f (V ) = λ0 exp (13) ∆V V − VT (14) f (V ) = −λ0 log 1 − exp − exp − ∆V 1 with λ0 = dt , VT the threshold of the SRM and ∆V the sharpness of the link-function (i.e. the parameters that governs the degree of the stochasticity). Note that the exact value of λ0 has no importance since it is redundant with VT . Eq. 13 is the standard exponential link-function, but we call Eq. 14 the log-exp-exp link-function. 3.3 Prediction The next point is to evaluate the fit quality of each link-function. To do this, we first estimate the parameters VT and ∆V of the GLM link-function that maximize the likelihood of observing a spike 5 Figure 2: SRM link-function. A. Histogram of the SRM voltage at the AdEx firing times (red) and at non-firing times (gray). The ratio of the two distributions gives p(V ) (Eq. 9, dashed lines). Inset, zoom to see the voltage histogram evaluated at the firing time (red). B. log(− log(p)) as a function of the SRM voltage for three different noise levels σ = 0.07, 0.14, 0.18 nA (pale gray, gray, black dots, respectively) and ∆T = 1 mV. The line is a linear fit corresponding to the log-exp-exp linkfunction and the dashed line corresponds to a fit with the exponential link-function. C. Same data and labeling scheme as B, but plotting f (V ) according to Eq. 12. The lines are produced with Eq. 14 with parameters fitted as described in B. and the dashed lines are produced with Eq. 13. Inset, same plot but on a semi-log(y) axis. train generated with an AdEx. Second we look at the predictive power of the resulting SRM in terms of Peri-Stimulus Time Histogram (PSTH). In other words we ask how close the spike trains generated with a GLM are from the spike train generated with a stochastic AdEx when both models are stimulated with the same input current. For any GLM with link-function f (V ) ≡ f (t|I, θ) and parameters θ regulating the shape of κ(t), ˆ ηv (t) and ηw (t), the Negative Log-Likelihood (NLL) of observing a spike-train {t} is given by: NLL = − log(f (t|I, θ)) − f (t|I, θ) (15) t ˆ t It has been shown that the negative log-likelihood is convex in the parameters if f is convex and logconcave [19]. It is easy to show that a linear-rectifier link-function, the exponential link-function and the log-exp-exp link-function all satisfy these conditions. This allows efficient estimation of ˆ ˆ the optimal parameters VT and ∆V using a simple gradient descent. One can thus estimate from a single AdEx spike train the optimal parameters of a given link-function, which is more efficient than the method used in Sect. 3.2. The minimal NLL resulting from the gradient descent gives an estimation of the fit quality. A better estimate of the fit quality is given by the distance between the PSTHs in response to stimuli not used for parameter fitting . Let ν1 (t) be the PSTH of the AdEx, and ν2 (t) be the PSTH of the fitted SRM, 6 Figure 3: PSTH prediction. A. Injected current. B. Voltage traces produced by an AdEx (black) and the equivalent SRM (red), when stimulated with the current in A. C. Raster plot for 20 realizations of AdEx (black tick marks) and equivalent SRM (red tick marks). D. PSTH of the AdEx (black) and the SRM (red) obtained by averaging 10,000 repetitions. E. Optimal log-likelihood for the three cases of the AdEx, using three different link-functions, a linear-rectifier (light gray), an exponential link-function (gray) and the link-function defined by Eq. 14 (dark gray), these values are obtained by averaging over 40 different combinations σ and ∆T (see Fig. 4). Error bars are one standard deviation, the stars denote a significant difference, two-sample t-test with α = 0.01. F. same as E. but for Md (Eq. 16). then we use Md ∈ [0, 1] as a measure of match: Md = 2 2 (ν1 (t) − ν2 (t)) dt ν1 (t)2 dt + ν2 (t)2 dt (16) Md = 1 means that it is impossible to differentiate the SRM from the AdEx in terms of their PSTHs, whereas a Md of 0 means that the two PSTHs are completely different. Thus Md is a normalized similarity measure between two PSTHs. In practice, Md is estimated from the smoothed (boxcar average of 1 ms half-width) averaged spike train of 1 000 repetitions for each models. We use both the NLL and Md to quantify the fit quality for each of the three damping cases and each of the three link-functions. Figure 3 shows the match between the stochastic AdEx used as a reference and the derived GLM when both are stimulated with the same input current (Fig. 3 A). The resulting voltage traces are almost identical (Fig. 3 B) and both models predict almost the same spike trains and so the same PSTHs (Fig. 3 C and D). More quantitalively, we see on Fig. 3 E and F, that the linear-rectifier fits significantly worse than both the exponential and log-exp-exp link-functions, both in terms of NLL and of Md . The exponential link-function performs as well as the log-exp-exp link-function, with a spike train similarity measure Md being almost 1 for both. Finally the likelihood-based method described above gives us the opportunity to look at the relationship between the AdEx parameters σ and ∆T that governs its spike emission and the parameters VT and ∆V of the link-function (Fig. 4). We observe that an increase of the noise level produces a flatter link-function (greater ∆V ) while an increase in ∆T also produces an increase in ∆V and VT (note that Fig. 4 shows ∆V and VT for the exponential link-function only, but equivalent results are obtained with the log-exp-exp link-function). 4 Discussion In Sect. 3.3 we have shown that it is possible to predict with almost perfect accuracy the PSTH of a stochastic AdEx model using an appropriate set of parameters in the SRM. Moreover, since 7 Figure 4: Influence of the AdEx parameters on the parameters of the exponential link-function. A. VT as a function of ∆T and σ. B. ∆V as a function of ∆T and σ. the subthreshold voltage of the AdEx also gives a good match with the deterministic voltage of the SRM, we expect that the AdEx and the SRM will not differ in higher moments of the spike train probability distributions beyond the PSTH. We therefore conclude that diffusive noise models of the type of Eq. 1-2 are equivalent to GLM of the type of Eq. 3-4. Once combined with similar results on other types of stochastic LIF (e.g. correlated noise), we could bridge the gap between the literature on GLM and the literature on diffusive noise models. Another noteworthy observation pertains to the nature of the link-function. The link-function has been hypothesized to be a linear-rectifier, an exponential, a sigmoidal or a Gaussian [16]. We have observed that for the AdEx the link-function follows Eq. 14 that we called the log-exp-exp linkfunction. Although the link-function is log-exp-exp for most of the AdEx parameters, the exponential link-function gives an equivalently good prediction of the PSTH. This can be explained by the fact that the difference between log-exp-exp and exponential link-functions happens mainly at low voltage (i.e. far from the threshold), where the probability of emitting a spike is so low (Figure 2 C, until -50 mv). Therefore, even if the exponential link-function overestimates the firing probability at these low voltages it rarely produces extra spikes. At voltages closer to the threshold, where most of the spikes are emitted, the two link-functions behave almost identically and hence produce the same PSTH. The Gaussian link-function can be seen as lying in-between the exponential link-function and the log-exp-exp link-function in Fig. 2. This means that the work of Plesser and Gerstner (2000) [16] is in agreement with the results presented here. The importance of the time-derivative of the ˙ voltage stressed by Plesser and Gerstner (leading to a two-dimensional link-function f (V, V )) was not studied here to remain consistent with the typical usage of GLM in neural systems [14]. Finally we restricted our study to exponential non-linearity for spike initiation and do not consider other cases such as the Quadratic Integrate-and-fire (QIF, [5]) or other polynomial functional shapes. We overlooked these cases for two reasons. First, there are many evidences that the non-linearity in neurons (estimated from in-vitro recordings of Pyramidal neurons) is well approximated by a single exponential [9]. Second, the exponential non-linearity of the AdEx only affects the subthreshold voltage at high voltage (close to threshold) and thus can be neglected to derive the filters κ(t) and η(t). Polynomial non-linearities on the other hand affect a larger range of the subthreshold voltage so that it would be difficult to justify the linearization of subthreshold dynamics essential to the method presented here. References [1] R. B. Stein, “Some models of neuronal variability,” Biophys J, vol. 7, no. 1, pp. 37–68, 1967. [2] W. Gerstner and W. Kistler, Spiking neuron models. Cambridge University Press New York, 2002. [3] E. Izhikevich, “Resonate-and-fire neurons,” Neural Networks, vol. 14, no. 883-894, 2001. [4] M. J. E. Richardson, N. Brunel, and V. Hakim, “From subthreshold to firing-rate resonance,” Journal of Neurophysiology, vol. 89, pp. 2538–2554, 2003. 8 [5] E. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, pp. 1569–1572, 2003. [6] S. Mensi, R. Naud, M. Avermann, C. C. H. Petersen, and W. Gerstner, “Parameter extraction and classification of three neuron types reveals two different adaptation mechanisms,” Under review. [7] N. Fourcaud-Trocme, D. Hansel, C. V. Vreeswijk, and N. Brunel, “How spike generation mechanisms determine the neuronal response to fluctuating inputs,” Journal of Neuroscience, vol. 23, no. 37, pp. 11 628–11 640, 2003. [8] R. Brette and W. Gerstner, “Adaptive exponential integrate-and-fire model as an effective description of neuronal activity,” Journal of Neurophysiology, vol. 94, pp. 3637–3642, 2005. [9] L. Badel, W. Gerstner, and M. Richardson, “Dependence of the spike-triggered average voltage on membrane response properties,” Neurocomputing, vol. 69, pp. 1062–1065, 2007. [10] P. McCullagh and J. A. Nelder, Generalized linear models, 2nd ed. Chapman & Hall/CRC, 1998, vol. 37. [11] W. Gerstner, J. van Hemmen, and J. Cowan, “What matters in neuronal locking?” Neural computation, vol. 8, pp. 1653–1676, 1996. [12] D. Hubel and T. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” Journal of Physiology, vol. 195, pp. 215–243, 1968. [13] J. Pillow, L. Paninski, V. Uzzell, E. Simoncelli, and E. Chichilnisky, “Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model,” Journal of Neuroscience, vol. 25, no. 47, pp. 11 003–11 013, 2005. [14] K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, Bayesian brain: Probabilistic approaches to neural coding. The MIT Press, 2007. [15] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge, “Bayesian inference for spiking neuron models with a sparsity prior,” in Advances in Neural Information Processing Systems, 2007. [16] H. Plesser and W. Gerstner, “Noise in integrate-and-fire neurons: From stochastic input to escape rates,” Neural Computation, vol. 12, pp. 367–384, 2000. [17] J. Schemmel, J. Fieres, and K. Meier, “Wafer-scale integration of analog neural networks,” in Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, june 2008, pp. 431 –438. [18] R. Jolivet, T. Lewis, and W. Gerstner, “Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy,” Journal of Neurophysiology, vol. 92, pp. 959–976, 2004. [19] L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Network: Computation in Neural Systems, vol. 15, pp. 243–262, 2004. 9
5 0.62039721 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
Author: Joseph Keshet, David A. McAllester
Abstract: We consider latent structural versions of probit loss and ramp loss. We show that these surrogate loss functions are consistent in the strong sense that for any feature map (finite or infinite dimensional) they yield predictors approaching the infimum task loss achievable by any linear predictor over the given features. We also give finite sample generalization bounds (convergence rates) for these loss functions. These bounds suggest that probit loss converges more rapidly. However, ramp loss is more easily optimized on a given sample. 1
6 0.5528965 165 nips-2011-Matrix Completion for Multi-label Image Classification
7 0.55069727 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
8 0.54188973 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
9 0.46095169 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
10 0.44260427 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
11 0.41672 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
12 0.41542828 168 nips-2011-Maximum Margin Multi-Instance Learning
13 0.41362956 227 nips-2011-Pylon Model for Semantic Segmentation
14 0.40893704 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
15 0.3920441 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
16 0.38792077 127 nips-2011-Image Parsing with Stochastic Scene Grammar
17 0.38475168 154 nips-2011-Learning person-object interactions for action recognition in still images
18 0.37649921 66 nips-2011-Crowdclustering
19 0.36893523 35 nips-2011-An ideal observer model for identifying the reference frame of objects
20 0.35610777 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition