nips nips2010 nips2010-33 knowledge-graph by maker-knowledge-mining

33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes


Source: pdf

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Approximate inference in continuous time Gaussian-Jump processes Andreas Ruttor Fakultät Elektrotechnik und Informatik Technische Universität Berlin Berlin, Germany andreas. [sent-1, score-0.314]

2 Abstract We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. [sent-8, score-0.879]

3 We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. [sent-9, score-0.749]

4 We derive partial differential equations for exact inference and present a very efficient mean field approximation. [sent-10, score-0.175]

5 By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. [sent-11, score-0.291]

6 We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks. [sent-12, score-0.099]

7 Introduction Continuous time stochastic processes are receiving increasing attention within the statistical machine learning community, as they provide a convenient and physically realistic tool for modelling and inference in a variety of real world problems. [sent-13, score-0.318]

8 Both continuous state space [1, 2] and discrete state space [3–5] systems have been considered, with applications ranging from systems biology [6] to modelling motion capture [7]. [sent-14, score-0.235]

9 Within the machine learning community, Gaussian processes (GPs) [8] have proved particularly popular, due to their appealing properties, which allow the infinite dimensional smoothing problem to be reduced to a finite dimensional regression problem. [sent-15, score-0.23]

10 While GPs are indubitably a very successful tool in many pattern recognition tasks, their use is restricted to processes with continuously varying temporal behaviour, which can be a limitation in applications that exhibit inherently non-stationary or discontinuous behaviour. [sent-16, score-0.12]

11 Inference in this case is intractable, but, by means of a Legendre transform, we can derive a lower bound on the exact free energy, which can be optimised using a saddle point procedure. [sent-24, score-0.165]

12 1 Conditionally Gaussian Markov Processes We consider a continuous state stochastic system governed by a linear stochastic differential equation (SDE) with piecewise constant (in time) drift bias which can switch randomly with Markovian dynamics (see e. [sent-25, score-0.561]

13 For simplicity, we give the derivations in the case when there are only two states in the switching process (i. [sent-28, score-0.341]

14 it is a random telegraph process) and the diffusion system is one dimensional; generalisation to more dimensions or more latent states is straightforward. [sent-30, score-0.428]

15 The system can be written as dx = (Aµ + b − λx) dt + σ dw(t), µ(t) ∼ TP(f±), (1) where w is the Wiener process with variance σ² and µ(t) is a random telegraph process with switching rates f±. [sent-31, score-0.817]
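
As a concrete illustration of model (1), the sketch below simulates the jump-diffusion by switching the telegraph process µ(t) at its rates and taking Euler-Maruyama steps for the conditionally linear SDE. All parameter values, the function name and the time discretisation are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def simulate_gaussian_jump(T=10.0, dt=1e-3, A=2.0, b=0.0, lam=1.0,
                           sigma=0.5, f_plus=0.3, f_minus=0.3, seed=0):
    """Euler-Maruyama simulation of dx = (A*mu + b - lam*x) dt + sigma dW,
    where mu(t) is a two-state (telegraph) Markov jump process with
    switching rates f_plus (0 -> 1) and f_minus (1 -> 0)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    t = np.arange(n) * dt
    x = np.zeros(n)
    mu = np.zeros(n, dtype=int)
    for i in range(1, n):
        # switch the telegraph process with probability rate * dt
        rate = f_plus if mu[i - 1] == 0 else f_minus
        mu[i] = 1 - mu[i - 1] if rng.random() < rate * dt else mu[i - 1]
        # Euler-Maruyama step for the conditionally linear SDE
        drift = A * mu[i - 1] + b - lam * x[i - 1]
        x[i] = x[i - 1] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return t, x, mu
```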

16 Our interest in this type of model is twofold: similar models have found applications in fields like systems biology, where the rapid transitions of regulatory proteins make a switching latent variable a plausible model [6]. [sent-32, score-0.286]

17 At the same time, at least intuitively, model (1) could be considered as an approximation to more complex non-linear diffusion processes, where diffusion near local minima of the potential is approximated by linear diffusion. [sent-33, score-0.382]

18 Let us assume that we observe the process x at a finite number of time points with i. [sent-34, score-0.203]

19 For simplicity, we have assumed that the process itself is observed; nothing would change in what follows if we assumed that the variable y is linearly related to the process (except of course that we would have more parameters to estimate). [sent-41, score-0.332]

20 The problem we wish to address is the inference of the joint posterior over both variables x and µ at any time within a certain interval, as well as the determination of (a subset of) the parameters and hyperparameters involved in equation (1) and in the observation model. [sent-42, score-0.264]

21 1 Exact state inference As the system described by equation (1) is a Markovian process, the marginal probability distribution qµ (x, t) for both state variables µ ∈ {0, 1} and x of the posterior process can be calculated using a smoothing algorithm similar to the one described in [6]. [sent-44, score-0.626]

22 qµ(x, t) = pµ(x, t) Ψµ(x, t)/Z. (2) Here pµ(x, t) denotes the marginal filtering distribution, while Ψµ(x, t) = p({yi | ti > t} | xt = x, µt = µ) is the likelihood of all observations after time t under the condition that the process has state (x, µ) at time t (backward message). [sent-46, score-0.322]
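
A minimal sketch of how equation (2) combines the two messages, assuming the filtering distributions pµ and the backward messages Ψµ have already been computed on a shared x grid at the time of interest (the function name and grid layout are assumptions of this sketch):

```python
import numpy as np

def smoothed_marginals(p, Psi, dx):
    """Combine filtering distributions p[mu, x] and backward messages
    Psi[mu, x] at one time point into the smoothed posterior q[mu, x]
    of equation (2), normalising jointly over mu and the x grid."""
    q = p * Psi
    return q / (q.sum() * dx)    # 1/Z normalisation
```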

23 The time evolution of the backward message is described by the backward Chapman-Kolmogorov equation for µ ∈ {0, 1} [9]: ∂Ψµ/∂t + (Aµ + b − λx) ∂Ψµ/∂x + (σ²/2) ∂²Ψµ/∂x² = f1−µ(Ψµ(x, t) − Ψ1−µ(x, t)). (3) [sent-47, score-0.352]

24 This PDE must be solved backward in time starting at the last observation yN, using the initial condition Ψµ(x, tN) = p(yN | x(tN) = x). (4) [sent-48, score-0.138]

25 The other observations are taken into account by the jump conditions Ψµ(x, tj−) = Ψµ(x, tj+) p(yj | x(tj) = x), (5) where Ψµ(x, tj−) and Ψµ(x, tj+) are the values of Ψµ(x, t) immediately before and after the j-th observation and p(yj | x(tj) = x) is given by the noise model. [sent-49, score-0.474]
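
To make the backward sweep of equations (3)-(5) concrete, here is a minimal explicit finite-difference sketch: the coupled PDEs for Ψ0 and Ψ1 are integrated backwards in time from the terminal condition (4), and each earlier observation is folded in via the jump condition (5). The Gaussian noise model, the grid resolution, the function names and the indexing of the rates f± as f[0], f[1] are all assumptions of this sketch, not the authors' implementation; an explicit scheme also needs dt small relative to dx²/σ² for stability.

```python
import numpy as np

def backward_sweep(ts_obs, ys_obs, x_grid, A=2.0, b=0.0, lam=1.0,
                   sigma=0.5, f=(0.3, 0.3), obs_var=0.1, dt=1e-3):
    """Explicit finite-difference backward sweep for the messages
    Psi_mu(x, t), mu in {0, 1}: a discretisation of the backward
    Chapman-Kolmogorov PDE (3), with terminal condition (4) at the last
    observation and the jump condition (5) at earlier observations."""
    dx = x_grid[1] - x_grid[0]
    n_t = int(round(ts_obs[-1] / dt)) + 1
    Psi = np.ones((2, len(x_grid)))

    def gauss_lik(y):                          # assumed Gaussian noise model
        return np.exp(-0.5 * (y - x_grid) ** 2 / obs_var)

    obs_at = {int(round(t / dt)): y for t, y in zip(ts_obs[:-1], ys_obs[:-1])}
    Psi *= gauss_lik(ys_obs[-1])               # terminal condition (4)
    out = np.zeros((n_t, 2, len(x_grid)))
    out[n_t - 1] = Psi

    for k in range(n_t - 1, 0, -1):            # step from t_k down to t_{k-1}
        new = np.empty_like(Psi)
        for mu in (0, 1):
            drift = A * mu + b - lam * x_grid
            dPsi = np.gradient(Psi[mu], dx)
            d2Psi = np.gradient(dPsi, dx)
            rhs = drift * dPsi + 0.5 * sigma ** 2 * d2Psi \
                  - f[1 - mu] * (Psi[mu] - Psi[1 - mu])
            new[mu] = Psi[mu] + dt * rhs       # backward Euler-in-time step
        Psi = new
        if k - 1 in obs_at:
            Psi = Psi * gauss_lik(obs_at[k - 1])   # jump condition (5)
        out[k - 1] = Psi
    return np.arange(n_t) * dt, out
```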

26 Its time evolution is given by the forward Chapman-Kolmogorov equation [9]: ∂pµ/∂t + ∂/∂x [(Aµ + b − λx) pµ(x, t)] − (σ²/2) ∂²pµ/∂x² = fµ p1−µ(x, t) − f1−µ pµ(x, t). (6) [sent-51, score-0.147]

27 We can show that the posterior process qµ(x, t) fulfils a similar PDE by calculating its time derivative and using both (3) and (6). [sent-52, score-0.31]

28 Consequently, the only differences between prior and posterior process are the jump rates for the telegraph process µ and the drift of the diffusion process x. [sent-55, score-1.527]

29 2 Variational inference The exact inference approach outlined above gives rise to PDEs which need to be solved numerically in order to estimate the relevant posteriors. [sent-57, score-0.128]

30 So the KL is the sum of an initial condition part (which can be set to zero), two parts involving the KL between a Markovian Gaussian process and a Markovian Gaussian process observed linearly with noise (second and third terms), and the KL between two telegraph processes. [sent-65, score-0.567]

31 The variational E-step iteratively minimises these two parts using recursions of the forward-backward type. [sent-66, score-0.281]

32 Interleaved with this, variational M-steps can be carried out by optimising the variational free energy w. [sent-67, score-0.532]

33 1 Computation of the approximating diffusion process Minimisation of the second and third terms in equation (11) requires finding an approximating Gaussian process. [sent-77, score-0.567]

34 By inspection of equation (12), we see that we are trying to compute the posterior process for a discretely observed Gaussian process with (prior) drift Aqµ(t) + b − λx, with the observations being i. [sent-78, score-0.683]

35 Due to the Markovian nature of the process, its single time marginals can be computed using the continuous time version of the well known forward-backward algorithm [10, 11]. [sent-82, score-0.19]

36 The single time posterior marginal can be decomposed as (1/Z) φ(x(t)) ξ(x(t)), (13) where φ is the filtered process or forward message, and ξ is the backward message, i. [sent-83, score-0.465]

37 The filtered process outside the observations satisfies the forward Fokker-Planck equation of the prior process, so its mean and variance can be propagated using equations (14) with prior drift coefficients α̂ = −λ and β̂ = Aqµ + b. [sent-90, score-0.573]

38 Observations are incorporated via the jump conditions lim t→ti+ φ̂(x(t)) ∝ p(yi | x(ti)) lim t→ti− φ̂(x(t)), (15) whence the recursions on the mean and variances easily follow. [sent-91, score-0.49]

39 Notice that this is much simpler than (discrete time) Kalman filter recursions as the prior gain is zero in continuous time. [sent-92, score-0.149]
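
A minimal sketch of this forward moment propagation, assuming the drift coefficients α̂ = −λ and β̂ = Aqµ(t) + b quoted above, a Gaussian noise model for the observation updates, and a stationary prior variance as initial condition (all of these choices, and the function name, are illustrative assumptions):

```python
import numpy as np

def filter_moments(t_grid, q1, ts_obs, ys_obs, A=2.0, b=0.0, lam=1.0,
                   sigma=0.5, obs_var=0.1):
    """Forward propagation of the filtered mean m(t) and variance v(t) of the
    approximating Gaussian process with drift A*q1(t) + b - lam*x.

    Between observations the moments follow the ODEs implied by the prior
    Fokker-Planck equation,
        dm/dt = -lam*m + A*q1(t) + b,   dv/dt = -2*lam*v + sigma**2,
    and each observation is absorbed by a standard Gaussian (jump) update,
    assuming Gaussian observation noise with variance obs_var."""
    dt = t_grid[1] - t_grid[0]
    m = np.zeros_like(t_grid)
    v = np.zeros_like(t_grid)
    v[0] = sigma ** 2 / (2 * lam)            # assumed stationary prior variance
    obs = {int(round(t / dt)): y for t, y in zip(ts_obs, ys_obs)}
    for i in range(1, len(t_grid)):
        m[i] = m[i - 1] + dt * (-lam * m[i - 1] + A * q1[i - 1] + b)
        v[i] = v[i - 1] + dt * (-2 * lam * v[i - 1] + sigma ** 2)
        if i in obs:                          # jump condition at an observation
            prec = 1.0 / v[i] + 1.0 / obs_var
            m[i] = (m[i] / v[i] + obs[i] / obs_var) / prec
            v[i] = 1.0 / prec
    return m, v
```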

40 Computation of the backward message (smoothing) is analogous; the reader is referred to [10, 11] for further details. [sent-93, score-0.158]

41 2 Jump process smoothing Having computed the approximating diffusion process, we now turn to give the updates for the approximating jump process. [sent-96, score-1.012]

42 to the posterior rates g± allow us to eliminate them in favour of the Lagrange multipliers; inserting this into the functional derivatives w. [sent-103, score-0.153]

43 to the marginals q1 (t) gives ODEs involving the Lagrange multiplier and the prior rates only (as well as terms from the diffusion process), which can be solved backward in time from the condition ψ(T ) = 0. [sent-106, score-0.487]

44 This allows us to update the rates; the posterior marginals can then be found by forward propagation, in a manner similar to [4]. [sent-107, score-0.273]
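
The forward propagation step for the jump marginals is just the master equation of a two-state Markov process with the updated, time-varying rates; a sketch is given below. The backward ODEs for the Lagrange multiplier are not reproduced here, so the posterior rates g±(t) are treated as given inputs (the names and discretisation are assumptions of this sketch).

```python
import numpy as np

def propagate_jump_marginals(t_grid, g_plus, g_minus, q1_0=0.0):
    """Forward master-equation propagation of the posterior marginal
    q1(t) = q(mu(t) = 1) of a two-state Markov jump process with
    time-varying posterior rates g_plus(t) (0 -> 1) and g_minus(t) (1 -> 0):
        dq1/dt = g_plus(t) * (1 - q1) - g_minus(t) * q1."""
    dt = t_grid[1] - t_grid[0]
    q1 = np.empty_like(t_grid)
    q1[0] = q1_0
    for i in range(1, len(t_grid)):
        dq = g_plus[i - 1] * (1.0 - q1[i - 1]) - g_minus[i - 1] * q1[i - 1]
        q1[i] = np.clip(q1[i - 1] + dt * dq, 0.0, 1.0)   # keep a valid probability
    return q1
```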

45 2 Conditionally Gaussian Processes: general case In this section, we would like to generalise our model to processes of the form dx = (−λx + Aµ + b) dt + df(t), (17) where the white noise driving process σ dw(t) in (1) is replaced by an arbitrary GP df(t)¹. [sent-108, score-0.454]

46 The application of our variational approximation (11) requires the KL divergence KL[qx ∥ p(x0:T | µ0:T)] between a GP qx and a GP with a shifted mean function p(x0:T | µ0:T). [sent-109, score-0.426]

47 Our preliminary results (based on the Cameron-Martin formula for GPs [12]) indicate that even in simple cases (like Ornstein-Uhlenbeck noise) the measures are not absolutely continuous and the KL divergence is infinite. [sent-111, score-0.121]

48 Hence, we have resorted to a different variational approach, which is based on a lower bound to the free energy. [sent-112, score-0.31]

49 We use the fact that, conditioned on the path of the switching process µ0:T, the prior of x(t) is a GP with a covariance kernel K(t, t′) and can be marginalised out exactly. [sent-113, score-0.341]

50 The kernel K can be easily computed from the kernel of the driving noise process f (t) [2]. [sent-114, score-0.207]

51 Marginalising out the conditional GP, the negative log marginal probability of the observations (free energy) F = − ln p(D) is represented as F = − ln Eµ[p(D|µ0:T)] = κ − ln Eµ exp(−(1/2)(y − xµ)ᵀ(K + σ²I)⁻¹(y − xµ)), (19) with EGP[x(t) | µ0:T] (equation (18)) denoting the conditional mean of x(t) given the switching path. [sent-120, score-0.198]

52 Here Eµ denotes expectation over the prior switching process pµ, y is the vector of observations, and xµ = EGP[(x(t1), . [sent-121, score-0.341]

53 This intractable free energy contains a functional in the exponent which is bilinear in the switching process µ. [sent-126, score-0.509]

54 In the spirit of other variational transformations [13, 14] this can be linearised through a Legendre transform (or convex duality). [sent-127, score-0.182]

55 It can be shown that the lower bound (20) neglects the variance of the Eµ [xµ ] process (intuitively, the two point expectations in (19) are dropped). [sent-130, score-0.206]

56 The second term in the bracket looks like the free energy for a jump process model having a (pseudo) log likelihood of the data given by −θᵀ(y − xµ). [sent-131, score-0.725]

57 Inserting (18) into the last term in (21), we see that this KL minimisation is of the same structure as the one in equation (16) with a linear functional of q in the (pseudo) likelihood term. [sent-133, score-0.105]

58 Therefore the minimiser q is an inhomogeneous Markov jump process, and we can use a backward and forward sweep to compute marginals q1 (t) exactly for a fixed θ! [sent-134, score-0.612]

59 These marginals are used to compute the gradient of the lower bound, (K + σ²I)θ + (y − Eq[xµ]), and we iterate between gradient ascent steps and recomputations of Eq[xµ]. [sent-135, score-0.106]
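
Schematically, the resulting saddle-point procedure can be sketched as follows; `jump_smoother` and `compute_Ex` are hypothetical placeholders for the exact forward-backward sweep of the inhomogeneous jump process and for Eq[xµ] at the observation times, and the damped fixed-point update (setting the quoted gradient to zero) is one possible implementation choice, not the paper's.

```python
import numpy as np

def optimise_lower_bound(y, K, sigma2, jump_smoother, compute_Ex,
                         n_iters=50, damping=0.5):
    """Sketch of the saddle-point procedure for the lower bound: alternate
    between (i) running the exact forward-backward sweep of the jump process
    for the current theta and (ii) updating theta from the stationarity of
    the quoted gradient (K + sigma^2 I) theta + (y - E_q[x_mu]).

    jump_smoother(theta) and compute_Ex(q1) are placeholders standing in for
    routines the paper describes but does not name."""
    n = len(y)
    theta = np.zeros(n)
    G = K + sigma2 * np.eye(n)
    q1 = None
    for _ in range(n_iters):
        q1 = jump_smoother(theta)                 # exact q1(t) for fixed theta
        Ex = compute_Ex(q1)                       # E_q[x_mu] at observation times
        theta_new = -np.linalg.solve(G, y - Ex)   # zero of the quoted gradient
        theta = (1 - damping) * theta + damping * theta_new
    return theta, q1
```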

60 Upon convergence, we use the switching process marginals q1 for prediction. [sent-137, score-0.407]

61 Statistics of the smoothed x process can then be computed by summing the conditional GP statistics (obtained by exact GP regression) and the xµ statistics, which can be computed using the same methods as in [6]. [sent-138, score-0.208]

62 1 In the case of a process with smooth sample paths, we can write df(t) = g(t) dt with an “ordinary” GP g. [sent-139, score-0.208]

63 Variational Markovian Gaussian-Jump process on the left, approximate RBF Gaussian-Jump process on the right. [sent-163, score-0.332]

64 Top row: inferred posterior jump means (solid line) and true jump profile (dotted black). Bottom row: inferred posterior mean x (solid) with confidence intervals (dotted red); data points are shown as red crosses, and the true sample profile is shown as black dots. [sent-164, score-1.098]

65 Notice that the less confident jump prediction for the RBF process gives a much higher uncertainty in the x prediction (see text). [sent-165, score-0.557]

66 1 Results Synthetic data To evaluate the performance and identifiability of our model, we experimented first with a simple one-dimensional synthetic data set generated using a jump profile with only two jumps. [sent-168, score-0.391]

67 A sample from the resulting conditional Gaussian process was then obtained by simulating the SDE using the Euler-Maruyama method, and ten identically spaced points were then taken from the sample path and corrupted with Gaussian noise. [sent-169, score-0.166]
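
For illustration, a synthetic data set of this kind could be generated with the simulate_gaussian_jump sketch given earlier; the parameter values, the noise level and the spacing below are arbitrary assumptions, not the settings used in the paper.

```python
import numpy as np

# One simulated path, with ten equally spaced noisy observations kept.
rng = np.random.default_rng(1)
t, x, mu = simulate_gaussian_jump(T=10.0, dt=1e-3, seed=1)
obs_idx = np.linspace(0, len(t) - 1, 12, dtype=int)[1:-1]   # ten interior points
ts_obs = t[obs_idx]
ys_obs = x[obs_idx] + np.sqrt(0.1) * rng.standard_normal(10)  # Gaussian corruption
```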

68 Inference was then carried out using two procedures: a Markovian Gaussian-Jump process as described in Section 1, using the variational algorithm, and a “RBF” Gaussian-Jump process with slowly varying covariance, as described in Section 2. [sent-170, score-0.514]

69 The inference results are shown in Figure 1: the left column gives the results of the variational smoothing, while the right column gives the results obtained by fitting a RBF Gaussian-Jump process. [sent-172, score-0.246]

70 The top row shows the inferred posterior mean of the discrete state distribution, while the bottom row gives the conditionally Gaussian posterior. [sent-173, score-0.279]

71 We notice that both approaches provide a good smoothing of the GP and the jump process, although the second jump is inferred as being slightly later than in the true path. [sent-174, score-0.943]

72 Notice that the uncertainties associated with the RBF process are much higher than in the Markovian one, and are dominated by the uncertainty in the posterior mean caused by the uncertainty in the jump process, which is less confident than in the Markovian case (top right figure). [sent-175, score-0.664]

73 This is probably due to the fact that the lower bound (20) ignores the contributions of the variance of the xµ term in the free energy, which is due to the variance of the jump process, and hence removes the penalty for having intermediate jump posteriors. [sent-176, score-0.91]

74 A similar behaviour was already noted in a related context in [14]. [sent-177, score-0.097]

75 In terms of computational efficiency, the variational Markovian algorithm converged in approximately 0. [sent-178, score-0.182]

76 1 seconds on a standard laptop, while the RBF process took approximately two minutes. [sent-179, score-0.166]

77 Left: inferred posterior switch mean; right: smoothed data, with confidence intervals. [sent-194, score-0.263]

78 Estimation of the parameters using the variational upper bound also gave very accurate results, with A = 3. [sent-198, score-0.222]

79 It is interesting to note that, if the system noise parameter σ 2 was set at a higher value, then the A parameter was always driven to zero, leading to a decoupling of the Gaussian and jump processes. [sent-204, score-0.475]

80 In fact, it can be shown that the true free energy always has a local minimum at A = 0: heuristically, the GP is always a sufficiently flexible model to fit the data on its own. [sent-205, score-0.168]

81 However, for small levels of system noise, the evidence of the data is such that the more complex model involving a jump process is favoured, giving a type of automated Occam's razor, which is one of the main attractions of Bayesian modelling. [sent-206, score-0.646]

82 2 Diffusion in a double-well potential To illustrate the properties of the Gaussian-jump process as an approximator for non-linear stochastic models, we considered the benchmark problem of smoothing data generated from a SDE with double-well potential drift and constant diffusion coefficient. [sent-208, score-0.657]
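
For reference, data of this kind can be generated with a few lines of Euler-Maruyama simulation; the quartic potential U(x) = x⁴/4 − x²/2 and the noise level used below are assumptions made for illustration, since the exact settings of [1] are not specified here.

```python
import numpy as np

def simulate_double_well(T=20.0, dt=1e-3, sigma=0.7, seed=0):
    """Euler-Maruyama simulation of dx = -U'(x) dt + sigma dW for the
    assumed double-well potential U(x) = x**4/4 - x**2/2."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.zeros(n)
    x[0] = -1.0                                   # start in the left well
    for i in range(1, n):
        drift = x[i - 1] - x[i - 1] ** 3          # -U'(x) = x - x**3
        x[i] = x[i - 1] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return np.arange(n) * dt, x
```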

83 Since the process we wish to approximate is a diffusion process, we use the variational upper bound method, which gave good results in the synthetic experiments. [sent-209, score-0.579]

84 The data we use is the same as the one used in [1], where a nonstationary Gaussian approximation to the non-linear SDE was proposed by means of a variational approximation. [sent-210, score-0.216]

85 The results are shown in Figure 2: as is evident, the method both accurately captures the transition time and provides an excellent smoothing (very similar to the one reported in [1]); these results were obtained in 0. [sent-211, score-0.11]

86 07 seconds, while the Gaussian process approximation of [1] involves gradient descent in a high dimensional space and takes approximately three to four orders of magnitude longer. [sent-212, score-0.166]

87 Naturally, our method cannot be used in this case to estimate the parameters of the true (double well) prior drift, as it only models the linear behaviour near the bottom of each well; however, for smoothing purposes it provides a very accurate and efficient alternative method. [sent-213, score-0.207]

88 subtilis Regulation of gene expression at the transcriptional level provides an important application, as well as motivation for the class of models we have been considering. [sent-216, score-0.157]

89 The activation state of a TF is a notoriously difficult quantity to measure experimentally; this has motivated a significant effort within the machine learning and systems biology community to provide models to infer TF activities from more easily measurable gene expression levels [2, 16, 17]. [sent-218, score-0.182]

90 The data we use was obtained in [18] during a study of the genetic regulation of competence in B. [sent-220, score-0.213]

91 Figure 3: Results on competence circuit (time axis in hours). [sent-243, score-0.151]

92 Left: inferred posterior switch mean (ComK activity profile); right: smoothed ComS data, with confidence intervals. [sent-244, score-0.263]

93 Here, we leave ComK as a latent switching variable, and use our model to smooth the ComS data. [sent-250, score-0.221]

94 The results are shown in Figure 3, showing a clear switch behaviour for ComK activity (as expected, and in agreement with the high Hill coefficient), and a good smoothing of the ComS data. [sent-251, score-0.27]

95 While the purpose of that simulation was to recreate the qualitative behaviour of the system, rather than to estimate its parameters, the use of such an implausible parameter value illustrates all too well the need for appropriate data-driven tools in modelling complex systems. [sent-257, score-0.15]

96 4 Discussion In this contribution we proposed a novel inference methodology for continuous time conditionally Gaussian processes. [sent-258, score-0.232]

97 We presented both a method based on a variational upper bound in the case of Markovian processes, and a more general lower bound which holds also for non-Markovian Gaussian processes. [sent-260, score-0.262]

98 A natural question from the machine learning point of view is what are the advantages of continuous time over discrete time approaches. [sent-261, score-0.124]

99 Expectation correction for smoothing in switching linear Gaussian state space models. [sent-326, score-0.325]

100 An excitable gene regulatory circuit induces transient cellular differentiation. [sent-340, score-0.151]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jump', 0.391), ('markovian', 0.216), ('kl', 0.203), ('diffusion', 0.191), ('variational', 0.182), ('switching', 0.175), ('qx', 0.173), ('process', 0.166), ('gp', 0.164), ('competence', 0.151), ('comk', 0.148), ('sde', 0.148), ('telegraph', 0.148), ('drift', 0.146), ('manfred', 0.124), ('coms', 0.123), ('processes', 0.12), ('smoothing', 0.11), ('guido', 0.108), ('opper', 0.108), ('posterior', 0.107), ('backward', 0.101), ('recursions', 0.099), ('gps', 0.097), ('behaviour', 0.097), ('rbf', 0.092), ('free', 0.088), ('conditionally', 0.081), ('energy', 0.08), ('approximating', 0.077), ('differential', 0.075), ('cedric', 0.074), ('ruttor', 0.074), ('sanguinetti', 0.074), ('dt', 0.073), ('divergence', 0.071), ('gaussian', 0.068), ('marginals', 0.066), ('uorescence', 0.065), ('regulatory', 0.065), ('inference', 0.064), ('switch', 0.063), ('regulation', 0.062), ('ti', 0.057), ('message', 0.057), ('berlin', 0.057), ('transcriptional', 0.056), ('equation', 0.056), ('forward', 0.054), ('modelling', 0.053), ('ln', 0.052), ('gene', 0.052), ('biology', 0.052), ('inferred', 0.051), ('continuous', 0.05), ('elektrotechnik', 0.049), ('fakult', 0.049), ('informatik', 0.049), ('legendre', 0.049), ('minimisation', 0.049), ('pprior', 0.049), ('subtilis', 0.049), ('technische', 0.049), ('rates', 0.046), ('involving', 0.046), ('latent', 0.046), ('stochastic', 0.044), ('und', 0.043), ('aq', 0.043), ('egp', 0.043), ('generalise', 0.043), ('odes', 0.043), ('shortage', 0.043), ('andreas', 0.043), ('system', 0.043), ('observations', 0.042), ('df', 0.042), ('smoothed', 0.042), ('noise', 0.041), ('lagrange', 0.04), ('bound', 0.04), ('neil', 0.04), ('pde', 0.04), ('tf', 0.04), ('state', 0.04), ('eq', 0.039), ('community', 0.038), ('pro', 0.038), ('transcription', 0.037), ('optimised', 0.037), ('archambeau', 0.037), ('time', 0.037), ('tn', 0.036), ('equations', 0.036), ('food', 0.035), ('le', 0.034), ('coef', 0.034), ('nonstationary', 0.034), ('dw', 0.034), ('cellular', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

2 0.13085099 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

Author: Daniel Lowd, Pedro Domingos

Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1

3 0.10977393 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

Author: Ken Takiyama, Masato Okada

Abstract: We propose an algorithm for simultaneously estimating state transitions among neural states, the number of neural states, and nonstationary firing rates using a switching state space model (SSSM). This algorithm enables us to detect state transitions on the basis of not only the discontinuous changes of mean firing rates but also discontinuous changes in temporal profiles of firing rates, e.g., temporal correlation. We construct a variational Bayes algorithm for a non-Gaussian SSSM whose non-Gaussian property is caused by binary spike events. Synthetic data analysis reveals that our algorithm has the high performance for estimating state transitions, the number of neural states, and nonstationary firing rates compared to previous methods. We also analyze neural data that were recorded from the medial temporal area. The statistically detected neural states probably coincide with transient and sustained states that have been detected heuristically. Estimated parameters suggest that our algorithm detects the state transition on the basis of discontinuous changes in the temporal correlation of firing rates, which transitions previous methods cannot detect. This result suggests that our algorithm is advantageous in real-data analysis.

4 0.10627601 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

Author: Matthew Urry, Peter Sollich

Abstract: We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdos-Renyi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail. 1

5 0.10277287 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

Author: Iain Murray, Ryan P. Adams

Abstract: The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes. 1

6 0.10220849 101 nips-2010-Gaussian sampling by local perturbations

7 0.097070359 283 nips-2010-Variational Inference over Combinatorial Spaces

8 0.08581505 268 nips-2010-The Neural Costs of Optimal Control

9 0.083268762 233 nips-2010-Scrambled Objects for Least-Squares Regression

10 0.081179924 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

11 0.078904547 194 nips-2010-Online Learning for Latent Dirichlet Allocation

12 0.077598929 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

13 0.077398449 284 nips-2010-Variational bounds for mixed-data factor analysis

14 0.076978229 216 nips-2010-Probabilistic Inference and Differential Privacy

15 0.076590285 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks

16 0.075134255 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

17 0.074738443 148 nips-2010-Learning Networks of Stochastic Differential Equations

18 0.073604159 118 nips-2010-Implicit Differentiation by Perturbation

19 0.070631295 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

20 0.069832556 100 nips-2010-Gaussian Process Preference Elicitation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.201), (1, 0.023), (2, -0.05), (3, 0.103), (4, -0.11), (5, 0.065), (6, 0.063), (7, 0.052), (8, -0.028), (9, -0.002), (10, -0.135), (11, -0.074), (12, 0.036), (13, -0.047), (14, -0.076), (15, 0.054), (16, 0.037), (17, 0.112), (18, 0.135), (19, 0.038), (20, -0.107), (21, 0.078), (22, -0.026), (23, -0.029), (24, -0.039), (25, -0.025), (26, -0.025), (27, -0.027), (28, -0.078), (29, -0.027), (30, 0.18), (31, -0.023), (32, 0.058), (33, -0.077), (34, -0.022), (35, -0.033), (36, -0.032), (37, 0.027), (38, 0.007), (39, -0.008), (40, -0.032), (41, 0.067), (42, 0.034), (43, -0.067), (44, 0.068), (45, -0.036), (46, -0.05), (47, 0.063), (48, -0.007), (49, -0.186)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95989078 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

2 0.70037901 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

Author: Iain Murray, Ryan P. Adams

Abstract: The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes. 1

3 0.68360281 262 nips-2010-Switched Latent Force Models for Movement Segmentation

Author: Mauricio Alvarez, Jan R. Peters, Neil D. Lawrence, Bernhard Schölkopf

Abstract: Latent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model, that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology. 1

4 0.65942842 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

5 0.6188668 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

6 0.61654204 284 nips-2010-Variational bounds for mixed-data factor analysis

7 0.59479451 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

8 0.58471644 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

9 0.58173192 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

10 0.57169306 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics

11 0.55634576 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks

12 0.55357647 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

13 0.55259073 2 nips-2010-A Bayesian Approach to Concept Drift

14 0.50727171 215 nips-2010-Probabilistic Deterministic Infinite Automata

15 0.48980394 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

16 0.48919702 54 nips-2010-Copula Processes

17 0.48127747 118 nips-2010-Implicit Differentiation by Perturbation

18 0.46997729 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

19 0.46993023 216 nips-2010-Probabilistic Inference and Differential Privacy

20 0.46857849 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.039), (17, 0.02), (27, 0.057), (30, 0.056), (35, 0.02), (36, 0.114), (45, 0.209), (50, 0.274), (52, 0.037), (60, 0.021), (77, 0.029), (90, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94459593 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

2 0.94186378 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions, in fact LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Other potential application is iterative decoding of linear channels with non-Gaussian noise. 1

3 0.93940586 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

4 0.93018079 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

same-paper 5 0.91154027 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

6 0.91114986 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

7 0.8015222 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

8 0.79668278 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

9 0.79284751 54 nips-2010-Copula Processes

10 0.77987808 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

11 0.77784294 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

12 0.76779693 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

13 0.76746017 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

14 0.76668322 96 nips-2010-Fractionally Predictive Spiking Neurons

15 0.76429468 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

16 0.7582953 217 nips-2010-Probabilistic Multi-Task Feature Selection

17 0.74988282 158 nips-2010-Learning via Gaussian Herding

18 0.74847811 257 nips-2010-Structured Determinantal Point Processes

19 0.74686694 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

20 0.74620742 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices