nips nips2008 nips2008-12 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 uk Abstract Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. [sent-16, score-0.422]
2 We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). [sent-17, score-1.173]
3 Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. [sent-18, score-0.645]
4 We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. [sent-19, score-0.725]
5 1 Introduction Mechanistic system modeling employing nonlinear ordinary or delay differential equations¹ (ODEs or DDEs) is often hampered by incomplete knowledge of the system structure or the specific parameter values defining the observed dynamics [16]. [sent-20, score-1.242]
6 Bayesian, and indeed non-Bayesian, approaches for parameter estimation and model comparison [19] involve evaluating likelihood functions, which requires the explicit numerical solution of the differential equations describing the model. [sent-21, score-0.708]
7 The computational cost of obtaining the required numerical solutions of the ODEs or DDEs can result in extremely slow running times. [sent-22, score-0.117]
8 In this paper we present a method for performing Bayesian inference over mechanistic models by the novel use of Gaussian processes (GP) to predict the state variables of the model as well as their derivatives, thus avoiding the need to solve the system explicitly. [sent-23, score-0.318]
9 We note that state space models offer an alternative approach for performing parameter inference over dynamical models, particularly for on-line analysis of data; see [2]. [sent-25, score-0.286]
10 Related to the work we present, we also note that in [6] the use of GPs has been proposed in obtaining the solution of fully parameterised linear operator equations such as ODEs. [sent-26, score-0.228]
11 Likewise in [12] GPs are employed as emulators of the posterior response to parameter values as a means of improving the computational efficiency of a hybrid Monte Carlo sampler. [sent-27, score-0.133]
12 Our approach is different and builds significantly upon previous work which has investigated the use of derivative estimates to directly approximate system parameters for models described by ODEs. [sent-28, score-0.336]
13 A spline-based approach was first suggested in [18] for smoothing experimental data and obtaining derivative estimates, which could then be used to compute a measure of mismatch for derivative values obtained from the system of equations. [sent-29, score-0.36]
14 Footnote 1: The methodology in this paper can also be straightforwardly extended to partial differential equations. [sent-32, score-0.402]
15 Finally, these methods only provide point estimates of the “correct” parameters and are unable to cope with multiple solutions. [sent-39, score-0.081]
16 In contrast we provide a Bayesian solution, which is capable of sampling from multimodal distributions. [sent-42, score-0.109]
17 We demonstrate its speed and statistical accuracy and provide comparisons with the current best methods. [sent-43, score-0.06]
18 Likewise delay differential equations can be used to describe certain dynamic systems, where now an explicit time-delay τ is employed. [sent-46, score-0.742]
19 If observations are made at T distinct time points the N × T matrices summarise the overall observed system as Y = X + E. [sent-50, score-0.16]
20 In order to obtain values for X the system of ODEs must be solved, so that in the case of an initial value problem X(θ, x0 ) denotes the solution of the system of equations at the specified time points for the parameters θ and initial conditions x0 . [sent-51, score-0.543]
21 Figure 1(a) illustrates graphically the conditional dependencies of the overall statistical model, and from this the posterior density follows by employing appropriate priors such that p(θ, x0, σ|Y) ∝ π(θ)π(x0)π(σ) ∏_n N_{Yn,·}(X(θ, x0)n,·, I σn²). [sent-52, score-0.225]
22 The desired marginal p(θ|Y) can be obtained from this joint posterior.² [sent-53, score-0.079]
23 Various sampling schemes can be devised to sample from the joint posterior. [sent-54, score-0.109]
24 However, regardless of the sampling method, each proposal requires the specific solution of the system of differential equations which, as will be demonstrated in the experimental sections, is the main computational bottleneck in running an MCMC scheme for models based on differential equations. [sent-55, score-1.15]
25 The computational complexity of numerically solving such a system cannot be easily quantified since it depends on many factors such as the type of model and its stiffness, which in turn depends on the specific parameter values used. [sent-56, score-0.176]
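To make this bottleneck concrete, here is a minimal sketch (not from the paper; the function and argument names are illustrative) of one random-walk Metropolis step under the standard approach, where every proposal requires a full numerical solution of the ODEs via scipy's solve_ivp:

```python
import numpy as np
from scipy.integrate import solve_ivp

def log_likelihood(theta, x0, sigma, y, t, f):
    # Every likelihood evaluation requires a full numerical ODE solve;
    # this solve is the step that dominates the cost of a standard MCMC scheme.
    sol = solve_ivp(lambda s, x: f(x, theta), (t[0], t[-1]), x0, t_eval=t)
    resid = y - sol.y                                   # N x T residual matrix
    return -0.5 * np.sum(resid ** 2 / sigma[:, None] ** 2)

def metropolis_step(theta, x0, sigma, y, t, f, log_prior, step=0.05, rng=np.random):
    # One random-walk Metropolis proposal on the structural parameters theta.
    prop = theta + step * rng.standard_normal(theta.shape)
    log_a = (log_likelihood(prop, x0, sigma, y, t, f) + log_prior(prop)
             - log_likelihood(theta, x0, sigma, y, t, f) - log_prior(theta))
    return prop if np.log(rng.uniform()) < log_a else theta
```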
26 3 Auxiliary Gaussian Processes on State Variables Let us assume independent³ Gaussian process priors on the state variables such that p(Xn,·|ϕn) = N(0, Cϕn), where Cϕn denotes the matrix of covariance function values with hyperparameters ϕn. [sent-58, score-0.182]
27 With noise εn ∼ N(0, σn² IT), the state posterior p(Xn,·|Yn,·, σn, ϕn) follows as N(µn, Σn), where µn = Cϕn (Cϕn + σn²I)⁻¹ Yn,· and Σn = σn² Cϕn (Cϕn + σn²I)⁻¹. [sent-59, score-0.056]
28 Given priors π(σn) and π(ϕn), the corresponding posterior is p(ϕn, σn|Yn,·) ∝ π(σn)π(ϕn) N_{Yn,·}(0, σn²I + Cϕn), and from this we can obtain the joint posterior p(X, σn=1···N, ϕn=1···N |Y) over a non-parametric GP model of the state variables. [sent-60, score-0.122]
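As a rough illustration of the state posterior above (a sketch only; the squared exponential kernel and its hyperparameter names are assumptions, not the paper's code), the posterior mean and covariance for a single state variable can be computed as:

```python
import numpy as np

def sq_exp_cov(t, lengthscale, signal_var):
    # Squared exponential covariance matrix C_phi over the observation times.
    d = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_state_posterior(y_n, t, lengthscale, signal_var, noise_var):
    # mu_n = C (C + sigma_n^2 I)^{-1} y_n,  Sigma_n = sigma_n^2 C (C + sigma_n^2 I)^{-1}
    C = sq_exp_cov(t, lengthscale, signal_var)
    A = C + noise_var * np.eye(len(t))
    mu = C @ np.linalg.solve(A, y_n)
    Sigma = noise_var * C @ np.linalg.inv(A)
    return mu, Sigma
```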
29 The conditional distribution for the state-derivatives follows from the GP prior. (Footnote 2: This distribution is implicitly conditioned on the numerical solver and associated error tolerances.) [sent-62, score-0.101]
30 This allows us to evaluate a posterior over parameters θ consistent with the differential equation, based on the smoothed state and state derivative estimates; see Figure 1(b). [sent-68, score-0.655]
31 Assuming Normal errors between the state derivatives Ẋn,· and the functional fn(X, θ, t) evaluated at the GP-generated state values X corresponding to time points t = t1 · · · tT, then p(Ẋn,·|X, θ, γn) = N(fn(X, θ, t), Iγn), with γn a state-specific error variance. [sent-69, score-0.215]
32 In other words, given observations Y, we can sample from the conditional distribution for X and marginalize the augmented derivative space. [sent-75, score-0.102]
33 The differential equation now never needs to be explicitly solved; its implicit solution is integrated into the sampling scheme. [sent-76, score-0.478]
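A minimal sketch of the resulting derivative-mismatch term (illustrative names only; f is assumed to return the full N × T matrix of right-hand-side values):

```python
import numpy as np

def derivative_log_lik(x_dot_gp, x_gp, theta, t, f, gamma):
    # Gaussian mismatch between the GP-derived state derivatives and the ODE
    # right-hand side evaluated at the GP-smoothed states:
    # p(Xdot_n | X, theta, gamma_n) = N(f_n(X, theta, t), I * gamma_n).
    pred = f(x_gp, theta, t)            # N x T right-hand-side values
    total = 0.0
    for n in range(x_gp.shape[0]):
        resid = x_dot_gp[n] - pred[n]
        total += (-0.5 * np.sum(resid ** 2) / gamma[n]
                  - 0.5 * len(t) * np.log(2.0 * np.pi * gamma[n]))
    return total
```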
34 4 Sampling Schemes for Fully and Partially Observed Systems The introduction of the auxiliary model and its associated variables has enabled us to recast the differential equation as another component of the inference process. [sent-77, score-0.409]
35 This information transfer takes place through sampling candidate solutions for the system in the GP model. [sent-79, score-0.267]
36 Inference is performed by combining these approximate solutions with the system dynamics from the differential equations. [sent-80, score-0.48]
37 It now remains to define an overall sampling scheme for the structural parameters. [sent-81, score-0.253]
38 For brevity, we omit normalizing constants and assume that the system is defined in terms of ODEs. [sent-82, score-0.122]
39 However, our scheme is easily extended for delay differential equations (DDEs) where now predictions at each time point t and the associated delay (t − τ ) are required — we present results for a DDE system in Section 5. [sent-83, score-1.004]
40 We can now consider the complete sampling scheme by also inferring the hyperparameters and corresponding predictions of the state variables and derivatives using the GP framework described in Section 3. [sent-85, score-0.315]
41 This requires two Metropolis sampling schemes; one for inferring the parameters of the GP, ϕ and σ, and another for the parameters of the structural system, θ and γ. [sent-87, score-0.228]
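A compact sketch of how these two Metropolis schemes could be alternated (the function names and blocking are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def metropolis_block(params, log_post, step=0.05, rng=np.random):
    # One random-walk Metropolis update of a block of parameters.
    prop = params + step * rng.standard_normal(params.shape)
    return prop if np.log(rng.uniform()) < log_post(prop) - log_post(params) else params

def alternate_schemes(gp_params, struct_params, log_post_gp, log_post_struct, n_iters):
    # Alternate the two Metropolis schemes: first the GP parameters (phi, sigma)
    # given the data, then the structural parameters (theta, gamma) given the
    # current GP-smoothed states and derivatives.
    samples = []
    for _ in range(n_iters):
        gp_params = metropolis_block(gp_params, log_post_gp)
        struct_params = metropolis_block(
            struct_params, lambda p: log_post_struct(p, gp_params))
        samples.append((gp_params.copy(), struct_params.copy()))
    return samples
```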
42 However, as a consequence of the system-induced dynamics, the corresponding likelihood surface defined by p(Y|θ, x0, σ) can present formidable challenges to standard sampling methods. [sent-88, score-0.286]
43 As an example Figure 1(c) illustrates the induced likelihood surface of a simple dynamic oscillator similar to that presented in the experimental section. [sent-89, score-0.135]
44 Recent advances in MCMC methodology suggest solutions to this problem in the form of population-based MCMC methods [8], which we therefore implement to sample the structural parameters of our model. [sent-90, score-0.161]
45 Population MCMC enables samples to be drawn from a target density p(θ) by defining a product of annealed densities indexed by a temperature parameter β, such that p(θ|β) = ∏i p(θ|βi), and the desired target density p(θ) is defined for one value of βi. [sent-91, score-0.054]
46 A time homogeneous Markov transition kernel which has p(θ) as its stationary distribution can then be constructed from both local Metropolis proposal moves and global temperature switching moves between the tempered chains of the population [8], allowing freer movement within the parameter space. [sent-93, score-0.2]
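A minimal sketch of such a population MCMC scheme (for simplicity the whole log target is tempered here; all names are illustrative assumptions):

```python
import numpy as np

def population_mcmc(log_target, init, betas, n_iters, step=0.1, rng=np.random):
    # One chain per temperature beta (betas increasing, ending at 1.0).
    # Local Metropolis moves within each tempered chain, plus exchange moves
    # between adjacent temperatures, allow freer movement between modes.
    chains = [init.copy() for _ in betas]
    samples = []
    for _ in range(n_iters):
        for i, beta in enumerate(betas):                       # local moves
            prop = chains[i] + step * rng.standard_normal(init.shape)
            if np.log(rng.uniform()) < beta * (log_target(prop) - log_target(chains[i])):
                chains[i] = prop
        i = rng.randint(len(betas) - 1)                        # exchange move
        log_a = (betas[i] - betas[i + 1]) * (log_target(chains[i + 1]) - log_target(chains[i]))
        if np.log(rng.uniform()) < log_a:
            chains[i], chains[i + 1] = chains[i + 1], chains[i]
        samples.append(chains[-1].copy())                      # the beta = 1 chain
    return np.array(samples)
```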
47 Sampling of the GP covariance function parameters by a Metropolis step requires computation of a matrix determinant and its inverse, so for all N states in the system a dominant scaling of O(N T³) will be obtained. [sent-95, score-0.214]
48 This poses little problem for many applications in systems biology since T is often fairly small (T ≈ 10 to 100). [sent-96, score-0.057]
49 An approximate scheme can be constructed by first obtaining the maximum a posteriori values for the GP hyperparameters and posterior mean state values, ϕ̂, σ̂, X̂n, and then employing these in Equation 3. [sent-100, score-0.332]
50 This will provide samples from p(θ, γ|X̂, ϕ̂, σ̂, Y), which may be a useful surrogate for the full joint posterior, incurring lower computational cost as all matrix operations will have been pre-computed, as will be demonstrated later in the paper. [sent-101, score-0.12]
51 We can also construct a sampling scheme for the important special case where some states are unobserved. [sent-102, score-0.176]
52 Let o index the observed states; then we may infer all the unknown variables as follows: p(θ, γ, Xu|Xo, ϕ, σ) ∝ π(θ)π(γ)π(Xu) ∏_{n∈o} exp( −(1/2) (δ_n^{o,u})ᵀ (Kn + Iγn)⁻¹ δ_n^{o,u} ), where δ_n^{o,u} ≡ fn(Xo, Xu, θ, t) − mn and π(Xu) is an appropriately chosen prior. [sent-104, score-0.276]
53 The values of unobserved species are obtained by propagating their sampled initial values using the corresponding discrete versions of the differential equations and the smoothed estimates of observed species. [sent-105, score-0.742]
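A minimal sketch of this propagation step, using a simple forward-Euler discretisation (the function and variable names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def propagate_unobserved(x_u0, x_obs_smooth, theta, t, f_u):
    # Forward-Euler propagation of the unobserved species, driven at each step
    # by the GP-smoothed values of the observed species.
    x_u = np.empty((len(x_u0), len(t)))
    x_u[:, 0] = x_u0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        x_u[:, k + 1] = x_u[:, k] + dt * f_u(x_obs_smooth[:, k], x_u[:, k], theta, t[k])
    return x_u
```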
54 The p53 transcriptional network example we include requires inference over unobserved protein species, see Section 5. [sent-106, score-0.22]
55 5 Experimental Examples We now demonstrate our GP-based method using a standard squared exponential covariance function on a variety of examples involving both ordinary and delay differential equations, and compare the accuracy and speed with other state-of-the-art methods. [sent-109, score-0.721]
56 Although consisting of only 2 equations, V̇ = c(V − V³/3 + R) and Ṙ = −(V − a + bR)/c, and 3 parameters, this dynamical system exhibits a highly nonlinear likelihood surface [11], which is induced by the sharp changes in the properties of the limit cycle as the values of the parameters vary. [sent-112, score-0.557]
57 Such a feature is common to many nonlinear systems and so this model provides an excellent test for our GP-based parameter inference method. [sent-113, score-0.224]
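For concreteness, a sketch of the FitzHugh-Nagumo model and of generating noisy observations from it (the parameter values, initial conditions, observation times and noise level below are illustrative assumptions, not the exact experimental settings):

```python
import numpy as np
from scipy.integrate import solve_ivp

def fitzhugh_nagumo(t, x, a, b, c):
    # Standard FitzHugh-Nagumo equations (assumed form of the model used here).
    V, R = x
    return [c * (V - V ** 3 / 3.0 + R), -(V - a + b * R) / c]

# Illustrative data generation: simulate the system and add observation noise.
t_obs = np.linspace(0.0, 20.0, 40)
sol = solve_ivp(fitzhugh_nagumo, (0.0, 20.0), [-1.0, 1.0],
                args=(0.2, 0.2, 3.0), t_eval=t_obs)
y_obs = sol.y + 0.1 * np.random.standard_normal(sol.y.shape)
```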
58 The parameters were then inferred from these data sets using the full Bayesian sampling scheme and the approximate sampling scheme (Section 4), both employing population MCMC. [sent-118, score-0.618]
59 Additionally, we inferred the parameters using 2 alternative methods, the profiled estimation method of Ramsay et al. [sent-119, score-0.144]
60 [11] and a Population MCMC based sampling scheme, in which the ODEs were solved explicitly (Section 2), to complete the comparative study. [sent-120, score-0.156]
61 All the algorithms were coded in Matlab, and the population MCMC algorithms were run with 30 temperatures and used a suitably diffuse Γ(2, 1) prior distribution for all parameters, forming the base distribution for the sampler. [sent-121, score-0.092]
62 Two of these population MCMC samplers were run in parallel and the R̂ statistic [5] was used to monitor convergence of all chains at all temperatures. [sent-122, score-0.146]
63 In our experiments the chains generally converged after around 5000 iterations, and 2000 samples were then drawn to form the posterior distributions. [sent-124, score-0.133]
64 Each experiment was repeated 100 times, and Table 1 shows summary statistics for each of the inferred parameters. [sent-127, score-0.109]
65 All of the three sampling methods based on population MCMC produced low variance samples from posteriors positioned close to the true parameters values. [sent-128, score-0.241]
66 We found the performance of the profiled estimation method [11] to be very sensitive to the initial parameter values. [sent-130, score-0.141]
67 In practice parameter values are unknown, indeed little may be known even about the range of possible values they may take. [sent-131, score-0.054]
68 Thus it seems sensible to choose initial values from a wide prior distribution so as to explore as many regions of parameter space as possible. [sent-132, score-0.106]
Table 1: Summary statistics for each of the inferred parameters of the FitzHugh-Nagumo model. [sent-187, score-0.109]
70 Each experiment was repeated 100 times and the mean parameter values are shown. [sent-188, score-0.054]
71 profiled estimation using initial parameter values drawn from a wide gamma prior, however, yielded highly biased results, with the algorithm often converging to local maxima far from the true parameter values. [sent-194, score-0.195]
72 The parameter estimates become more biased as the variance of the prior is increased, i. [sent-195, score-0.095]
73 consider parameter a; for 40 data points, for initial values a, b, c ∼ N ({0. [sent-200, score-0.106]
74 The speed of the profiled estimation method was also extremely variable, and this was observed to be very dependent on the initial parameter values e. [sent-216, score-0.201]
5.2 Example 2 - Nonlinear Delay Differential Equations This example model describes the oscillatory behaviour of the concentration of mRNA and its corresponding protein level in a genetic regulatory network, introduced by Monk [10]. [sent-228, score-0.141]
76 The translocation of mRNA from the nucleus to the cytosol is explicitly described by a delay differential equation. [sent-229, score-0.538]
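A sketch of the delayed negative-feedback structure of this model, integrated with a simple fixed-step Euler scheme and a history buffer for the delayed term (the functional form and parameter names are assumptions based on the standard Monk model, not taken verbatim from the paper):

```python
import numpy as np

def monk_rhs(m, p, p_delayed, p0, n_hill, mu_m, mu_p):
    # Delayed negative feedback: mRNA production is repressed by the protein
    # level a delay tau in the past; protein is translated from current mRNA.
    dm = 1.0 / (1.0 + (p_delayed / p0) ** n_hill) - mu_m * m
    dp = m - mu_p * p
    return dm, dp

def simulate_monk(tau, p0, n_hill, mu_m, mu_p, dt=0.01, t_end=100.0, m_init=1.0, p_init=1.0):
    # Fixed-step Euler integration with a history buffer for the delayed term.
    n_steps = int(t_end / dt)
    lag = int(tau / dt)
    m, p = np.empty(n_steps), np.empty(n_steps)
    m[0], p[0] = m_init, p_init
    for k in range(n_steps - 1):
        p_delayed = p[k - lag] if k >= lag else p_init
        dm, dp = monk_rhs(m[k], p[k], p_delayed, p0, n_hill, mu_m, mu_p)
        m[k + 1] = m[k] + dt * dm
        p[k + 1] = p[k] + dt * dp
    return m, p
```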
77 The application of our method to DDEs is of particular interest since numerical solutions to DDEs are generally much more computationally expensive to obtain than ODEs. [sent-231, score-0.083]
78 Thus inference of such models using MCMC methods and explicitly solving the system at each iteration becomes less feasible as the complexity of the system of DDEs increases. [sent-232, score-0.346]
79 The parameters were then inferred from these data sets using our GP-based population MCMC methods. [sent-237, score-0.201]
80 Figure 3 shows a time comparison for 10 iterations of the GP sampling algorithms and compares it to explicitly solving the DDEs using the Matlab solver DDE23 (which is generally faster than the Sundials solver for DDEs). [sent-238, score-0.264]
81 Using the GP methods, samples from the full posterior can be obtained in less than an hour. [sent-240, score-0.079]
82 Solving the DDEs explicitly, the population MCMC algorithm would take in excess of two weeks computation time, assuming the chains take a similar number of iterations to converge. [sent-241, score-0.146]
83 25 Table 2: Summary statistics for each of the inferred parameters of the Monk model. [sent-290, score-0.109]
5.3 Example 3 - The p53 Gene Regulatory Network with Unobserved Species Our third example considers a linear and a nonlinear model describing the regulation of 5 target genes by the tumour suppressor transcription factor protein p53. [sent-295, score-0.283]
85 Letting g(f (t)) = f (t) gives us the linear model originally investigated in [1], and letting g(f (t)) = exp(f (t)) gives us the nonlinear model investigated in [4]. [sent-297, score-0.177]
86 The transcription factor f (t) is unobserved and must be inferred along with the other structural parameters Bj , Sj and Dj using the sampling scheme detailed in Section 4. [sent-298, score-0.47]
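A sketch of the assumed production/decay structure for each target gene (the exact functional form is an assumption in the spirit of the Barenco et al. model; all names are illustrative):

```python
import numpy as np

def target_gene_rhs(x_j, f_t, B_j, S_j, D_j, nonlinear=False):
    # Assumed form: dx_j/dt = B_j + S_j * g(f(t)) - D_j * x_j, with g the identity
    # for the linear model and the exponential for the nonlinear response model.
    g = np.exp(f_t) if nonlinear else f_t
    return B_j + S_j * g - D_j * x_j
```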
87 In this experiment, priors on the unobserved species used were f (t) ∼ Γ(2, 1) with a log-Normal proposal. [sent-300, score-0.215]
88 We test our method using the data from Barenco et al. [1]. [sent-301, score-0.069]
89 Figure 4: The predicted output of the p53 gene using data from Barenco et al. [1] and the accelerated GP inference method for (a) the linear model and (b) the nonlinear response model. [sent-302, score-0.213]
90 Figure 4 shows the inferred missing species and the results are in good accordance with recent biological studies. [sent-308, score-0.174]
91 For this example, our GP sampling algorithms ran to completion in under an hour on a 2.2GHz Centrino laptop. [sent-309, score-0.109]
92 There was no difference in speed between using the linear and nonlinear models; indeed the equations describing this biological system could be made more complex with little additional computational cost. [sent-310, score-0.484]
93 6 Conclusions Explicit solution of differential equations is a major bottleneck for the application of inferential methodology in a number of application areas, e. [sent-311, score-0.576]
94 We have addressed this problem and placed it within a Bayesian framework which tackles the main shortcomings of previous solutions to the problem of system identification for nonlinear differential equations. [sent-314, score-0.595]
95 Our methodology allows the possibility of model comparison via the use of Bayes factors, which may be straightforwardly calculated from the samples obtained from the population MCMC algorithm. [sent-315, score-0.172]
96 Possible extensions to this method include more efficient sampling exploiting control variable methods [17], embedding characteristics of a dynamical system in the design of covariance functions and application of our method to models involving partial differential equations. [sent-316, score-0.675]
97 , (2003) Solving noisy linear operator equations by Gaussian processes: application to ordinary and partial differential equations, Proc. [sent-350, score-0.595]
98 , (2003) Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals, Bayesian Statistics, 7, 651-659. [sent-378, score-0.107]
99 (1982) A spline least squares method for numerical parameter estimation in differential equations. [sent-409, score-0.458]
100 , (2008), Bayesian ranking of biochemical system models Bioinformatics 24, 833-839. [sent-416, score-0.122]
wordName wordTfidf (topN-words)
[('gp', 0.413), ('differential', 0.322), ('ddes', 0.298), ('delay', 0.169), ('fn', 0.163), ('equations', 0.155), ('mcmc', 0.15), ('kn', 0.147), ('ode', 0.146), ('system', 0.122), ('odes', 0.119), ('ordinary', 0.118), ('nonlinear', 0.115), ('mn', 0.113), ('sampling', 0.109), ('monk', 0.108), ('species', 0.105), ('xn', 0.104), ('derivative', 0.102), ('population', 0.092), ('dde', 0.081), ('nyn', 0.081), ('ramsay', 0.081), ('girolami', 0.081), ('posterior', 0.079), ('transcription', 0.079), ('yn', 0.075), ('barenco', 0.071), ('lawrence', 0.071), ('dynamical', 0.07), ('metropolis', 0.07), ('inferred', 0.069), ('gene', 0.069), ('unobserved', 0.067), ('scheme', 0.067), ('employing', 0.065), ('explicit', 0.063), ('xo', 0.061), ('speed', 0.06), ('protein', 0.057), ('biology', 0.057), ('bayesian', 0.056), ('state', 0.056), ('surface', 0.055), ('inference', 0.055), ('parameter', 0.054), ('sundials', 0.054), ('chains', 0.054), ('solver', 0.054), ('led', 0.054), ('pro', 0.053), ('bottleneck', 0.053), ('covariance', 0.052), ('initial', 0.052), ('derivatives', 0.052), ('offer', 0.051), ('rasmussen', 0.051), ('xu', 0.05), ('dj', 0.049), ('median', 0.048), ('ben', 0.047), ('calderhead', 0.047), ('gps', 0.047), ('oscillator', 0.047), ('oscillatory', 0.047), ('numerical', 0.047), ('processes', 0.047), ('explicitly', 0.047), ('gaussian', 0.046), ('methodology', 0.046), ('bj', 0.044), ('accelerated', 0.043), ('glasgow', 0.043), ('mrna', 0.043), ('warped', 0.043), ('priors', 0.043), ('estimates', 0.041), ('transcriptional', 0.041), ('incurring', 0.041), ('parameters', 0.04), ('summary', 0.04), ('sj', 0.039), ('fully', 0.039), ('structural', 0.039), ('mechanistic', 0.038), ('overall', 0.038), ('regulatory', 0.037), ('solutions', 0.036), ('estimation', 0.035), ('map', 0.035), ('nth', 0.034), ('straightforwardly', 0.034), ('obtaining', 0.034), ('dynamic', 0.033), ('matlab', 0.033), ('auxiliary', 0.032), ('describing', 0.032), ('dramatic', 0.032), ('investigated', 0.031), ('hyperparameters', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
2 0.29190424 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
3 0.19214158 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
4 0.18803616 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
5 0.15115617 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
6 0.1402792 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
8 0.11491076 249 nips-2008-Variational Mixture of Gaussian Process Experts
9 0.113368 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
10 0.09650255 233 nips-2008-The Gaussian Process Density Sampler
11 0.090445101 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
12 0.088568136 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
13 0.083074272 76 nips-2008-Estimation of Information Theoretic Measures for Continuous Random Variables
14 0.08073622 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
15 0.079444595 105 nips-2008-Improving on Expectation Propagation
16 0.078239076 152 nips-2008-Non-stationary dynamic Bayesian networks
17 0.076981507 7 nips-2008-A computational model of hippocampal function in trace conditioning
18 0.073494762 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
19 0.073330402 230 nips-2008-Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
20 0.071561448 235 nips-2008-The Infinite Hierarchical Factor Regression Model
topicId topicWeight
[(0, -0.231), (1, 0.047), (2, 0.104), (3, 0.118), (4, 0.132), (5, -0.105), (6, 0.006), (7, 0.271), (8, 0.033), (9, 0.092), (10, 0.089), (11, 0.044), (12, 0.176), (13, -0.156), (14, 0.19), (15, -0.044), (16, -0.101), (17, 0.143), (18, 0.116), (19, -0.028), (20, -0.054), (21, -0.059), (22, -0.002), (23, 0.141), (24, 0.02), (25, 0.196), (26, -0.01), (27, 0.018), (28, -0.025), (29, -0.117), (30, 0.036), (31, -0.06), (32, 0.008), (33, -0.04), (34, 0.027), (35, -0.043), (36, -0.046), (37, 0.063), (38, -0.003), (39, -0.007), (40, 0.008), (41, 0.01), (42, -0.02), (43, 0.063), (44, 0.066), (45, 0.003), (46, -0.039), (47, -0.006), (48, 0.08), (49, 0.0)]
simIndex simValue paperId paperTitle
same-paper 1 0.95593274 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
2 0.86471361 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
3 0.81092101 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
4 0.74184847 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai
Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1
5 0.69210726 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
6 0.6214208 249 nips-2008-Variational Mixture of Gaussian Process Experts
7 0.59234321 138 nips-2008-Modeling human function learning with Gaussian processes
8 0.5831086 233 nips-2008-The Gaussian Process Density Sampler
9 0.55303353 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
10 0.54884452 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
11 0.52947348 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
12 0.47432461 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
13 0.42937639 152 nips-2008-Non-stationary dynamic Bayesian networks
14 0.42485508 105 nips-2008-Improving on Expectation Propagation
15 0.40579003 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
16 0.39038846 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
17 0.38151062 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
18 0.37257323 216 nips-2008-Sparse probabilistic projections
19 0.37255433 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
20 0.37250057 185 nips-2008-Privacy-preserving logistic regression
topicId topicWeight
[(6, 0.034), (7, 0.572), (12, 0.014), (15, 0.012), (28, 0.136), (57, 0.067), (59, 0.012), (63, 0.01), (71, 0.012), (77, 0.028), (83, 0.032)]
simIndex simValue paperId paperTitle
1 0.95163512 109 nips-2008-Interpreting the neural code with Formal Concept Analysis
Author: Dominik Endres, Peter Foldiak
Abstract: We propose a novel application of Formal Concept Analysis (FCA) to neural decoding: instead of just trying to figure out which stimulus was presented, we demonstrate how to explore the semantic relationships in the neural representation of large sets of stimuli. FCA provides a way of displaying and interpreting such relationships via concept lattices. We explore the effects of neural code sparsity on the lattice. We then analyze neurophysiological data from high-level visual cortical area STSa, using an exact Bayesian approach to construct the formal context needed by FCA. Prominent features of the resulting concept lattices are discussed, including hierarchical face representation and indications for a product-of-experts code in real neurons. 1
same-paper 2 0.94930804 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.9305855 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
Author: Kai Yu, Wei Xu, Yihong Gong
Abstract: In this paper we aim to train deep neural networks for rapid visual recognition. The task is highly challenging, largely due to the lack of a meaningful regularizer on the functions realized by the networks. We propose a novel regularization method that takes advantage of kernel methods, where an oracle kernel function represents prior knowledge about the recognition task of interest. We derive an efficient algorithm using stochastic gradient descent, and demonstrate encouraging results on a wide range of recognition tasks, in terms of both accuracy and speed. 1
4 0.90594202 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
Author: Andrew Smith, Hongyuan Zha, Xiao-ming Wu
Abstract: We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case. 1
5 0.87270111 45 nips-2008-Characterizing neural dependencies with copula models
Author: Pietro Berkes, Frank Wood, Jonathan W. Pillow
Abstract: The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that can simultaneously account for (1) marginal distributions over single-neuron spike counts that are discrete and non-negative; and (2) joint distributions over the responses of multiple neurons that are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in macaque pre-motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons. We find that more than one third of neuron pairs shows dependency concentrated in the lower or upper tails for their firing rate distribution. 1
6 0.67626297 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
7 0.66259885 137 nips-2008-Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex
8 0.64110941 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
9 0.6383217 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
10 0.63287723 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
11 0.62545335 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds
12 0.61306047 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
13 0.60468477 99 nips-2008-High-dimensional support union recovery in multivariate regression
14 0.60110831 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
15 0.59845561 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
16 0.59684467 8 nips-2008-A general framework for investigating how far the decoding process in the brain can be simplified
17 0.59654701 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
18 0.59377855 66 nips-2008-Dynamic visual attention: searching for coding length increments
19 0.59246886 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
20 0.59185743 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations