nips nips2002 nips2002-65 knowledge-graph by maker-knowledge-mining

65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems


Source: pdf

Author: E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, Carl E. Rasmussen

Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Derivative observations in Gaussian Process Models of Dynamic Systems. [sent-1, score-0.349]

2 Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. [sent-35, score-1.113]

3 This is of particular importance in identification of nonlinear dynamic systems from experimental data. [sent-36, score-0.104]

4 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. [sent-37, score-1.102]

5 This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. [sent-38, score-0.676]

6 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. [sent-39, score-0.072]

7 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models. [sent-40, score-0.354]

8 Introduction: In many applications which involve modelling an unknown system from observed data, model accuracy could be improved by using not only observations of the function itself, but also observations of its derivatives. [sent-41, score-0.973]

9 These derivative observations might be directly available from sensors which, for example, measure velocity or acceleration rather than position, or they might be prior linearisation models from historical experiments. [sent-44, score-1.107]

10 A further practical reason is related to the fact that the computational expense of Gaussian processes increases rapidly (O(N^3)) with training set size N. [sent-45, score-0.158]

11 We may therefore wish to [sent-46, score-0.039]

12 use linearisations, which are cheap to estimate, to describe the system in those areas in which they are sufficiently accurate, efficiently summarising a large subset of training data. [sent-47, score-0.155]

13 We focus on application of such models in modelling nonlinear dynamic systems from experimental data. [sent-48, score-0.189]

14 Gaussian processes: Bayesian regression based on Gaussian processes is described by [1], and interest has grown since publication of [2, 3, 4]. [sent-50, score-0.228]

15 Assume a set of input/output pairs is given. In the GP framework, the output values are viewed as being drawn from a zero-mean multivariable Gaussian distribution whose covariance matrix is a function of the input vectors. [sent-51, score-0.198]

16 A general model, which reflects the higher correlation between spatially close (in some appropriate metric) points – a smoothness assumption on the target system – uses a covariance matrix with the following structure. [sent-59, score-0.3]
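The exact covariance structure is lost in this extraction; a common choice consistent with the smoothness assumption above, stated here as an assumption rather than as the paper's own equation, is the squared-exponential covariance with signal variance v and length-scale weights w_d:

$C(\mathbf{x}_i, \mathbf{x}_j) = v \exp\big(-\tfrac{1}{2}\sum_d w_d \,(x_i^d - x_j^d)^2\big)$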

17 The mean of this distribution can be chosen as the maximum-likelihood prediction for the output corresponding to the input. [sent-67, score-0.108]

18 Gaussian process derivatives: Differentiation is a linear operation, so the derivative of a Gaussian process remains a Gaussian process. [sent-68, score-0.768]

19 The use of derivative observations in Gaussian processes is described in [5, 6], and in engineering applications in [7, 8, 9]. [sent-69, score-1.034]

20 Suppose we are given new sets of pairs, each corresponding to observations of a partial derivative of the underlying function. In the noise-free setting this corresponds to a direct relation between these observations and the partial derivatives of the function. [sent-70, score-0.624]

21 We now wish to find the joint probability of the vector of function observations and derivative observations, which involves calculation of the covariance between the function and the derivative observations as well as the covariance among the derivative observations. [sent-75, score-1.786]

22 Covariance functions are typically differentiable, so the covariance between a derivative observation and a function observation, and the covariance between two derivative observations, are obtained by differentiating the covariance function with respect to the corresponding input arguments. [sent-76, score-1.436]
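Since differentiation is linear, these cross-covariances follow by differentiating the covariance function itself; under the squared-exponential form assumed above, in one dimension with separation r = x_i - x_j (a reconstruction from standard Gaussian process results, not the paper's equations (4)-(6) verbatim):

$\mathrm{cov}\big(f'(x_i), f(x_j)\big) = \partial C/\partial x_i = -w\,r\,C(x_i, x_j)$
$\mathrm{cov}\big(f'(x_i), f'(x_j)\big) = \partial^2 C/\partial x_i \partial x_j = w\,(1 - w r^2)\,C(x_i, x_j)$

The first expression is odd in r and carries a multiplicative distance term, and the second is even, matching the qualitative description of Figure 1 below.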

23 Figure 1: The covariance functions between function and derivative points in one dimension, for fixed hyper-parameters. [sent-86, score-0.764]

24 The function defines a covariance that decays monotonically as the distance between the corresponding input points increases. [sent-87, score-0.225]

25 Covariance between a derivative point and a function point is an odd function, and does not decrease as fast due to the presence of the multiplicative distance term. [sent-88, score-0.579]

26 Given perturbation data around an equilibrium point, we can identify a linearisation, the parameters of which can be viewed as observations of derivatives, and the bias term from the linearisation can be used as a function ‘observation’. [sent-92, score-1.14]

27 We use standard linear regression solutions to estimate the derivatives, with a prior on the covariance matrix. [sent-95, score-0.337]
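A minimal sketch of this identification step using ordinary least squares; the function name, noise model and variable names are illustrative assumptions rather than the paper's notation:

    import numpy as np

    def identify_linearisation(X, y, x0):
        # Fit y ~ b + a.(x - x0) to perturbation data (X, y) gathered around equilibrium x0.
        # Returns the function 'observation' b, the derivative 'observations' a,
        # and the covariance of [b, a] used as their observation uncertainty.
        Phi = np.hstack([np.ones((X.shape[0], 1)), X - x0])      # design matrix [1, x - x0]
        theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
        n, p = Phi.shape
        sigma2 = np.sum((y - Phi @ theta) ** 2) / max(n - p, 1)  # residual noise variance
        cov_theta = sigma2 * np.linalg.inv(Phi.T @ Phi)          # parameter covariance
        return theta[0], theta[1:], cov_theta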

28 These can be viewed as ‘observations’ whose uncertainty is specified by a covariance matrix for each set of derivative observations and their associated linearisation point. [sent-108, score-1.041]

29 With a suitable ordering of the observations, [sent-110, score-0.349]

30 the associated noise covariance matrix, which is added to the covariance matrix calculated using (4)–(6), will be block diagonal, where the blocks are the per-linearisation covariance matrices. [sent-112, score-0.338]
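A small illustration of assembling that block-diagonal noise term from the per-linearisation covariances returned by identify_linearisation above (the numerical values are invented for the example):

    import numpy as np
    from scipy.linalg import block_diag

    cov_blocks = [np.array([[0.010, 0.002], [0.002, 0.050]]),  # cov of [b, a] for linearisation 1
                  np.array([[0.020, 0.000], [0.000, 0.040]])]  # cov of [b, a] for linearisation 2
    noise_cov = block_diag(*cov_blocks)  # added to the covariance matrix built from the kernel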

31 Use of numerical estimates from linearisations makes it easy to use the full covariance matrix, including off-diagonal elements. [sent-113, score-0.606]

32 This would be much more involved if it were to be estimated simultaneously with the other covariance function hyper-parameters. [sent-114, score-0.18]

33 In a one-dimensional case with zero noise on the observations, two function observations close together give exactly the same information, and constrain the model in the same way, as a derivative observation with zero uncertainty. [sent-117, score-1.442]

34 Data is, however, rarely noise-free, and the fact that we can so easily include knowledge of derivative or function observation uncertainty is a major benefit of the Gaussian process prior approach. [sent-118, score-0.943]

35 The identified derivative and function observation, and their covariance matrix can locally summarise the large number of perturbation training points, leading to a significant reduction in data needed during Gaussian process inference. [sent-119, score-0.911]

36 We can, however, choose to improve robustness by retaining in the training set any data from the equilibrium region which have a low likelihood given the GP model based only on the linearisations. [sent-120, score-0.622]

37 In this paper we choose the hyper-parameters that maximise the likelihood of the observed data sets, using standard optimisation software. [sent-123, score-0.052]
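A sketch of that optimisation step, assuming a routine that builds the joint covariance of all function and derivative 'observations' for a given hyper-parameter vector; K_builder, y_aug and noise_cov are hypothetical names introduced here for illustration:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_marginal_likelihood(log_hyp, K_builder, y_aug, noise_cov):
        # Negative log marginal likelihood of a zero-mean GP over the augmented
        # observation vector y_aug (function values followed by derivative values).
        K = K_builder(np.exp(log_hyp)) + noise_cov
        L = np.linalg.cholesky(K + 1e-9 * np.eye(len(y_aug)))
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_aug))
        return (0.5 * y_aug @ alpha + np.sum(np.log(np.diag(L)))
                + 0.5 * len(y_aug) * np.log(2 * np.pi))

    # result = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0]),
    #                   args=(K_builder, y_aug, noise_cov), method='L-BFGS-B')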

38 Given the data sets and the hyper-parameters, the Gaussian process can be used to infer the conditional distribution of the output, as well as its partial derivatives, for a given input. [sent-124, score-0.219]
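A self-contained sketch of this inference step in one dimension, using the squared-exponential kernel and the derivative covariances sketched earlier; the hyper-parameter values and the single function/derivative pair are illustrative only:

    import numpy as np

    v, w = 1.0, 1.0                                            # assumed hyper-parameters
    k   = lambda a, b: v * np.exp(-0.5 * w * (a - b) ** 2)     # cov(f(a), f(b))
    kd  = lambda a, b: -w * (a - b) * k(a, b)                  # cov(f'(a), f(b))
    kdd = lambda a, b: w * (1.0 - w * (a - b) ** 2) * k(a, b)  # cov(f'(a), f'(b))

    xf, yf = 0.0, 0.5   # one function 'observation'
    xd, yd = 0.0, 0.0   # one derivative 'observation' (slope zero)

    # Joint covariance of the augmented observation vector [yf, yd], plus small jitter.
    K = np.array([[k(xf, xf),  kd(xd, xf)],
                  [kd(xd, xf), kdd(xd, xd)]]) + 1e-6 * np.eye(2)
    y_aug = np.array([yf, yd])
    alpha = np.linalg.solve(K, y_aug)

    def predict(x):
        # Posterior mean and variance of f(x) given the joint observations.
        ks = np.array([k(x, xf), kd(xd, x)])                   # cov(f(x), [yf, yd])
        mean = ks @ alpha
        var = k(x, x) - ks @ np.linalg.solve(K, ks)
        return mean, var

    print(predict(0.0), predict(1.0))

With these illustrative numbers, the predictive variance away from the observation grows noticeably more slowly than it would with the function observation alone, which is the effect discussed around Figure 2 below.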

39 Derivative and prediction uncertainty: Figure 2(c) gives intuitive insight into the constraining effect of function observations, and of function+derivative observations, on realisations drawn from a Gaussian process prior. [sent-129, score-0.802]

40 We further illustrate the effect of knowledge of derivative information on prediction uncertainty. [sent-130, score-0.656]

41 We consider a simple example with a single pair of function observations and a single derivative pair, with the hyper-parameters fixed. Figure 2(a) plots the standard deviation of the models resulting from variations of the function and derivative observations: [sent-131, score-1.081]

42 a single function observation, [sent-133, score-0.04]

43 a single function observation + a noise-free derivative observation, [sent-137, score-0.712]

44 a single function observation + an uncertain derivative observation (identified from the 150 noisy function observations above). [sent-141, score-1.273]

45 1 function obs + 1 noise-free derivative observation (figure legend text). [sent-146, score-0.712]

46 + 1 noisy derivative observation: almost indistinguishable from 150 function observations (figure legend text). [sent-151, score-1.143]

47 (b) Effect of including a noise-free derivative or function observation on the prediction of mean and variance, given appropriate hyper-parameters. [sent-158, score-0.794]

48 (a) The effect of adding a derivative observation on the prediction uncertainty – standard deviation of GP predictions. [sent-160, score-1.001]

49 (c) Examples of realisations drawn from a Gaussian process: left – no data; middle – the constraining effect of function observations (crosses); right – the effect of function & derivative observations (lines). [sent-166, score-1.611]

50 Note that the addition of a derivative point does not have an effect on the mean prediction in any of the cases, because the function derivative is zero. [sent-168, score-1.257]

51 The striking effect of the derivative is on the uncertainty. [sent-169, score-0.596]

52 In the case of prediction using function data the uncertainty increases as we move away from the function observation. [sent-170, score-0.336]

53 Addition of a noise-free derivative observation does not affect the uncertainty at the observation point itself, but it does mean that the uncertainty increases more slowly as we move away from 0; if the uncertainty on the derivative increases, there is less of an impact on the variance. [sent-171, score-1.661]

54 The model based on the single derivative observation identified from the 150 noisy function observations is almost indistinguishable from the model with all 150 function observations. [sent-172, score-1.183]

55 To further illustrate the effect of adding derivative information, consider pairs of noise-free observations of the function. [sent-173, score-0.972]

56 The hyper-parameters of the model are obtained through training involving large amounts of data, but we then perform inference using only a few points. [sent-174, score-0.131]

57 For illustration, one function point is replaced with a derivative point at the same location, and the results are shown in Figure 2(b). [sent-175, score-0.579]

58 A standard starting point for identification is to find linear dynamic models at various points on the manifold of equilibria. [sent-177, score-0.155]

59 In the first part of the experiment, we wish to acquire training data by stimulating the system input to take the system through a wide range of conditions along the manifold of equilibria, shown in Figure 3(a). [sent-178, score-0.237]

60 The linearisations are each identified from 200 function observations, obtained by starting a simulation at an equilibrium point and perturbing the control signal about its equilibrium value. [sent-179, score-0.951]

61 The quadratic derivative from the cubic true function is clearly visible in Figure 4(c), and is smooth, despite the presence of several derivative observations with significant errors, because of the appropriate estimates of derivative uncertainty. [sent-181, score-2.032]

62 Note that the function ‘observations’ derived from the linearisations have much lower uncertainty than the individual function observations. [sent-183, score-0.634]

63 As a second part of the experiment, shown in Figure 3(b), we now add some off-equilibrium function observations to the training set, by applying large control perturbations to the system, taking it through transient regions. [sent-184, score-0.572]

64 We perform a new hyper-parameter optimisation using the combination of the transient, off-equilibrium observations and the derivative observations already available. [sent-185, score-1.289]

65 The model incorporates both groups of data and has reduced variance in the off-equilibrium areas. [sent-186, score-0.028]

66 A comparison of simulation runs from the two models with the true data, shown in Figure 5(a), demonstrates the improvement in performance brought by the combination of equilibrium derivatives and off-equilibrium observations over equilibrium information alone. [sent-187, score-0.842]

67 The combined model's response is almost identical to the true system response. [sent-188, score-0.113]

68 Conclusions: Engineers are used to interpreting linearisations, and find them a natural way of expressing prior knowledge, or constraints that a data-driven model should conform to. [sent-189, score-0.027]

69 Derivative observations in the form of system linearisations are frequently used in control engineering, and many nonlinear identification campaigns will have linearisations of different operating regions as prior information. [sent-190, score-1.438]

70 Acquiring perturbation data close to equilibrium is relatively easy, and the large amounts of data mean that equilibrium linearisations can be made very accurate. [sent-191, score-0.875]

71 While in many cases we will be able to have accurate derivative observations, they will rarely be noise-free, and the fact that we can so easily include knowledge of derivative or function observation uncertainty is a major benefit of the Gaussian process prior approach. [sent-192, score-1.482]

72 In this paper we used numerical estimates of the full covariance matrix for each linearisation, which were different for every linearisation. [sent-193, score-0.197]

73 The analytic inference of derivative information from a model and, importantly, of its uncertainty, is potentially of great importance to control engineers designing or validating robust control laws. [sent-194, score-0.932]

74 Other applications of models which base decisions on model derivatives will have similar potential benefits. [sent-197, score-0.136]

75 Local linearisation models around equilibrium conditions are, however, not sufficient for specifying global dynamics. [sent-198, score-0.346]

76 We need observations away from equilibrium in transient regions, which tend to be much sparser as they are more difficult to obtain experimentally, and the system behaviour tends to be more complex away from equilibrium. [sent-199, score-0.768]

77 Gaussian processes, with robust inference, and input-dependent uncertainty predictions, are especially interesting in sparsely populated off-equilibrium regions. [sent-200, score-0.116]

78 Summarising the large quantities of near-equilibrium data by derivative ‘observations’ should significantly reduce the computational problems associated with Gaussian processes in modelling dynamic systems. [sent-201, score-0.807]

79 We have demonstrated with a simulation of an example nonlinear system that Gaussian process priors can combine derivative and function observations in a principled manner which is highly applicable in nonlinear dynamic systems modelling tasks. [sent-202, score-1.384]

80 Any smoothing procedure involving linearisations needs to satisfy an integrability constraint, which has not been solved in a satisfactory fashion in other widely-used approaches (e.g. [sent-203, score-0.509]

81 multiple model [10], or Takagi-Sugeno fuzzy methods [11]), but which is inherently solved within the Gaussian process formulation. [sent-205, score-0.106]

82 The method scales well to higher input dimensions, adding only one extra derivative observation per input dimension plus one function observation for each linearisation. [sent-206, score-1.088]
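As a hypothetical illustration of this scaling (numbers invented for clarity): with a 5-dimensional input, each linearisation contributes 5 derivative 'observations' plus 1 function 'observation', so 10 linearisations add only 60 rows to the GP training set, however many raw perturbation points they summarise.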

83 In fact the real benefits may become more obvious in higher dimensions, with increased quantities of training data which can be efficiently summarised by linearisations, and more severe problems in blending local linearisations together consistently. [sent-207, score-0.525]

84 On curve fitting and optimal design for regression (with discussion). [sent-210, score-0.028]

85 Prediction with Gaussian processes: From linear regression to linear prediction and beyond. [sent-225, score-0.088]

86 Gaussian processes to speed up Hybrid Monte Carlo for expensive Bayesian integrals. [sent-254, score-0.1]

87 On transient dynamics, off-equilibrium behaviour and identification in blended multiple model structures. [sent-267, score-0.107]

88 Nonlinear adaptive control using non-parametric Gaussian process prior models. [sent-272, score-0.163]

89 Divide & conquer identification: Using Gaussian process priors to combine derivative and non-derivative observations in a consistent manner. [sent-281, score-1.025]

90 (a) Derivative observations from linearisations identified from the perturbation data. [sent-309, score-0.862]

91 (b) Derivative observations on equilibrium, and off-equilibrium function observations from a transient trajectory. [sent-311, score-0.852]

92 Figure 3: The manifold of equilibria on the true function. [sent-312, score-0.156]

93 Circles indicate points at which a derivative observation is made. [sent-313, score-0.746]

94 (c) Derivative observations. [sent-342, score-0.349]

95 Figure 4: Inferred values of function and derivatives, with contours, as the inputs are varied along the manifold of equilibria. [sent-343, score-0.17]

96 Circles indicate the locations of the derivative observation points; lines indicate the uncertainty of the observations (standard deviations). [sent-347, score-1.411]

97 Figure legend: true system; GP with off-equilibrium data; equilibrium data GP. [sent-350, score-0.08]

98 The GP trained with both on- and off-equilibrium data is close to the true system, unlike the model based only on equilibrium data. [sent-359, score-0.212]

99 (b) Inferred mean and uncertainty surfaces using linearisations and off-equilibrium data. [sent-360, score-0.46]

100 The trajectory of the simulation shown in (a) is plotted for comparison. [sent-361, score-0.046]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('derivative', 0.539), ('linearisations', 0.438), ('observations', 0.349), ('linearisation', 0.192), ('equilibrium', 0.154), ('covariance', 0.14), ('observation', 0.133), ('uncertainty', 0.116), ('derivatives', 0.113), ('identi', 0.109), ('gp', 0.108), ('gaussian', 0.102), ('processes', 0.1), ('realisations', 0.095), ('ireland', 0.095), ('modelling', 0.085), ('control', 0.078), ('perturbation', 0.075), ('transient', 0.075), ('covariate', 0.071), ('summarising', 0.071), ('equilibria', 0.07), ('glasgow', 0.061), ('prediction', 0.06), ('manifold', 0.06), ('process', 0.058), ('effect', 0.057), ('kildare', 0.055), ('leith', 0.055), ('leithead', 0.055), ('solak', 0.055), ('nonlinear', 0.054), ('system', 0.054), ('away', 0.052), ('cov', 0.052), ('optimisation', 0.052), ('dynamic', 0.05), ('epsrc', 0.048), ('fuzzy', 0.048), ('hamilton', 0.048), ('integrability', 0.048), ('maynooth', 0.048), ('scotland', 0.048), ('simulation', 0.046), ('points', 0.045), ('engineers', 0.043), ('indistinguishable', 0.043), ('function', 0.04), ('ph', 0.04), ('wish', 0.039), ('noisy', 0.039), ('bene', 0.038), ('inference', 0.033), ('grant', 0.033), ('response', 0.033), ('quantities', 0.033), ('crosses', 0.032), ('behaviour', 0.032), ('close', 0.032), ('priors', 0.03), ('rarely', 0.03), ('training', 0.03), ('matrix', 0.029), ('indicate', 0.029), ('inferred', 0.028), ('numerical', 0.028), ('increases', 0.028), ('regression', 0.028), ('variance', 0.028), ('constraining', 0.027), ('adding', 0.027), ('prior', 0.027), ('output', 0.026), ('true', 0.026), ('ec', 0.026), ('cation', 0.026), ('combine', 0.025), ('viewed', 0.025), ('circles', 0.024), ('conquer', 0.024), ('multivariable', 0.024), ('fusion', 0.024), ('acquiring', 0.024), ('bill', 0.024), ('congress', 0.024), ('draft', 0.024), ('girard', 0.024), ('ifac', 0.024), ('slopes', 0.024), ('summarised', 0.024), ('vf', 0.024), ('involving', 0.023), ('dependent', 0.023), ('applications', 0.023), ('engineering', 0.023), ('great', 0.023), ('mean', 0.022), ('infer', 0.022), ('validating', 0.022), ('francis', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

Author: E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, Carl E. Rasmussen

Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models.

2 0.11937989 95 nips-2002-Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting

Author: Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, Roderick Murray-Smith

Abstract: We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. Multi-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model, the prediction of the output at a given time is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

3 0.089595273 41 nips-2002-Bayesian Monte Carlo

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

4 0.083961211 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

Author: Ralf Herbrich, Neil D. Lawrence, Matthias Seeger

Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection and is less complex in implementation. 1

5 0.079349205 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

Author: Emanuel Todorov, Michael I. Jordan

Abstract: Behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Here we offer an explanation: we show that not only are variability and goal achievement compatible, but indeed that allowing variability in redundant dimensions is the optimal control strategy in the face of uncertainty. The optimal feedback control laws for typical motor tasks obey a “minimal intervention” principle: deviations from the average trajectory are only corrected when they interfere with the task goals. The resulting behavior exhibits task-constrained variability, as well as synergetic coupling among actuators—which is another unexplained empirical phenomenon.

6 0.079267956 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes

7 0.073046163 128 nips-2002-Learning a Forward Model of a Reflex

8 0.072701924 115 nips-2002-Informed Projections

9 0.07032571 201 nips-2002-Transductive and Inductive Methods for Approximate Gaussian Process Regression

10 0.069593988 77 nips-2002-Effective Dimension and Generalization of Kernel Learning

11 0.069418728 181 nips-2002-Self Supervised Boosting

12 0.069314152 169 nips-2002-Real-Time Particle Filters

13 0.066761121 110 nips-2002-Incremental Gaussian Processes

14 0.064343087 173 nips-2002-Recovering Intrinsic Images from a Single Image

15 0.05980771 124 nips-2002-Learning Graphical Models with Mercer Kernels

16 0.058782592 138 nips-2002-Manifold Parzen Windows

17 0.05549847 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

18 0.054750334 49 nips-2002-Charting a Manifold

19 0.054536052 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables

20 0.054138988 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.172), (1, 0.009), (2, -0.061), (3, 0.034), (4, -0.029), (5, 0.023), (6, -0.086), (7, 0.065), (8, 0.032), (9, 0.048), (10, 0.017), (11, 0.013), (12, 0.136), (13, -0.025), (14, 0.009), (15, 0.053), (16, -0.118), (17, -0.015), (18, 0.059), (19, 0.013), (20, 0.102), (21, 0.04), (22, 0.097), (23, 0.109), (24, 0.111), (25, -0.047), (26, 0.054), (27, 0.022), (28, 0.022), (29, -0.018), (30, 0.112), (31, 0.151), (32, 0.05), (33, -0.081), (34, 0.118), (35, -0.07), (36, -0.061), (37, 0.025), (38, 0.05), (39, 0.086), (40, 0.232), (41, 0.077), (42, -0.03), (43, 0.092), (44, -0.072), (45, -0.091), (46, 0.158), (47, -0.049), (48, 0.061), (49, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97547019 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

Author: E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, Carl E. Rasmussen

Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models.

2 0.76090437 95 nips-2002-Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting

Author: Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, Roderick Murray-Smith

Abstract: We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. Multi-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model, the prediction of the output at a given time is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

3 0.52255017 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

Author: Peter Dayan, Angela J. Yu

Abstract: Inference and adaptation in noisy and changing, rich sensory environments are rife with a variety of specific sorts of variability. Experimental and theoretical studies suggest that these different forms of variability play different behavioral, neural and computational roles, and may be reported by different (notably neuromodulatory) systems. Here, we refine our previous theory of acetylcholine’s role in cortical inference in the (oxymoronic) terms of expected uncertainty, and advocate a theory for norepinephrine in terms of unexpected uncertainty. We suggest that norepinephrine reports the radical divergence of bottom-up inputs from prevailing top-down interpretations, to influence inference and plasticity. We illustrate this proposal using an adaptive factor analysis model.

4 0.4660162 201 nips-2002-Transductive and Inductive Methods for Approximate Gaussian Process Regression

Author: Anton Schwaighofer, Volker Tresp

Abstract: Gaussian process regression allows a simple analytical treatment of exact Bayesian inference and has been found to provide good performance, yet scales badly with the number of training data. In this paper we compare several approaches towards scaling Gaussian processes regression to large data sets: the subset of representers method, the reduced rank approximation, online Gaussian processes, and the Bayesian committee machine. Furthermore we provide theoretical insight into some of our experimental results. We found that subset of representers methods can give good and particularly fast predictions for data sets with high and medium noise levels. On complex low noise data sets, the Bayesian committee machine achieves significantly better accuracy, yet at a higher computational cost.

5 0.44581813 41 nips-2002-Bayesian Monte Carlo

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

6 0.44525975 178 nips-2002-Robust Novelty Detection with Single-Class MPM

7 0.40458071 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

8 0.40146786 71 nips-2002-Dopamine Induced Bistability Enhances Signal Processing in Spiny Neurons

9 0.38966808 128 nips-2002-Learning a Forward Model of a Reflex

10 0.38336018 138 nips-2002-Manifold Parzen Windows

11 0.37877721 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes

12 0.3782835 169 nips-2002-Real-Time Particle Filters

13 0.3539055 96 nips-2002-Generalized² Linear² Models

14 0.35350531 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

15 0.3500101 168 nips-2002-Real-Time Monitoring of Complex Industrial Processes with Particle Filters

16 0.34653658 110 nips-2002-Incremental Gaussian Processes

17 0.34474775 115 nips-2002-Informed Projections

18 0.3438246 63 nips-2002-Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting

19 0.33508334 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks

20 0.32250363 124 nips-2002-Learning Graphical Models with Mercer Kernels


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.023), (23, 0.021), (37, 0.222), (42, 0.13), (54, 0.138), (55, 0.074), (64, 0.011), (67, 0.015), (68, 0.037), (74, 0.086), (92, 0.028), (98, 0.132)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88845587 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

Author: Peter Dayan, Angela J. Yu

Abstract: Inference and adaptation in noisy and changing, rich sensory environments are rife with a variety of specific sorts of variability. Experimental and theoretical studies suggest that these different forms of variability play different behavioral, neural and computational roles, and may be reported by different (notably neuromodulatory) systems. Here, we refine our previous theory of acetylcholine’s role in cortical inference in the (oxymoronic) terms of expected uncertainty, and advocate a theory for norepinephrine in terms of unexpected uncertainty. We suggest that norepinephrine reports the radical divergence of bottom-up inputs from prevailing top-down interpretations, to influence inference and plasticity. We illustrate this proposal using an adaptive factor analysis model.

2 0.88110995 75 nips-2002-Dynamical Causal Learning

Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of causal Bayes nets (though with different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1

same-paper 3 0.83546317 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

Author: E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, Carl E. Rasmussen

Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models.

4 0.74159682 46 nips-2002-Boosting Density Estimation

Author: Saharon Rosset, Eran Segal

Abstract: Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

5 0.73824888 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

Author: Max Welling, Simon Osindero, Geoffrey E. Hinton

Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Student-t distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.

6 0.73513591 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

7 0.73013365 10 nips-2002-A Model for Learning Variance Components of Natural Images

8 0.73002195 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

9 0.72782516 169 nips-2002-Real-Time Particle Filters

10 0.72775531 3 nips-2002-A Convergent Form of Approximate Policy Iteration

11 0.72371876 41 nips-2002-Bayesian Monte Carlo

12 0.72330749 14 nips-2002-A Probabilistic Approach to Single Channel Blind Signal Separation

13 0.72164327 193 nips-2002-Temporal Coherence, Natural Image Sequences, and the Visual Cortex

14 0.72147101 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

15 0.72009885 110 nips-2002-Incremental Gaussian Processes

16 0.7197867 2 nips-2002-A Bilinear Model for Sparse Coding

17 0.7196576 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

18 0.7175073 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

19 0.71723676 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

20 0.71692306 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs