nips nips2008 nips2008-71 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Michalis K. Titsias, Neil D. Lawrence and Magnus Rattray, School of Computer Science, University of Manchester, Manchester M13 9PL, UK. Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. [sent-3, score-0.173]
2 We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. [sent-4, score-0.201]
3 This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. [sent-5, score-0.256]
4 At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. [sent-6, score-0.267]
5 The control variable input locations are found by minimizing an objective function. [sent-7, score-0.176]
6 We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. [sent-8, score-0.238]
7 However, in recent applications of GP models in systems biology [1] that require the estimation of ordinary differential equation models [2, 13, 8], the development of deterministic approximations is difficult since the likelihood can be highly complex. [sent-11, score-0.209]
8 Another advantage is that the sampling scheme will often not depend on details of the likelihood function, and is therefore very generally applicable. [sent-15, score-0.16]
9 This has proved to be particularly difficult in many GP applications, because the posterior distribution describes a highly correlated high-dimensional variable. [sent-17, score-0.149]
10 Thus simple MCMC sampling schemes such as Gibbs sampling can be very inefficient. [sent-18, score-0.18]
11 In this contribution we describe an efficient MCMC algorithm for sampling from the posterior process of a GP model which constructs the proposal distributions by utilizing the GP prior. [sent-19, score-0.311]
12 This algorithm uses control variables which are auxiliary function values. [sent-20, score-0.256]
13 At each iteration, the algorithm proposes new values for the control variables and samples the function by drawing from the conditional GP prior. [sent-21, score-0.267]
14 The control variables are highly informative points that provide a low dimensional representation of the function. [sent-22, score-0.316]
15 The control input locations are found by minimizing an objective function. [sent-23, score-0.176]
16 The objective function used is the expected least squares error of reconstructing the function values from the control variables, where the expectation is over the GP prior. [sent-24, score-0.176]
17 We also apply the algorithm to inference in a systems biology model where a set of genes is regulated by a transcription factor protein [8]. [sent-26, score-0.3]
18 For GP models, finding a good proposal distribution is challenging since f is high dimensional and the posterior distribution can be highly correlated. [sent-53, score-0.238]
19 However, sampling from the GP prior is very inefficient as it is unlikely to obtain a sample that will fit the data. [sent-58, score-0.157]
20 On the other hand, sampling from the prior is appealing because any generated sample satisfies the smoothness requirement imposed by the covariance function. [sent-60, score-0.212]
21 The other extreme choice for the proposal, that has been considered in [10], is to apply Gibbs sampling where we iteratively draw samples from each posterior conditional density p(fi |f−i , y) with f−i = f \fi . [sent-62, score-0.193]
22 However, Gibbs sampling can be extremely slow for densely discretized functions, as in the regression problem of Figure 1, where the posterior GP process is highly correlated. [sent-63, score-0.312]
23 To clarify this, note that the variance of the posterior conditional p(fi |f−i , y) is smaller than or equal to the variance of the conditional GP prior p(fi |f−i ). [sent-64, score-0.259]
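As a reminder (ours, not spelled out in the text), the conditional GP prior that bounds this variance is the standard Gaussian conditional

```latex
p(f_i \mid f_{-i}) = \mathcal{N}\!\left( f_i \,\middle|\, K_{i,-i} K_{-i,-i}^{-1} f_{-i},\; K_{ii} - K_{i,-i} K_{-i,-i}^{-1} K_{-i,i} \right),
```

and for a densely discretized smooth function this conditional variance is already close to zero, so the even smaller posterior conditional variance forces Gibbs sampling to take tiny steps.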
24 A similar algorithm to Gibbs sampling can be expressed by using the sequence of the conditional densities p(fi |f−i ) as a proposal distribution for the MH algorithm (see footnote 1). [sent-68, score-0.256]
25 This algorithm can exhibit a high acceptance rate, but it is inefficient at sampling from highly correlated functions. [sent-70, score-0.204]
26 A simple generalization of the Gibbs-like algorithm that is more appropriate for sampling from smooth functions is to divide the domain of the function into regions and sample the entire function within each region by conditioning on the remaining function regions. [sent-71, score-0.143]
27 Local region sampling iteratively draws each block of function values fk from the conditional GP prior given the remaining regions (footnote 1: thus we replace the proposal distribution p(fi |f−i , y) with the prior conditional p(fi |f−i )). [sent-72, score-0.356]
28 However, this scheme is still inefficient at sampling from highly correlated functions, since the variance of the proposal distribution can be very small close to the boundaries between neighbouring function regions. [sent-74, score-0.291]
29 In the next section we discuss an algorithm using control variables that can efficiently sample from highly correlated functions. [sent-76, score-0.335]
30 1 Sampling using control variables: Let fc be a set of M auxiliary function values that are evaluated at inputs Xc and drawn from the GP prior. [sent-78, score-1.017]
31 We call fc the control variables and their meaning is analogous to the auxiliary inducing variables used in sparse GP models [15]. [sent-79, score-1.032]
32 To compute the posterior p(f |y) based on control variables we use the expression p(f |y) = ∫ p(f |fc , y) p(fc |y) dfc . (3) [sent-80, score-0.339]
33 Assuming that fc is highly informative about f , so that p(f |fc , y) ≈ p(f |fc ), we can approximately sample from p(f |y) in a two-stage manner: firstly sample the control variables from p(fc |y) and then generate f from the conditional prior p(f |fc ). [sent-81, score-1.861]
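As a concrete illustration of the two-stage scheme (a minimal sketch in our own notation, not the authors' code; it assumes a squared-exponential kernel and uses the standard Gaussian conditional with mean Kf,c Kc,c^-1 fc and covariance Kf,f − Kf,c Kc,c^-1 Kc,f):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two input sets (illustrative choice)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_f_given_fc(X, Xc, fc, kernel=rbf_kernel, jitter=1e-8):
    """Draw f ~ p(f | fc): the GP prior at inputs X conditioned on control values fc at Xc."""
    Kcc = kernel(Xc, Xc) + jitter * np.eye(len(Xc))
    Kfc = kernel(X, Xc)
    Kff = kernel(X, X)
    A = np.linalg.solve(Kcc, Kfc.T)                  # Kcc^-1 Kcf
    mean = A.T @ fc                                  # Kfc Kcc^-1 fc
    cov = Kff - Kfc @ A                              # Kff - Kfc Kcc^-1 Kcf
    L = np.linalg.cholesky(cov + jitter * np.eye(len(X)))
    return mean + L @ np.random.randn(len(X))
```

With a helper like this, the two-stage scheme amounts to proposing fc and then calling sample_f_given_fc to fill in the full function.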
34 This scheme can allow us to introduce a MH algorithm, where we need to specify only a proposal distribution q(fc^(t+1) |fc^(t) ) that will mimic sampling from p(fc |y), and always sample f from the conditional prior p(f |fc ). [sent-82, score-0.374]
35 The whole proposal distribution takes the form Q(f^(t+1) , fc^(t+1) |f^(t) , fc^(t) ) = p(f^(t+1) |fc^(t+1) ) q(fc^(t+1) |fc^(t) ). (4) [sent-83, score-1.572]
36 The proposed pair is accepted with the standard MH probability, which here takes the form min(1, [p(y|f^(t+1) ) p(fc^(t+1) ) q(fc^(t) |fc^(t+1) )] / [p(y|f^(t) ) p(fc^(t) ) q(fc^(t+1) |fc^(t) )]). (5) The usefulness of the above sampling scheme stems from the fact that the control variables can form a low-dimensional representation of the function. [sent-86, score-0.352]
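A minimal sketch of the corresponding log-acceptance computation (our own helper names; log_lik, log_prior_fc and log_q stand for log p(y|f), log p(fc) and log q(·|·), which the model is assumed to supply):

```python
def log_accept_prob(f_new, fc_new, f_old, fc_old, log_lik, log_prior_fc, log_q):
    """MH log-acceptance for the proposal of eq. (4): the p(f|fc) factors cancel,
    leaving the ratio in eq. (5)."""
    num = log_lik(f_new) + log_prior_fc(fc_new) + log_q(fc_old, fc_new)
    den = log_lik(f_old) + log_prior_fc(fc_old) + log_q(fc_new, fc_old)
    return min(0.0, num - den)
```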
37 Assuming that these variables are much fewer than the points in f , the sampling is mainly carried out in the low dimensional space. [sent-87, score-0.167]
38 2 we describe how to select the number M of control variables and the inputs Xc so that fc becomes highly informative about f . [sent-89, score-1.055]
39 Firstly, tuning a full covariance matrix is time-consuming and in our case this adaptation process must be carried out simultaneously with searching for an appropriate set of control variables. [sent-94, score-0.257]
40 (5), using a diagonal covariance for the q distribution has the risk of proposing control variables that may not satisfy the GP prior smoothness requirement. [sent-96, score-0.331]
41 (3) a suitable choice for q must mimic the sampling from the posterior p(fc |y). [sent-99, score-0.2]
42 Given that the control points are far apart from each other, Gibbs sampling in the control variables space can be efficient. [sent-100, score-0.519]
43 However, iteratively sampling fci from the conditional posterior p(fci |fc−i , y) ∝ p(y|fc )p(fci |fc−i ), where fc−i = fc \ fci , is intractable for non-Gaussian likelihoods. [sent-101, score-1.36]
44 An attractive alternative is to use a Gibbs-like algorithm where each fci^(t+1) is drawn from the conditional GP prior p(fci^(t+1) |fc−i^(t) ) and is accepted using the MH step. [sent-102, score-0.324]
45 More specifically, the proposal distribution draws a new fci^(t+1) for a certain control variable i from p(fci^(t+1) |fc−i^(t) ) and generates the function f^(t+1) from p(f^(t+1) |fci^(t+1) , fc−i^(t) ). [sent-103, score-0.517]
46 This scheme of sampling the control variables one-at-a-time and resampling f is iterated between different control variables. [sent-105, score-0.528]
47 A complete iteration of the algorithm consists of a full scan over all control variables. [sent-106, score-0.199]
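A sketch of one such full scan (ours, reusing the rbf_kernel and sample_f_given_fc helpers from the earlier sketch; because each control variable is proposed from its conditional GP prior, the prior and proposal terms cancel and the MH ratio reduces to a likelihood ratio):

```python
import numpy as np

def full_scan(f, fc, X, Xc, log_lik, kernel, jitter=1e-8):
    """One complete iteration: a full scan over the M control variables.
    Each fc_i is proposed from the conditional GP prior p(fc_i | fc_-i),
    f is redrawn from p(f | fc), and the pair is accepted with an MH step."""
    M = len(fc)
    Kcc = kernel(Xc, Xc) + jitter * np.eye(M)
    accepted = 0
    for i in range(M):
        rest = np.delete(np.arange(M), i)
        w = np.linalg.solve(Kcc[np.ix_(rest, rest)], Kcc[rest, i])
        cond_mean = w @ fc[rest]
        cond_var = Kcc[i, i] - w @ Kcc[rest, i]
        fc_prop = fc.copy()
        fc_prop[i] = cond_mean + np.sqrt(max(cond_var, 0.0)) * np.random.randn()
        f_prop = sample_f_given_fc(X, Xc, fc_prop, kernel, jitter)
        if np.log(np.random.rand()) < log_lik(f_prop) - log_lik(f):  # likelihood ratio
            f, fc = f_prop, fc_prop
            accepted += 1
    return f, fc, accepted / M
```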
48 The iteration between different control variables is illustrated in Figure 1. [sent-109, score-0.254]
49 Figure 1: Visualization of iterating between control variables. [sent-123, score-0.176]
50 The red solid line is the current f^(t) , the blue line is the proposed f^(t+1) , the red circles are the current control variables fc^(t) , while the diamond (in magenta) is the proposed control variable fci^(t+1) . [sent-124, score-1.339]
51 Although the control variables are sampled one-at-a-time, f can still be drawn with a considerable variance. [sent-126, score-0.231]
52 This conditional prior can have considerable variance close to fci and in all regions that are not close to the remaining control variables. [sent-128, score-0.494]
53 As illustrated in Figure 1, the iteration over different control variables allows f to be drawn with a considerable variance everywhere in the input space. [sent-129, score-0.28]
54 2 Selection of the control variables: To apply the previous algorithm we need to select the number M of control points and the associated inputs Xc . [sent-131, score-0.469]
55 Xc must be chosen so that knowledge of fc can determine f with small error. [sent-132, score-0.721]
56 The prediction of f given fc is equal to Kf,c Kc,c^-1 fc , which is the mean of the conditional prior p(f |fc ). [sent-133, score-1.523]
57 A suitable way to search over Xc is to minimize the reconstruction error ||f − Kf,c Kc,c^-1 fc ||^2 averaged over any possible value of (f , fc ): G(Xc ) = ∫∫ ||f − Kf,c Kc,c^-1 fc ||^2 p(f |fc ) p(fc ) df dfc = Tr(Kf,f − Kf,c Kc,c^-1 Kf,c^T ). [sent-134, score-2.217]
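A direct translation of this objective (our own sketch, again assuming the rbf_kernel helper above):

```python
import numpy as np

def reconstruction_objective(X, Xc, kernel, jitter=1e-8):
    """G(Xc) = Tr(Kff - Kfc Kcc^-1 Kfc^T): the total variance of p(f | fc),
    i.e. the expected squared error of reconstructing f from the control values."""
    Kcc = kernel(Xc, Xc) + jitter * np.eye(len(Xc))
    Kfc = kernel(X, Xc)
    A = np.linalg.solve(Kcc, Kfc.T)                  # Kcc^-1 Kcf
    return float(np.trace(kernel(X, X)) - np.trace(Kfc @ A))
```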
58 To find the number M of control points we minimize G(Xc ) by incrementally adding control variables until the total variance of p(f |fc ) becomes smaller than a certain percentage of the total variance of the prior p(f ). [sent-141, score-0.526]
59 According to standard heuristics [12], which suggest that desirable acceptance rates of MH algorithms are around 1/4, we require a full iteration of the algorithm (a complete scan over the control variables) to have an acceptance rate larger than 1/4. [sent-144, score-0.399]
60 When the chain has a low acceptance rate for the current set of control inputs Xc , this means that the variance of p(f |fc ) is still too high and we need to add more control points in order to further reduce G(Xc ). [sent-145, score-0.56]
61 The process of observing the acceptance rate and adding control variables is continued until we reach the desirable acceptance rate. [sent-146, score-0.455]
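A loose sketch of this adaptive loop (ours; propose_new_input is a hypothetical helper that returns the input location minimizing G(Xc) when appended, and full_scan is the helper sketched earlier):

```python
import numpy as np

def adapt_control_variables(X, f_init, log_lik, kernel, propose_new_input,
                            n_probe=200, target_rate=0.25):
    """Grow the control set until a full scan is accepted at a rate above ~1/4."""
    Xc = propose_new_input(X, np.empty((0, X.shape[1])))[None, :]
    while True:
        fc = np.zeros(len(Xc))            # crude start at the GP prior mean
        f, rates = f_init.copy(), []
        for _ in range(n_probe):          # short probing run with the current Xc
            f, fc, rate = full_scan(f, fc, X, Xc, log_lik, kernel)
            rates.append(rate)
        if np.mean(rates) > target_rate:  # desirable acceptance rate reached
            return Xc
        Xc = np.vstack([Xc, propose_new_input(X, Xc)])
```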
62 In general, the minimization of G places the control inputs close to the clusters of the input data in such a way that the kernel function is taken into account. [sent-148, score-0.24]
63 3 Applications We consider two applications where exact inference is intractable due to a non-linear likelihood function: classification and parameter estimation in a differential equation model of gene regulation. [sent-150, score-0.316]
64 Our MCMC implementation confirms these findings since sampling using control variables gave similar classification accuracy to EP. [sent-153, score-0.321]
65 Transcriptional regulation: We consider a small biological sub-system where a set of target genes are regulated by one transcription factor (TF) protein. [sent-154, score-0.197]
66 The concentration of the TF and the gene specific kinetic parameters are typically unknown and need to be estimated by making use of a set of observed gene expression levels. [sent-156, score-0.471]
67 [2] introduce a linear ODE model for gene activation from TF. [sent-159, score-0.166]
68 Additionally, the kinetic parameters of each gene αj = (Bj , Dj , Sj , Aj ) are unknown and also need to be estimated. [sent-167, score-0.221]
69 Let yjt denote the observed gene expression level of gene j at time t and let y = {yjt } collect together all these observations. [sent-169, score-0.42]
70 Assuming Gaussian noise for the observed gene expressions, the likelihood of our data has the form p(y|f , {αj }_{j=1}^N ) = ∏_{j=1}^N ∏_{t=1}^T p(yjt |f1≤p≤Pt , αj ), (8) where each probability density in the above product is a Gaussian with mean given by eq. [sent-170, score-0.205]
71 Further, this likelihood does not have a factorized form, as in the regression and classification cases, since an observed gene expression depends on the protein concentration activity at all previous time points. [sent-173, score-0.373]
72 Also note that the discretization of the TF in P time points corresponds to a very dense grid, while the gene expression measurements are sparse, i. [sent-174, score-0.229]
73 The protein concentration f is a positive quantity, thus a suitable choice is to place a GP prior on log f . [sent-178, score-0.213]
74 The kinetic parameters of each gene are all positive scalars. [sent-179, score-0.221]
75 Sampling of the kinetic parameters is carried out using Gaussian proposal distributions with diagonal covariance matrices that sample the positive kinetic parameters in the log space. [sent-183, score-0.292]
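A minimal sketch of this kind of proposal (names are ours, not from the paper):

```python
import numpy as np

def propose_kinetics(alpha, step=0.1):
    """Diagonal Gaussian random-walk proposal for the positive kinetic parameters
    (Bj, Dj, Sj, Aj), taken in log space so every proposed value stays positive.
    If the MH target is a density over alpha rather than log alpha, the ratio
    also needs the Jacobian factor prod(alpha_new) / prod(alpha)."""
    return np.exp(np.log(alpha) + step * np.random.randn(len(alpha)))
```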
76 [Figure 2 plot area: legends Gibbs / region / control / corrCoef; axes KL(real||empirical), MCMC iterations, input dimension, number of control variables; see the caption below.] [sent-185, score-0.835]
77 Figure 2: (a) shows the evolution of the KL divergence (against the number of MCMC iterations) between the true posterior and the empirically estimated posteriors for a 5-dimensional regression dataset. [sent-189, score-0.332]
78 (b) shows the mean values with one-standard error bars of the KL divergence (against the input dimension) between the true posterior and the empirically estimated posteriors. [sent-190, score-0.14]
79 (c) plots the number of control variables together with the average correlation coefficient of the GP prior. [sent-191, score-0.231]
80 4 Experiments: In the first experiment we compare Gibbs sampling (Gibbs), sampling using local regions (region) (see the supplementary file) and sampling using control variables (control) in standard regression problems of varied input dimensions. [sent-192, score-0.528]
81 For the control algorithm we observe that the KL divergence is very close to zero for all dimensions. [sent-214, score-0.207]
82 Figure 2(c) shows the increase in the number of control variables used as the input dimension increases. [sent-215, score-0.262]
83 This is very intuitive, since one should expect the number of control variables to increase as the function values become more independent. [sent-217, score-0.231]
84 Figures 3(a) and (b) show (see the continuation after the Figure 3 caption; footnote 3: for Gibbs we used 2 × 10^4 iterations since the region and control algorithms require additional iterations during the adaptation phase). [sent-224, score-0.326]
85 Figure 3: We show results for GP classification. [sent-232, score-0.444]
86 Log-likelihood values are shown for MCMC samples obtained from (a) Gibbs and (b) control applied to the WBC dataset. [sent-233, score-0.176]
87 Figure 4: First row: The left plot shows the inferred TF concentration for p53; the small plot on top-right shows the ground-truth protein concentration obtained by a Western blot experiment [2]. [sent-248, score-0.21]
88 The middle plot shows the predicted expression of a gene obtained by the estimated ODE model; red crosses correspond to the actual gene expression measurements. [sent-249, score-0.435]
89 the log-likelihood for MCMC samples on the WBC dataset, for the Gibbs and control algorithms respectively. [sent-255, score-0.176]
90 It can be observed that mixing is far superior for the control algorithm and it has also converged to a much higher likelihood. [sent-256, score-0.176]
91 The proposed control algorithm shows similar classification performance to EP, while the Gibbs algorithm performs significantly worse on both datasets. [sent-258, score-0.176]
92 In the final two experiments we apply the control algorithm to infer the protein concentration of TFs that activate or repress a set of target genes. [sent-259, score-0.304]
93 The latent function in these problems is always one-dimensional and densely discretized, and thus the control algorithm is the only one that can converge to the GP posterior process in a reasonable time. [sent-260, score-0.363]
94 Seven samples of the expression levels of five target genes in three replicas are collected as the raw time course data. [sent-262, score-0.148]
95 The response of each gene to the TF follows Michaelis–Menten kinetics, i.e. a term of the form f (t)/(γj + f (t)) (6), where the Michaelis constant for the jth gene is given by γj . [sent-264, score-0.148]
96 During sampling, 7 control variables were needed to obtain the desirable acceptance rate. [sent-267, score-0.331]
97 Our inferred TF profile and reconstructed target gene profiles are similar to those obtained in [13]. [sent-278, score-0.219]
98 However, for certain genes, our model provides a better fit to the gene profile. [sent-279, score-0.166]
99 In this paper, we presented an MCMC algorithm that uses control variables. [sent-281, score-0.176]
100 We showed that this sampling scheme can efficiently deal with highly correlated posterior GP processes. [sent-282, score-0.27]
wordName wordTfidf (topN-words)
[('fc', 0.721), ('gp', 0.311), ('fci', 0.211), ('control', 0.176), ('tf', 0.176), ('gene', 0.166), ('mcmc', 0.154), ('xc', 0.131), ('proposal', 0.13), ('gibbs', 0.113), ('acceptance', 0.1), ('sampling', 0.09), ('mh', 0.084), ('genes', 0.079), ('posterior', 0.067), ('fi', 0.067), ('transcription', 0.063), ('kl', 0.062), ('protein', 0.057), ('dj', 0.056), ('kinetic', 0.055), ('variables', 0.055), ('wbc', 0.054), ('michaelis', 0.047), ('yjt', 0.047), ('iterations', 0.046), ('inef', 0.045), ('prior', 0.045), ('differential', 0.045), ('concentration', 0.043), ('inference', 0.042), ('bars', 0.042), ('ode', 0.042), ('correlated', 0.041), ('expression', 0.041), ('highly', 0.041), ('odes', 0.041), ('inputs', 0.04), ('gaussian', 0.039), ('likelihood', 0.039), ('mrna', 0.037), ('conditional', 0.036), ('densely', 0.035), ('classi', 0.033), ('latent', 0.033), ('bj', 0.033), ('ep', 0.032), ('accepted', 0.032), ('biology', 0.032), ('contr', 0.031), ('dfc', 0.031), ('edj', 0.031), ('menten', 0.031), ('pid', 0.031), ('divergence', 0.031), ('region', 0.031), ('dimension', 0.031), ('scheme', 0.031), ('covariance', 0.03), ('sj', 0.03), ('ordinary', 0.029), ('discretized', 0.028), ('regulation', 0.028), ('target', 0.028), ('regression', 0.027), ('regulated', 0.027), ('thinned', 0.027), ('barenco', 0.027), ('adaption', 0.027), ('variance', 0.026), ('bayesian', 0.025), ('smoothness', 0.025), ('pt', 0.025), ('inferred', 0.025), ('auxiliary', 0.025), ('process', 0.024), ('fk', 0.024), ('pro', 0.024), ('intractable', 0.024), ('places', 0.024), ('transcriptional', 0.023), ('manchester', 0.023), ('clarify', 0.023), ('iteration', 0.023), ('deterministic', 0.023), ('carlo', 0.023), ('yj', 0.023), ('suitable', 0.023), ('aj', 0.022), ('informative', 0.022), ('monte', 0.022), ('sample', 0.022), ('divergences', 0.022), ('points', 0.022), ('plot', 0.021), ('wp', 0.021), ('grey', 0.021), ('chain', 0.02), ('dimensions', 0.02), ('mimic', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.29190424 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.20845911 193 nips-2008-Regularized Co-Clustering with Dual Supervision
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, Co-clustering algorithms often demonstrate surprisingly impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice-versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for both examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms. 1
4 0.14054722 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
5 0.12130242 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris
Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1
6 0.10532285 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
7 0.10377559 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
8 0.10274793 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
9 0.08998619 235 nips-2008-The Infinite Hierarchical Factor Regression Model
10 0.08803577 233 nips-2008-The Gaussian Process Density Sampler
11 0.084118657 249 nips-2008-Variational Mixture of Gaussian Process Experts
12 0.07991448 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
13 0.079903029 138 nips-2008-Modeling human function learning with Gaussian processes
14 0.077538028 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
15 0.071025804 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
16 0.066191278 152 nips-2008-Non-stationary dynamic Bayesian networks
17 0.061465673 62 nips-2008-Differentiable Sparse Coding
18 0.057354786 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
19 0.056650896 231 nips-2008-Temporal Dynamics of Cognitive Control
20 0.056548361 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
topicId topicWeight
[(0, -0.172), (1, 0.004), (2, 0.065), (3, 0.064), (4, 0.129), (5, -0.09), (6, 0.03), (7, 0.271), (8, 0.011), (9, 0.043), (10, 0.089), (11, 0.059), (12, 0.154), (13, -0.184), (14, 0.182), (15, -0.027), (16, -0.13), (17, 0.11), (18, 0.114), (19, -0.071), (20, -0.006), (21, 0.019), (22, -0.035), (23, 0.026), (24, 0.061), (25, 0.108), (26, -0.123), (27, 0.072), (28, -0.087), (29, -0.075), (30, -0.019), (31, 0.032), (32, 0.069), (33, -0.087), (34, 0.06), (35, -0.115), (36, -0.018), (37, 0.087), (38, -0.048), (39, 0.064), (40, 0.05), (41, 0.055), (42, -0.024), (43, 0.025), (44, 0.003), (45, -0.1), (46, 0.025), (47, 0.078), (48, 0.021), (49, -0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.94365585 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.76578164 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.69786131 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
4 0.66114801 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai
Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1
5 0.54918516 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
6 0.4650369 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
7 0.46437159 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
8 0.45116714 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
9 0.44443166 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
10 0.44128218 233 nips-2008-The Gaussian Process Density Sampler
11 0.43495208 249 nips-2008-Variational Mixture of Gaussian Process Experts
12 0.4328824 193 nips-2008-Regularized Co-Clustering with Dual Supervision
13 0.42216575 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
14 0.40742576 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
15 0.39949629 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
16 0.36944965 152 nips-2008-Non-stationary dynamic Bayesian networks
17 0.34578738 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
18 0.33806017 105 nips-2008-Improving on Expectation Propagation
19 0.33567128 235 nips-2008-The Infinite Hierarchical Factor Regression Model
20 0.32270548 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
topicId topicWeight
[(6, 0.062), (7, 0.2), (12, 0.016), (15, 0.012), (28, 0.178), (57, 0.105), (59, 0.021), (63, 0.018), (70, 0.178), (71, 0.016), (77, 0.031), (78, 0.011), (83, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.87829125 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.84912419 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
Author: Tanya Schmah, Geoffrey E. Hinton, Steven L. Small, Stephen Strother, Richard S. Zemel
Abstract: Neuroimaging datasets often have a very large number of voxels and a very small number of training cases, which means that overfitting of models for this data can become a very serious problem. Working with a set of fMRI images from a study on stroke recovery, we consider a classification task for which logistic regression performs poorly, even when L1- or L2- regularized. We show that much better discrimination can be achieved by fitting a generative model to each separate condition and then seeing which model is most likely to have generated the data. We compare discriminative training of exactly the same set of models, and we also consider convex blends of generative and discriminative training. 1
3 0.84258181 45 nips-2008-Characterizing neural dependencies with copula models
Author: Pietro Berkes, Frank Wood, Jonathan W. Pillow
Abstract: The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that can simultaneously account for (1) marginal distributions over single-neuron spike counts that are discrete and non-negative; and (2) joint distributions over the responses of multiple neurons that are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in macaque pre-motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons. We find that more than one third of neuron pairs shows dependency concentrated in the lower or upper tails for their firing rate distribution. 1
4 0.83314872 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
Author: Kai Yu, Wei Xu, Yihong Gong
Abstract: In this paper we aim to train deep neural networks for rapid visual recognition. The task is highly challenging, largely due to the lack of a meaningful regularizer on the functions realized by the networks. We propose a novel regularization method that takes advantage of kernel methods, where an oracle kernel function represents prior knowledge about the recognition task of interest. We derive an efficient algorithm using stochastic gradient descent, and demonstrate encouraging results on a wide range of recognition tasks, in terms of both accuracy and speed. 1
5 0.82036191 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
6 0.81381196 109 nips-2008-Interpreting the neural code with Formal Concept Analysis
7 0.81195855 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
8 0.79697716 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
9 0.79487193 137 nips-2008-Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex
10 0.78753781 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
11 0.78586918 62 nips-2008-Differentiable Sparse Coding
12 0.78567487 66 nips-2008-Dynamic visual attention: searching for coding length increments
13 0.78356433 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
14 0.78276438 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
15 0.78202081 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
16 0.77959931 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds
17 0.77813947 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
18 0.77503091 248 nips-2008-Using matrices to model symbolic relationship
19 0.77348644 138 nips-2008-Modeling human function learning with Gaussian processes
20 0.77318937 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning