nips nips2008 nips2008-71 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Michalis K. Titsias, Neil D. Lawrence and Magnus Rattray, School of Computer Science, University of Manchester, Manchester M13 9PL, UK. Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. [sent-3, score-0.173]
2 We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. [sent-4, score-0.201]
3 This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. [sent-5, score-0.256]
4 At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. [sent-6, score-0.267]
5 The control variable input locations are found by minimizing an objective function. [sent-7, score-0.176]
6 We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. [sent-8, score-0.238]
7 However, in recent applications of GP models in systems biology [1] that require the estimation of ordinary differential equation models [2, 13, 8], the development of deterministic approximations is difficult since the likelihood can be highly complex. [sent-11, score-0.209]
8 Another advantage is that the sampling scheme will often not depend on details of the likelihood function, and is therefore very generally applicable. [sent-15, score-0.16]
9 This has proved to be particularly difficult in many GP applications, because the posterior distribution describes a highly correlated high-dimensional variable. [sent-17, score-0.149]
10 Thus simple MCMC sampling schemes such as Gibbs sampling can be very inefficient. [sent-18, score-0.18]
11 In this contribution we describe an efficient MCMC algorithm for sampling from the posterior process of a GP model which constructs the proposal distributions by utilizing the GP prior. [sent-19, score-0.311]
12 This algorithm uses control variables which are auxiliary function values. [sent-20, score-0.256]
13 At each iteration, the algorithm proposes new values for the control variables and samples the function by drawing from the conditional GP prior. [sent-21, score-0.267]
14 The control variables are highly informative points that provide a low dimensional representation of the function. [sent-22, score-0.316]
15 The control input locations are found by minimizing an objective function. [sent-23, score-0.176]
16 The objective function used is the expected least squares error of reconstructing the function values from the control variables, where the expectation is over the GP prior. [sent-24, score-0.176]
17 We also apply the algorithm to inference in a systems biology model where a set of genes is regulated by a transcription factor protein [8]. [sent-26, score-0.3]
18 For GP models, finding a good proposal distribution is challenging since f is high dimensional and the posterior distribution can be highly correlated. [sent-53, score-0.238]
19 However, sampling from the GP prior is very inefficient as it is unlikely to obtain a sample that will fit the data. [sent-58, score-0.157]
20 On the other hand, sampling from the prior is appealing because any generated sample satisfies the smoothness requirement imposed by the covariance function. [sent-60, score-0.212]
21 The other extreme choice for the proposal, that has been considered in [10], is to apply Gibbs sampling where we iteratively draw samples from each posterior conditional density p(fi |f−i , y) with f−i = f \fi . [sent-62, score-0.193]
22 However, Gibbs sampling can be extremely slow for densely discretized functions, as in the regression problem of Figure 1, where the posterior GP process is highly correlated. [sent-63, score-0.312]
23 To clarify this, note that the variance of the posterior conditional p(fi |f−i , y) is smaller than or equal to the variance of the conditional GP prior p(fi |f−i ). [sent-64, score-0.259]
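As a reminder (ours, not spelled out in the text), the conditional GP prior that bounds this variance is the standard Gaussian conditional

```latex
p(f_i \mid f_{-i}) = \mathcal{N}\!\left( f_i \,\middle|\, K_{i,-i} K_{-i,-i}^{-1} f_{-i},\; K_{ii} - K_{i,-i} K_{-i,-i}^{-1} K_{-i,i} \right),
```

and for a densely discretized smooth function this conditional variance is already close to zero, so the even smaller posterior conditional variance forces Gibbs sampling to take tiny steps.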
24 A similar algorithm to Gibbs sampling can be expressed by using the sequence of the conditional densities p(fi |f−i ) as a proposal distribution for the MH algorithm (see footnote 1). [sent-68, score-0.256]
25 This algorithm can exhibit a high acceptance rate, but it is inefficient at sampling from highly correlated functions. [sent-70, score-0.204]
26 A simple generalization of the Gibbs-like algorithm that is more appropriate for sampling from smooth functions is to divide the domain of the function into regions and sample the entire function within each region by conditioning on the remaining function regions. [sent-71, score-0.143]
27 Local region sampling iteratively draws each block of function values fk from the conditional GP prior given the remaining regions (footnote 1: thus we replace the proposal distribution p(fi |f−i , y) with the prior conditional p(fi |f−i )). [sent-72, score-0.356]
28 However, this scheme is still inefficient at sampling from highly correlated functions, since the variance of the proposal distribution can be very small close to the boundaries between neighbouring function regions. [sent-74, score-0.291]
29 In the next section we discuss an algorithm using control variables that can efficiently sample from highly correlated functions. [sent-76, score-0.335]
30 1 Sampling using control variables: Let fc be a set of M auxiliary function values that are evaluated at inputs Xc and drawn from the GP prior. [sent-78, score-1.017]
31 We call fc the control variables and their meaning is analogous to the auxiliary inducing variables used in sparse GP models [15]. [sent-79, score-1.032]
32 To compute the posterior p(f |y) based on control variables we use the expression p(f |y) = ∫ p(f |fc , y) p(fc |y) dfc . (3) [sent-80, score-0.339]
33 Assuming that fc is highly informative about f , so that p(f |fc , y) ≈ p(f |fc ), we can approximately sample from p(f |y) in a two-stage manner: firstly sample the control variables from p(fc |y) and then generate f from the conditional prior p(f |fc ). [sent-81, score-1.861]
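As a concrete illustration of the two-stage scheme (a minimal sketch in our own notation, not the authors' code; it assumes a squared-exponential kernel and uses the standard Gaussian conditional with mean Kf,c Kc,c^-1 fc and covariance Kf,f − Kf,c Kc,c^-1 Kc,f):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two input sets (illustrative choice)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_f_given_fc(X, Xc, fc, kernel=rbf_kernel, jitter=1e-8):
    """Draw f ~ p(f | fc): the GP prior at inputs X conditioned on control values fc at Xc."""
    Kcc = kernel(Xc, Xc) + jitter * np.eye(len(Xc))
    Kfc = kernel(X, Xc)
    Kff = kernel(X, X)
    A = np.linalg.solve(Kcc, Kfc.T)                  # Kcc^-1 Kcf
    mean = A.T @ fc                                  # Kfc Kcc^-1 fc
    cov = Kff - Kfc @ A                              # Kff - Kfc Kcc^-1 Kcf
    L = np.linalg.cholesky(cov + jitter * np.eye(len(X)))
    return mean + L @ np.random.randn(len(X))
```

With a helper like this, the two-stage scheme amounts to proposing fc and then calling sample_f_given_fc to fill in the full function.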
34 This scheme can allow us to introduce a MH algorithm, where we need to specify only a proposal distribution q(fc^(t+1) |fc^(t) ) that will mimic sampling from p(fc |y), and always sample f from the conditional prior p(f |fc ). [sent-82, score-0.374]
35 The whole proposal distribution takes the form Q(f^(t+1) , fc^(t+1) |f^(t) , fc^(t) ) = p(f^(t+1) |fc^(t+1) ) q(fc^(t+1) |fc^(t) ). (4) [sent-83, score-1.572]
36 The proposed pair is accepted with the standard MH probability, which here takes the form min(1, [p(y|f^(t+1) ) p(fc^(t+1) ) q(fc^(t) |fc^(t+1) )] / [p(y|f^(t) ) p(fc^(t) ) q(fc^(t+1) |fc^(t) )]). (5) The usefulness of the above sampling scheme stems from the fact that the control variables can form a low-dimensional representation of the function. [sent-86, score-0.352]
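A minimal sketch of the corresponding log-acceptance computation (our own helper names; log_lik, log_prior_fc and log_q stand for log p(y|f), log p(fc) and log q(·|·), which the model is assumed to supply):

```python
def log_accept_prob(f_new, fc_new, f_old, fc_old, log_lik, log_prior_fc, log_q):
    """MH log-acceptance for the proposal of eq. (4): the p(f|fc) factors cancel,
    leaving the ratio in eq. (5)."""
    num = log_lik(f_new) + log_prior_fc(fc_new) + log_q(fc_old, fc_new)
    den = log_lik(f_old) + log_prior_fc(fc_old) + log_q(fc_new, fc_old)
    return min(0.0, num - den)
```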
37 Assuming that these variables are much fewer than the points in f , the sampling is mainly carried out in the low dimensional space. [sent-87, score-0.167]
38 2 we describe how to select the number M of control variables and the inputs Xc so that fc becomes highly informative about f . [sent-89, score-1.055]
39 Firstly, tuning a full covariance matrix is time-consuming and in our case this adaptation process must be carried out simultaneously with searching for an appropriate set of control variables. [sent-94, score-0.257]
40 (5), using a diagonal covariance for the q distribution has the risk of proposing control variables that may not satisfy the GP prior smoothness requirement. [sent-96, score-0.331]
41 (3) a suitable choice for q must mimic the sampling from the posterior p(fc |y). [sent-99, score-0.2]
42 Given that the control points are far apart from each other, Gibbs sampling in the control variables space can be efficient. [sent-100, score-0.519]
43 However, iteratively sampling fci from the conditional posterior p(fci |fc−i , y) ∝ p(y|fc )p(fci |fc−i ), where fc−i = fc \ fci , is intractable for non-Gaussian likelihoods. [sent-101, score-1.36]
44 An attractive alternative is to use a Gibbs-like algorithm where each fci^(t+1) is drawn from the conditional GP prior p(fci^(t+1) |fc−i^(t) ) and is accepted using the MH step. [sent-102, score-0.324]
45 More specifically, the proposal distribution draws a new fci^(t+1) for a certain control variable i from p(fci^(t+1) |fc−i^(t) ) and generates the function f^(t+1) from p(f^(t+1) |fci^(t+1) , fc−i^(t) ). [sent-103, score-0.517]
46 This scheme of sampling the control variables one-at-a-time and resampling f is iterated between different control variables. [sent-105, score-0.528]
47 A complete iteration of the algorithm consists of a full scan over all control variables. [sent-106, score-0.199]
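A sketch of one such full scan (ours, reusing the rbf_kernel and sample_f_given_fc helpers from the earlier sketch; because each control variable is proposed from its conditional GP prior, the prior and proposal terms cancel and the MH ratio reduces to a likelihood ratio):

```python
import numpy as np

def full_scan(f, fc, X, Xc, log_lik, kernel, jitter=1e-8):
    """One complete iteration: a full scan over the M control variables.
    Each fc_i is proposed from the conditional GP prior p(fc_i | fc_-i),
    f is redrawn from p(f | fc), and the pair is accepted with an MH step."""
    M = len(fc)
    Kcc = kernel(Xc, Xc) + jitter * np.eye(M)
    accepted = 0
    for i in range(M):
        rest = np.delete(np.arange(M), i)
        w = np.linalg.solve(Kcc[np.ix_(rest, rest)], Kcc[rest, i])
        cond_mean = w @ fc[rest]
        cond_var = Kcc[i, i] - w @ Kcc[rest, i]
        fc_prop = fc.copy()
        fc_prop[i] = cond_mean + np.sqrt(max(cond_var, 0.0)) * np.random.randn()
        f_prop = sample_f_given_fc(X, Xc, fc_prop, kernel, jitter)
        if np.log(np.random.rand()) < log_lik(f_prop) - log_lik(f):  # likelihood ratio
            f, fc = f_prop, fc_prop
            accepted += 1
    return f, fc, accepted / M
```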
48 The iteration between different control variables is illustrated in Figure 1. [sent-109, score-0.254]
49 Figure 1: Visualization of iterating between control variables. [sent-123, score-0.176]
50 The red solid line is the current f^(t) , the blue line is the proposed f^(t+1) , the red circles are the current control variables fc^(t) , while the diamond (in magenta) is the proposed control variable fci^(t+1) . [sent-124, score-1.339]
51 Although the control variables are sampled one-at-a-time, f can still be drawn with a considerable variance. [sent-126, score-0.231]
52 This conditional prior can have considerable variance close to fci and in all regions that are not close to the remaining control variables. [sent-128, score-0.494]
53 As illustrated in Figure 1, the iteration over different control variables allows f to be drawn with a considerable variance everywhere in the input space. [sent-129, score-0.28]
54 2 Selection of the control variables: To apply the previous algorithm we need to select the number M of control points and the associated inputs Xc . [sent-131, score-0.469]
55 Xc must be chosen so that knowledge of fc can determine f with small error. [sent-132, score-0.721]
56 The prediction of f given fc is equal to Kf,c Kc,c^-1 fc , which is the mean of the conditional prior p(f |fc ). [sent-133, score-1.523]
57 A suitable way to search over Xc is to minimize the reconstruction error ||f − Kf,c Kc,c^-1 fc ||^2 averaged over any possible value of (f , fc ): G(Xc ) = ∫∫ ||f − Kf,c Kc,c^-1 fc ||^2 p(f |fc ) p(fc ) df dfc = Tr(Kf,f − Kf,c Kc,c^-1 Kf,c^T ). [sent-134, score-2.217]
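A direct translation of this objective (our own sketch, again assuming the rbf_kernel helper above):

```python
import numpy as np

def reconstruction_objective(X, Xc, kernel, jitter=1e-8):
    """G(Xc) = Tr(Kff - Kfc Kcc^-1 Kfc^T): the total variance of p(f | fc),
    i.e. the expected squared error of reconstructing f from the control values."""
    Kcc = kernel(Xc, Xc) + jitter * np.eye(len(Xc))
    Kfc = kernel(X, Xc)
    A = np.linalg.solve(Kcc, Kfc.T)                  # Kcc^-1 Kcf
    return float(np.trace(kernel(X, X)) - np.trace(Kfc @ A))
```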
58 To find the number M of control points we minimize G(Xc ) by incrementally adding control variables until the total variance of p(f |fc ) becomes smaller than a certain percentage of the total variance of the prior p(f ). [sent-141, score-0.526]
59 According to standard heuristics [12], which suggest that desirable acceptance rates of MH algorithms are around 1/4, we require a full iteration of the algorithm (a complete scan over the control variables) to have an acceptance rate larger than 1/4. [sent-144, score-0.399]
60 When the chain has a low acceptance rate for the current set of control inputs Xc , this means that the variance of p(f |fc ) is still too high and we need to add more control points in order to further reduce G(Xc ). [sent-145, score-0.56]
61 The process of observing the acceptance rate and adding control variables is continued until we reach the desirable acceptance rate. [sent-146, score-0.455]
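A loose sketch of this adaptive loop (ours; propose_new_input is a hypothetical helper that returns the input location minimizing G(Xc) when appended, and full_scan is the helper sketched earlier):

```python
import numpy as np

def adapt_control_variables(X, f_init, log_lik, kernel, propose_new_input,
                            n_probe=200, target_rate=0.25):
    """Grow the control set until a full scan is accepted at a rate above ~1/4."""
    Xc = propose_new_input(X, np.empty((0, X.shape[1])))[None, :]
    while True:
        fc = np.zeros(len(Xc))            # crude start at the GP prior mean
        f, rates = f_init.copy(), []
        for _ in range(n_probe):          # short probing run with the current Xc
            f, fc, rate = full_scan(f, fc, X, Xc, log_lik, kernel)
            rates.append(rate)
        if np.mean(rates) > target_rate:  # desirable acceptance rate reached
            return Xc
        Xc = np.vstack([Xc, propose_new_input(X, Xc)])
```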
62 In general, the minimization of G places the control inputs close to the clusters of the input data in such a way that the kernel function is taken into account. [sent-148, score-0.24]
63 3 Applications We consider two applications where exact inference is intractable due to a non-linear likelihood function: classification and parameter estimation in a differential equation model of gene regulation. [sent-150, score-0.316]
64 Our MCMC implementation confirms these findings since sampling using control variables gave similar classification accuracy to EP. [sent-153, score-0.321]
65 Transcriptional regulation: We consider a small biological sub-system where a set of target genes are regulated by one transcription factor (TF) protein. [sent-154, score-0.197]
66 The concentration of the TF and the gene specific kinetic parameters are typically unknown and need to be estimated by making use of a set of observed gene expression levels. [sent-156, score-0.471]
67 [2] introduce a linear ODE model for gene activation from TF. [sent-159, score-0.166]
68 Additionally, the kinetic parameters of each gene αj = (Bj , Dj , Sj , Aj ) are unknown and also need to be estimated. [sent-167, score-0.221]
69 Let yjt denote the observed gene expression level of gene j at time t and let y = {yjt } collect together all these observations. [sent-169, score-0.42]
70 Assuming Gaussian noise for the observed gene expressions, the likelihood of our data has the form p(y|f , {αj }_{j=1}^N ) = ∏_{j=1}^N ∏_{t=1}^T p(yjt |f1≤p≤Pt , αj ), (8) where each probability density in the above product is a Gaussian with mean given by eq. [sent-170, score-0.205]
71 Further, this likelihood does not have a factorized form, as in the regression and classification cases, since an observed gene expression depends on the protein concentration activity at all previous time points. [sent-173, score-0.373]
72 Also note that the discretization of the TF in P time points corresponds to a very dense grid, while the gene expression measurements are sparse, i. [sent-174, score-0.229]
73 The protein concentration f is a positive quantity, thus a suitable choice is to place a GP prior on log f . [sent-178, score-0.213]
74 The kinetic parameters of each gene are all positive scalars. [sent-179, score-0.221]
75 Sampling of the kinetic parameters is carried out using Gaussian proposal distributions with diagonal covariance matrices that sample the positive kinetic parameters in the log space. [sent-183, score-0.292]
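A minimal sketch of this kind of proposal (names are ours, not from the paper):

```python
import numpy as np

def propose_kinetics(alpha, step=0.1):
    """Diagonal Gaussian random-walk proposal for the positive kinetic parameters
    (Bj, Dj, Sj, Aj), taken in log space so every proposed value stays positive.
    If the MH target is a density over alpha rather than log alpha, the ratio
    also needs the Jacobian factor prod(alpha_new) / prod(alpha)."""
    return np.exp(np.log(alpha) + step * np.random.randn(len(alpha)))
```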
76 [Figure 2 plot area: legends Gibbs / region / control / corrCoef; axes KL(real||empirical), MCMC iterations, input dimension, number of control variables; see the caption below.] [sent-185, score-0.835]
77 Figure 2: (a) shows the evolution of the KL divergence (against the number of MCMC iterations) between the true posterior and the empirically estimated posteriors for a 5-dimensional regression dataset. [sent-189, score-0.332]
78 (b) shows the mean values with one-standard error bars of the KL divergence (against the input dimension) between the true posterior and the empirically estimated posteriors. [sent-190, score-0.14]
79 (c) plots the number of control variables together with the average correlation coefficient of the GP prior. [sent-191, score-0.231]
80 4 Experiments: In the first experiment we compare Gibbs sampling (Gibbs), sampling using local regions (region) (see the supplementary file) and sampling using control variables (control) in standard regression problems of varied input dimensions. [sent-192, score-0.528]
81 For the control algorithm we observe that the KL divergence is very close to zero for all dimensions. [sent-214, score-0.207]
82 Figure 2(c) shows the increase in the number of control variables used as the input dimension increases. [sent-215, score-0.262]
83 This is very intuitive, since one should expect the number of control variables to increase as the function values become more independent. [sent-217, score-0.231]
84 Figures 3(a) and (b) show (see the continuation after the Figure 3 caption; footnote 3: for Gibbs we used 2 × 10^4 iterations since the region and control algorithms require additional iterations during the adaptation phase). [sent-224, score-0.326]
85 Figure 3: We show results for GP classification. [sent-232, score-0.444]
86 Log-likelihood values are shown for MCMC samples obtained from (a) Gibbs and (b) control applied to the WBC dataset. [sent-233, score-0.176]
87 Figure 4: First row: The left plot shows the inferred TF concentration for p53; the small plot on top-right shows the ground-truth protein concentration obtained by a Western blot experiment [2]. [sent-248, score-0.21]
88 The middle plot shows the predicted expression of a gene obtained by the estimated ODE model; red crosses correspond to the actual gene expression measurements. [sent-249, score-0.435]
89 the log-likelihood for MCMC samples on the WBC dataset, for the Gibbs and control algorithms respectively. [sent-255, score-0.176]
90 It can be observed that mixing is far superior for the control algorithm and it has also converged to a much higher likelihood. [sent-256, score-0.176]
91 The proposed control algorithm shows similar classification performance to EP, while the Gibbs algorithm performs significantly worse on both datasets. [sent-258, score-0.176]
92 In the final two experiments we apply the control algorithm to infer the protein concentration of TFs that activate or repress a set of target genes. [sent-259, score-0.304]
93 The latent function in these problems is always one-dimensional and densely discretized, and thus the control algorithm is the only one that can converge to the GP posterior process in a reasonable time. [sent-260, score-0.363]
94 Seven samples of the expression levels of five target genes in three replicas are collected as the raw time course data. [sent-262, score-0.148]
95 The response of each gene to the TF follows Michaelis–Menten kinetics, i.e. a term of the form f (t)/(γj + f (t)) (6), where the Michaelis constant for the jth gene is given by γj . [sent-264, score-0.148]
96 During sampling, 7 control variables were needed to obtain the desirable acceptance rate. [sent-267, score-0.331]
97 Our inferred TF profile and reconstructed target gene profiles are similar to those obtained in [13]. [sent-278, score-0.219]
98 However, for certain genes, our model provides a better fit to the gene profile. [sent-279, score-0.166]
99 In this paper, we presented an MCMC algorithm that uses control variables. [sent-281, score-0.176]
100 We showed that this sampling scheme can efficiently deal with highly correlated posterior GP processes. [sent-282, score-0.27]
wordName wordTfidf (topN-words)
[('fc', 0.721), ('gp', 0.311), ('fci', 0.211), ('control', 0.176), ('tf', 0.176), ('gene', 0.166), ('mcmc', 0.154), ('xc', 0.131), ('proposal', 0.13), ('gibbs', 0.113), ('acceptance', 0.1), ('sampling', 0.09), ('mh', 0.084), ('genes', 0.079), ('posterior', 0.067), ('fi', 0.067), ('transcription', 0.063), ('kl', 0.062), ('protein', 0.057), ('dj', 0.056), ('kinetic', 0.055), ('variables', 0.055), ('wbc', 0.054), ('michaelis', 0.047), ('yjt', 0.047), ('iterations', 0.046), ('inef', 0.045), ('prior', 0.045), ('differential', 0.045), ('concentration', 0.043), ('inference', 0.042), ('bars', 0.042), ('ode', 0.042), ('correlated', 0.041), ('expression', 0.041), ('highly', 0.041), ('odes', 0.041), ('inputs', 0.04), ('gaussian', 0.039), ('likelihood', 0.039), ('mrna', 0.037), ('conditional', 0.036), ('densely', 0.035), ('classi', 0.033), ('latent', 0.033), ('bj', 0.033), ('ep', 0.032), ('accepted', 0.032), ('biology', 0.032), ('contr', 0.031), ('dfc', 0.031), ('edj', 0.031), ('menten', 0.031), ('pid', 0.031), ('divergence', 0.031), ('region', 0.031), ('dimension', 0.031), ('scheme', 0.031), ('covariance', 0.03), ('sj', 0.03), ('ordinary', 0.029), ('discretized', 0.028), ('regulation', 0.028), ('target', 0.028), ('regression', 0.027), ('regulated', 0.027), ('thinned', 0.027), ('barenco', 0.027), ('adaption', 0.027), ('variance', 0.026), ('bayesian', 0.025), ('smoothness', 0.025), ('pt', 0.025), ('inferred', 0.025), ('auxiliary', 0.025), ('process', 0.024), ('fk', 0.024), ('pro', 0.024), ('intractable', 0.024), ('places', 0.024), ('transcriptional', 0.023), ('manchester', 0.023), ('clarify', 0.023), ('iteration', 0.023), ('deterministic', 0.023), ('carlo', 0.023), ('yj', 0.023), ('suitable', 0.023), ('aj', 0.022), ('informative', 0.022), ('monte', 0.022), ('sample', 0.022), ('divergences', 0.022), ('points', 0.022), ('plot', 0.021), ('wp', 0.021), ('grey', 0.021), ('chain', 0.02), ('dimensions', 0.02), ('mimic', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.29190424 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.20845911 193 nips-2008-Regularized Co-Clustering with Dual Supervision
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, Co-clustering algorithms often demonstrate surprisingly impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice-versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for both examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms. 1
4 0.14054722 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
5 0.12130242 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris
Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1
6 0.10532285 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
7 0.10377559 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
8 0.10274793 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
9 0.08998619 235 nips-2008-The Infinite Hierarchical Factor Regression Model
10 0.08803577 233 nips-2008-The Gaussian Process Density Sampler
11 0.084118657 249 nips-2008-Variational Mixture of Gaussian Process Experts
12 0.07991448 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
13 0.079903029 138 nips-2008-Modeling human function learning with Gaussian processes
14 0.077538028 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
15 0.071025804 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
16 0.066191278 152 nips-2008-Non-stationary dynamic Bayesian networks
17 0.061465673 62 nips-2008-Differentiable Sparse Coding
18 0.057354786 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
19 0.056650896 231 nips-2008-Temporal Dynamics of Cognitive Control
20 0.056548361 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
topicId topicWeight
[(0, -0.172), (1, 0.004), (2, 0.065), (3, 0.064), (4, 0.129), (5, -0.09), (6, 0.03), (7, 0.271), (8, 0.011), (9, 0.043), (10, 0.089), (11, 0.059), (12, 0.154), (13, -0.184), (14, 0.182), (15, -0.027), (16, -0.13), (17, 0.11), (18, 0.114), (19, -0.071), (20, -0.006), (21, 0.019), (22, -0.035), (23, 0.026), (24, 0.061), (25, 0.108), (26, -0.123), (27, 0.072), (28, -0.087), (29, -0.075), (30, -0.019), (31, 0.032), (32, 0.069), (33, -0.087), (34, 0.06), (35, -0.115), (36, -0.018), (37, 0.087), (38, -0.048), (39, 0.064), (40, 0.05), (41, 0.055), (42, -0.024), (43, 0.025), (44, 0.003), (45, -0.1), (46, 0.025), (47, 0.078), (48, 0.021), (49, -0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.94365585 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.76578164 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.69786131 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
4 0.66114801 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai
Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1
5 0.54918516 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
6 0.4650369 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
7 0.46437159 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
8 0.45116714 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
9 0.44443166 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
10 0.44128218 233 nips-2008-The Gaussian Process Density Sampler
11 0.43495208 249 nips-2008-Variational Mixture of Gaussian Process Experts
12 0.4328824 193 nips-2008-Regularized Co-Clustering with Dual Supervision
13 0.42216575 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
14 0.40742576 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
15 0.39949629 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
16 0.36944965 152 nips-2008-Non-stationary dynamic Bayesian networks
17 0.34578738 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
18 0.33806017 105 nips-2008-Improving on Expectation Propagation
19 0.33567128 235 nips-2008-The Infinite Hierarchical Factor Regression Model
20 0.32270548 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
topicId topicWeight
[(6, 0.062), (7, 0.2), (12, 0.016), (15, 0.012), (28, 0.178), (57, 0.105), (59, 0.021), (63, 0.018), (70, 0.178), (71, 0.016), (77, 0.031), (78, 0.011), (83, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.87829125 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
2 0.84912419 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
Author: Tanya Schmah, Geoffrey E. Hinton, Steven L. Small, Stephen Strother, Richard S. Zemel
Abstract: Neuroimaging datasets often have a very large number of voxels and a very small number of training cases, which means that overfitting of models for this data can become a very serious problem. Working with a set of fMRI images from a study on stroke recovery, we consider a classification task for which logistic regression performs poorly, even when L1- or L2- regularized. We show that much better discrimination can be achieved by fitting a generative model to each separate condition and then seeing which model is most likely to have generated the data. We compare discriminative training of exactly the same set of models, and we also consider convex blends of generative and discriminative training. 1
3 0.84258181 45 nips-2008-Characterizing neural dependencies with copula models
Author: Pietro Berkes, Frank Wood, Jonathan W. Pillow
Abstract: The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that can simultaneously account for (1) marginal distributions over single-neuron spike counts that are discrete and non-negative; and (2) joint distributions over the responses of multiple neurons that are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in macaque pre-motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons. We find that more than one third of neuron pairs shows dependency concentrated in the lower or upper tails for their firing rate distribution. 1
4 0.83314872 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
Author: Kai Yu, Wei Xu, Yihong Gong
Abstract: In this paper we aim to train deep neural networks for rapid visual recognition. The task is highly challenging, largely due to the lack of a meaningful regularizer on the functions realized by the networks. We propose a novel regularization method that takes advantage of kernel methods, where an oracle kernel function represents prior knowledge about the recognition task of interest. We derive an efficient algorithm using stochastic gradient descent, and demonstrate encouraging results on a wide range of recognition tasks, in terms of both accuracy and speed. 1
5 0.82036191 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
6 0.81381196 109 nips-2008-Interpreting the neural code with Formal Concept Analysis
7 0.81195855 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
8 0.79697716 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
9 0.79487193 137 nips-2008-Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex
10 0.78753781 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
11 0.78586918 62 nips-2008-Differentiable Sparse Coding
12 0.78567487 66 nips-2008-Dynamic visual attention: searching for coding length increments
13 0.78356433 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
14 0.78276438 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
15 0.78202081 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
16 0.77959931 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds
17 0.77813947 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
18 0.77503091 248 nips-2008-Using matrices to model symbolic relationship
19 0.77348644 138 nips-2008-Modeling human function learning with Gaussian processes
20 0.77318937 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning