nips nips2008 nips2008-213 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a sparse approximation approach for dependent output Gaussian processes (GP). [sent-12, score-0.247]
2 Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. [sent-13, score-0.56]
3 Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. [sent-14, score-0.833]
4 We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. [sent-15, score-0.097]
5 1 Introduction We consider the problem of modeling correlated outputs from a single Gaussian process (GP). [sent-16, score-0.11]
6 Applications of modeling multiple outputs include multi-task learning (see e. [sent-17, score-0.084]
7 Modelling multiple output variables is a challenge as we are required to compute cross covariances between the different outputs. [sent-20, score-0.112]
8 Whilst cross covariances allow us to improve our predictions of one output given the others, because the correlations between outputs are modelled [6, 2, 15, 12], they also come with a computational and storage overhead. [sent-22, score-0.275]
9 The main aim of this paper is to address these overheads in the context of convolution processes [6, 2]. [sent-23, score-0.176]
10 One neat approach to account for non-trivial correlations between outputs employs convolution processes (CP). [sent-24, score-0.282]
11 When using CPs each output can be expressed as the convolution between a smoothing kernel and a latent function [6, 2]. [sent-25, score-0.452]
12 Let’s assume that the latent function is drawn from a GP. [sent-26, score-0.156]
13 If we also share the same latent function across several convolutions (each with a potentially different smoothing kernel) then, since a convolution is a linear operator on a function, the outputs of the convolutions can be expressed as a jointly distributed GP. [sent-27, score-0.516]
14 This approach was proposed by [6, 2] who focussed on a white noise process for the latent function. [sent-29, score-0.21]
15 Even though the CP framework is an elegant way for constructing dependent output processes, the fact that the full covariance function of the joint GP must be considered results in significant storage and computational demands. [sent-30, score-0.292]
16 For Q output dimensions and N data points the covariance matrix is of size QN x QN, leading to O(Q^3 N^3) computational complexity and O(N^2 Q^2) storage. [sent-31, score-0.262]
17 We are interested in exploiting the richer class of covariance structures allowed by the CP framework, but without the additional computational overhead they imply. [sent-33, score-0.119]
18 We propose a sparse approximation for the full covariance matrix involved in the multiple output convolution process, exploiting the fact that each of the outputs is conditionally independent of all others given the input process. [sent-34, score-0.682]
19 This leads to an approximation for the covariance matrix which keeps intact the covariances of each output and approximates the cross-covariance terms with a low rank matrix. [sent-35, score-0.337]
20 Inference and learning can then be undertaken with the same computational complexity as a set of independent GPs. [sent-36, score-0.083]
21 The approximation turns out to be strongly related to the partially independent training conditional (PITC) [10] approximation for a single output GP. [sent-37, score-0.326]
22 To introduce our sparse approximation some review of the CP framework is required (Section 2). [sent-39, score-0.123]
23 2 Convolution Processes Consider a set of Q functions $\{f_q(x)\}_{q=1}^{Q}$, where each function is expressed as the convolution between a smoothing kernel $\{k_q(x)\}_{q=1}^{Q}$ and a latent function $u(z)$, $f_q(x) = \int_{-\infty}^{\infty} k_q(x-z)\,u(z)\,dz$. [sent-43, score-0.428]
24 More generally, we can consider the influence of more than one latent function, $\{u_r(z)\}_{r=1}^{R}$, and corrupt each of the outputs of the convolutions with an independent process (which could also include a noise term), $w_q(x)$, to obtain $f_q(x) = \sum_{r=1}^{R}\int_{-\infty}^{\infty} k_{qr}(x-z)\,u_r(z)\,dz + w_q(x)$. [sent-44, score-0.831]
25 $\mathrm{cov}[u_r(z), u_p(z')] = \sigma_{u_r}^2\,\delta_{rp}\,\delta_{z,z'}$, so expression (2) is simplified as $\mathrm{cov}[f_q(x), f_s(x')] = \sum_{r=1}^{R}\sigma_{u_r}^2\int_{-\infty}^{\infty} k_{qr}(x-z)\,k_{sr}(x'-z)\,dz$. [sent-47, score-0.903]
26 We are going to relax this constraint on the latent processes; we assume that each inducing function is an independent GP, i.e. [sent-48, score-0.299]
27 $\mathrm{cov}[u_r(z), u_p(z')] = k_{u_r u_r}(z, z')\,\delta_{rp}$, where $k_{u_r u_r}(z, z')$ is the covariance function for $u_r(z)$. [sent-50, score-1.31]
28 With this simplification, (2) can be written as $\mathrm{cov}[f_q(x), f_s(x')] = \sum_{r=1}^{R}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} k_{qr}(x-z)\,k_{sr}(x'-z')\,k_{u_r u_r}(z, z')\,dz'\,dz$. (3) [sent-51, score-0.785]
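For Gaussian kernels these integrals have closed forms, but a brute-force numerical version makes the structure of (3) concrete. The sketch below is an illustration of ours, not code from the paper; it treats the one-dimensional case with a single latent function (R = 1), and the kernel forms and parameter values are loosely based on the toy settings described later in the Results section.

import numpy as np

def smoothing_kernel(tau, S=1.0, L=50.0):
    # Gaussian smoothing kernel k_qr(tau) = S * sqrt(L / (2*pi)) * exp(-0.5 * L * tau^2)
    return S * np.sqrt(L / (2.0 * np.pi)) * np.exp(-0.5 * L * tau ** 2)

def latent_cov(z, zprime, Lu=100.0):
    # squared exponential covariance of the latent GP, k_{u_r u_r}(z, z')
    return np.exp(-0.5 * Lu * (z - zprime) ** 2)

def cross_cov(x, xprime, zgrid):
    # grid approximation of the double integral in (3) for R = 1
    dz = zgrid[1] - zgrid[0]
    kq = smoothing_kernel(x - zgrid)                  # k_qr(x - z) on the grid
    ks = smoothing_kernel(xprime - zgrid)             # k_sr(x' - z') on the grid
    Kuu = latent_cov(zgrid[:, None], zgrid[None, :])  # k_{u_r u_r}(z, z') on the grid
    return kq @ Kuu @ ks * dz * dz

zgrid = np.linspace(-1.0, 2.0, 400)
print(cross_cov(0.3, 0.5, zgrid))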
29 As well as this correlation across outputs, the correlation between the latent function, $u_r(z)$, and any given output, $f_q(x)$, can be computed, $\mathrm{cov}[f_q(x), u_r(z)] = \int_{-\infty}^{\infty} k_{qr}(x-z')\,k_{u_r u_r}(z', z)\,dz'$. (4) [sent-52, score-1.259]
30 3 Sparse Approximation Given the convolution formalism, we can construct a full GP over the set of outputs. [sent-53, score-0.714]
31 $\mathbf{y}_1, \ldots, \mathbf{y}_Q$ is the set of output functions with $\mathbf{y}_q = [y_q(x_1), \ldots, y_q(x_N)]^{\top}$; $\mathbf{K}_{f,f} \in \mathbb{R}^{QN \times QN}$ is the covariance matrix relating all data points at all outputs, with elements $\mathrm{cov}[f_q(x), f_s(x')]$ in (3); $\boldsymbol{\Sigma} = \Sigma \otimes \mathbf{I}_N$, where $\Sigma$ is a diagonal matrix with elements $\{\sigma_q^2\}_{q=1}^{Q}$; $\phi$ is the set of parameters of the covariance matrix and $\mathbf{X} = \{x_1, \ldots, x_N\}$ is the set of training input vectors at which the covariance is evaluated. [sent-57, score-0.242] [sent-60, score-0.713] [sent-63, score-0.141]
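For concreteness, a minimal sketch of the exact multi-output marginal likelihood $\mathcal{N}(\mathbf{0}, \mathbf{K}_{f,f} + \boldsymbol{\Sigma})$ follows; the Cholesky factorization of the QN x QN matrix is the step responsible for the O(Q^3 N^3) cost mentioned above. Shapes and names are assumptions for the illustration, not the authors' code.

import numpy as np

def log_marginal_likelihood(Kff, Sigma, y):
    # Kff: (Q*N, Q*N) covariance with entries from (3); Sigma: (Q*N, Q*N) noise; y: (Q*N,)
    K = Kff + Sigma
    L = np.linalg.cholesky(K)                            # the O((QN)^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (Kff + Sigma)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * y.size * np.log(2.0 * np.pi))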
34 Once the parameters have been learned, prediction is O(NQ) for the predictive mean and O((NQ)^2) for the predictive variance. [sent-66, score-0.109]
35 If we had observed the entire length of each latent function, $u_r(z)$, then from (1) we see that each $y_q(x)$ would be independent, i.e. [sent-68, score-0.699]
36 we can write $p(\{y_q(x)\}_{q=1}^{Q} \mid \{u_r(z)\}_{r=1}^{R}, \theta) = \prod_{q=1}^{Q} p(y_q(x) \mid \{u_r(z)\}_{r=1}^{R}, \theta)$, where $\theta$ are the parameters of the kernels and covariance functions. [sent-70, score-0.119]
37 Our key assumption is that this independence will hold even if we have only observed M samples from $u_r(z)$ rather than the whole function. [sent-71, score-0.428]
38 The observed values of these M samples are then marginalized (as they are for the exact case) to obtain the approximation to the likelihood. [sent-72, score-0.074]
39 Our intuition is that the approximation should be more accurate for larger M and smoother latent functions, as in this domain the latent function could be very well characterized from only a few samples. [sent-73, score-0.386]
40 We define $\mathbf{u} = [\mathbf{u}_1, \ldots, \mathbf{u}_R]$ as the samples from the latent functions, with $\mathbf{u}_r$ the values of $u_r(z)$ at $\mathbf{Z} = \{z_1, \ldots, z_M\}$, the set of input vectors at which the covariance $\mathbf{K}_{u,u}$ is evaluated. [sent-74, score-0.553] [sent-83, score-0.119]
42 We now make the conditional independence assumption given the samples from the latent functions: $p(\mathbf{y} \mid \mathbf{u}, \mathbf{Z}, \mathbf{X}, \theta) = \prod_{q=1}^{Q}\mathcal{N}\big(\mathbf{K}_{f_q,u}\mathbf{K}_{u,u}^{-1}\mathbf{u},\; \mathbf{K}_{f_q,f_q} - \mathbf{K}_{f_q,u}\mathbf{K}_{u,u}^{-1}\mathbf{K}_{u,f_q} + \sigma_q^2\mathbf{I}\big)$. [sent-84, score-0.223]
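A small sketch of one factor of this conditional may help: given the latent samples u, each output block is a standard GP conditional. Matrix shapes and variable names here are assumptions for illustration only.

import numpy as np

def conditional_block(Kfq_u, Kuu, Kfq_fq, u, noise_var):
    # mean K_{f_q,u} K_{u,u}^{-1} u
    mean = Kfq_u @ np.linalg.solve(Kuu, u)
    # covariance K_{f_q,f_q} - K_{f_q,u} K_{u,u}^{-1} K_{u,f_q} + sigma_q^2 I
    cov = (Kfq_fq - Kfq_u @ np.linalg.solve(Kuu, Kfq_u.T)
           + noise_var * np.eye(Kfq_fq.shape[0]))
    return mean, cov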
43 We now marginalize the values of the samples from the latent functions by using their process priors, i.e. $p(\mathbf{u} \mid \mathbf{Z}) = \mathcal{N}(\mathbf{0}, \mathbf{K}_{u,u})$. [sent-87, score-0.204]
44 Notice that in the resulting marginal likelihood (7), compared to (5), the full covariance matrix $\mathbf{K}_{f,f}$ has been replaced by the low-rank covariance $\mathbf{K}_{f,u}\mathbf{K}_{u,u}^{-1}\mathbf{K}_{u,f}$ in all entries except in the diagonal blocks corresponding to $\mathbf{K}_{f_q,f_q}$. [sent-91, score-0.312]
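To make the computational saving concrete, the following sketch (under assumed shapes and names, not the authors' implementation) builds the quantities needed to work with the approximate covariance, blockdiag plus low rank plus noise, via the matrix inversion lemma: the block-wise inverses cost O(N^3) per output and the M x M matrix A gathers the low-rank part.

import numpy as np

def pitc_pieces(Kff_blocks, Kfu_blocks, Kuu, noise_vars):
    # Kff_blocks[q]: (N, N); Kfu_blocks[q]: (N, M); Kuu: (M, M); noise_vars[q]: sigma_q^2
    Kuu_inv = np.linalg.inv(Kuu)
    D_inv_blocks = []                       # blocks of (D + Sigma)^{-1}
    A = Kuu.copy()                          # A = K_uu + K_uf (D + Sigma)^{-1} K_fu
    for Kff_q, Kfu_q, s2 in zip(Kff_blocks, Kfu_blocks, noise_vars):
        Dq = Kff_q - Kfu_q @ Kuu_inv @ Kfu_q.T + s2 * np.eye(Kff_q.shape[0])
        Dq_inv = np.linalg.inv(Dq)          # O(N^3) per output, done Q times
        D_inv_blocks.append(Dq_inv)
        A += Kfu_q.T @ Dq_inv @ Kfu_q       # accumulate K_uf (D + Sigma)^{-1} K_fu
    return D_inv_blocks, A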
45 The complexity of this inversion is O(N^3 Q) + O(NQM^2), and storage of the matrix is O(N^2 Q) + O(NQM). [sent-93, score-0.126]
46 Note that if we set M = N these reduce to O(N^3 Q) and O(N^2 Q) respectively, which matches the computational complexity of applying Q independent GPs to model the multiple outputs. [sent-94, score-0.083]
47 The predictive distribution is expressed through the integration of (6), evaluated at $\mathbf{X}_*$, with (8), giving $p(\mathbf{y}_* \mid \mathbf{y}, \mathbf{X}, \mathbf{X}_*, \mathbf{Z}, \theta) = \int p(\mathbf{y}_* \mid \mathbf{u}, \mathbf{Z}, \mathbf{X}_*, \theta)\,p(\mathbf{u} \mid \mathbf{y}, \mathbf{X}, \mathbf{Z}, \theta)\,d\mathbf{u} = \mathcal{N}\big(\mathbf{K}_{f_*,u}\mathbf{A}^{-1}\mathbf{K}_{u,f}(\mathbf{D} + \boldsymbol{\Sigma})^{-1}\mathbf{y},\; \mathbf{D}_* + \mathbf{K}_{f_*,u}\mathbf{A}^{-1}\mathbf{K}_{u,f_*} + \boldsymbol{\Sigma}\big)$ (9), with $\mathbf{D}_* = \mathrm{blockdiag}\big[\mathbf{K}_{f_*,f_*} - \mathbf{K}_{f_*,u}\mathbf{K}_{u,u}^{-1}\mathbf{K}_{u,f_*}\big]$. [sent-97, score-0.128]
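Continuing the sketch above, the predictive mean in (9) can be accumulated block by block without ever forming a QN x QN matrix; A and the block inverses are the ones returned by the previous snippet (again an illustration under assumed shapes, not the authors' code).

import numpy as np

def pitc_predictive_mean(Kstar_u, A, Kfu_blocks, D_inv_blocks, y_blocks):
    # mean in (9): K_{f*,u} A^{-1} K_{u,f} (D + Sigma)^{-1} y, accumulated per output
    rhs = sum(Kfu_q.T @ (Dinv_q @ y_q)
              for Kfu_q, Dinv_q, y_q in zip(Kfu_blocks, D_inv_blocks, y_blocks))
    return Kstar_u @ np.linalg.solve(A, rhs)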
48 The functional form of (7) is almost identical to that of the PITC approximation [10], with the samples we retain from the latent function providing the same role as the inducing values in the partially independent training conditional (PITC) approximation. [sent-98, score-0.431]
49 A key difference is that in PITC it is not obvious which variables should be grouped together when making the conditional independence assumption; here it is clear from the structure of the model that each of the outputs should be grouped separately. [sent-100, score-0.151]
50 However, the similarities are such that we find it convenient to follow the terminology of [10] and also refer to our approximation as a PITC approximation. [sent-101, score-0.074]
51 We have already noted that our sparse approximation reduces the computational complexity of multi-output regression with GPs to that of applying independent GPs to each output. [sent-102, score-0.206]
52 For larger data sets the N^3 term in the computational complexity and the N^2 term in the storage are still likely to be prohibitive. [sent-103, score-0.094]
53 In the fully independent training conditional (FITC) [13, 14] a factorization across the data points is assumed. [sent-105, score-0.104]
54 Similar equations are obtained for the posterior (8), predictive (9) and marginal likelihood distributions (7), leading to the Fully Independent Training Conditional (FITC) approximation [13, 10]. [sent-107, score-0.14]
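The difference between the two variants reduces to which part of the exact diagonal blocks is retained, as in this sketch (block shapes are assumed for illustration; the helpers are ours).

import numpy as np

def pitc_correction(Kff_q, Kfu_q, Kuu):
    # PITC keeps the full N x N diagonal block K_{f_q,f_q} - K_{f_q,u} K_{u,u}^{-1} K_{u,f_q}
    return Kff_q - Kfu_q @ np.linalg.solve(Kuu, Kfu_q.T)

def fitc_correction(Kff_q, Kfu_q, Kuu):
    # FITC retains only the diagonal of that block
    return np.diag(np.diag(pitc_correction(Kff_q, Kfu_q, Kuu)))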
55 Note that the marginal likelihood might be optimized both with respect to the parameters associated with the covariance matrices and with respect to Z. [sent-108, score-0.144]
56 Under the convolution process framework, the semiparametric latent factor model (SLFM) proposed in [15] corresponds to a specific choice for the smoothing kernel function in (1), namely $k_{qr}(x) = \phi_{qr}\,\delta(x)$. [sent-111, score-0.52]
57 The latent functions are assumed to be independent GPs and in such a case, $\mathrm{cov}[f_q(x), f_s(x')] = \sum_r \phi_{qr}\phi_{sr}\,k_{u_r u_r}(x, x')$. [sent-112, score-0.978]
58 In the multi-task learning model (MTLM) proposed in [1], the covariance matrix is expressed as $\mathbf{K}_{f,f} = \mathbf{K}^{f} \otimes k(x, x')$, with $\mathbf{K}^{f}$ being constrained positive semi-definite and $k(x, x')$ a covariance function over inputs. [sent-115, score-0.304]
59 As stated in [1] with respect to SLFM, the convolution process is related to MTLM when the smoothing kernel function is given again by $k_{qr}(x) = \phi_{qr}\,\delta(x)$ and there is only one latent function with covariance $k_{uu}(x, x') = k(x, x')$. [sent-117, score-0.613]
60 In this way, $\mathrm{cov}[f_q(x), f_s(x')] = \phi_q\phi_s\,k(x, x')$ and in matrix notation $\mathbf{K}_{f,f} = \boldsymbol{\Phi}\boldsymbol{\Phi}^{\top} \otimes k(x, x')$. [sent-118, score-0.265]
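As a minimal illustration of this special case, the sketch below builds the Kronecker-structured covariance for a toy setting; the inputs, the matrix Phi and the input kernel are arbitrary examples of ours, not values from the paper.

import numpy as np

def rbf(X, X2, lengthscale=1.0):
    # squared exponential kernel over 1-D inputs
    d = X[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

X = np.linspace(0.0, 1.0, 5)
Phi = np.array([[1.0], [0.5], [-0.3]])   # Q = 3 outputs, one latent function
Kf = Phi @ Phi.T                          # positive semi-definite coregionalization matrix
Kff = np.kron(Kf, rbf(X, X))              # QN x QN joint covariance
print(Kff.shape)                          # (15, 15)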
61 In [2], the latent processes correspond to white Gaussian noises and the covariance matrix is given by eq. [sent-119, score-0.385]
62 Finally, [12] use a similar covariance function to the MTLM approach but use an IVM style approach to sparsification. [sent-122, score-0.119]
63 In the dependent GP model of [2], the δ function is instead introduced in the covariance function of the latent process. [sent-124, score-0.119]
64 Our approach considers the more general case when neither kernel nor covariance function is given by the δ function. [sent-125, score-0.145]
65 5 Results For all our experiments we considered squared exponential covariance functions for the latent process of the form $k_{u_r u_r}(x, x') = \exp\big(-\tfrac{1}{2}(x - x')^{\top}\mathbf{L}_r(x - x')\big)$, where $\mathbf{L}_r$ is a diagonal matrix which allows for different length-scales along each dimension. [sent-126, score-0.876]
66 The smoothing kernel had the same form, $k_{qr}(\tau) = \frac{S_{qr}\,|\mathbf{L}_{qr}|^{1/2}}{(2\pi)^{p/2}}\exp\big(-\tfrac{1}{2}\tau^{\top}\mathbf{L}_{qr}\tau\big)$, where $S_{qr} \in \mathbb{R}$ and $\mathbf{L}_{qr}$ is a symmetric positive definite matrix. [sent-127, score-0.306]
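For reference, the two functional forms just described look as follows in the one-dimensional case (a sketch with placeholder parameter values; the paper's actual toy settings are listed just below).

import numpy as np

def k_uu(x, xprime, Lr=100.0):
    # squared exponential latent covariance, 1-D case, with precision Lr
    return np.exp(-0.5 * Lr * (x - xprime) ** 2)

def k_qr(tau, Sqr=1.0, Lqr=50.0):
    # Gaussian smoothing kernel S_qr |L_qr|^{1/2} / (2*pi)^{p/2} * exp(-0.5 tau' L_qr tau), p = 1
    return Sqr * np.sqrt(Lqr / (2.0 * np.pi)) * np.exp(-0.5 * Lqr * tau ** 2)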
67 The toy problem consists of Q = 4 outputs, one latent function, R = 1, and N = 200 observation points for each output. [sent-130, score-0.179]
68 The training data was sampled from the full GP with the following parameters, S11 = S21 = 1, S31 = S41 = 5, L11 = L21 = 50, L31 = 300, L41 = 200 for the outputs and L1 = 100 for the latent function. [sent-131, score-0.304]
69 For the independent processes, $w_q(x)$, we simply added white noise with variances $\sigma_1^2 = \sigma_2^2 = 0.$ [sent-132, score-0.159]
70 For the sparse approximations we used M = 30 fixed inducing points equally spaced across the range of the input, and R = 1. [sent-135, score-0.168]
71 The predictions shown correspond to the full GP (Figure 1(a)), an independent GP (Figure 1(b)), the FITC approximation (Figure 1(c)) and the PITC approximation (Figure 1(d)). [sent-139, score-0.236]
72 Due to the strong dependencies between the signals, our model is able to capture the correlations and accurately predicts the missing information. [sent-140, score-0.079]
73 We used 300 points to compute the standardized mean square error (SMSE) [11] and ten repetitions of the experiment, so that we also included one standard deviation for the ten repetitions. [sent-142, score-0.109]
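A sketch of the SMSE as we read it from [11]: the mean squared error on the test points scaled by the variance of the test targets. The function name and signature are ours.

import numpy as np

def smse(y_true, y_pred):
    # standardized mean square error: MSE divided by the variance of the test targets
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)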
74 Table 1 shows that the SMSE of the sparse approximations is similar to that obtained with the full GP, with a considerable reduction in training time. [sent-150, score-0.113]
75 We selected one of the sensor signals, tide height, and applied the PITC approximation scheme with an additional squared exponential independent kernel for each $w_q(x)$ [11]. [sent-181, score-0.389]
76 We followed [12] in simulating sensor failure by introducing some missing ranges for these signals. [sent-184, score-0.105]
77 Figure 1: Predictive mean and variance using the full multi-output GP, the sparse approximation and an independent GP for output 4; (c) output 4 using the FITC approximation, (d) output 4 using the PITC approximation. [sent-193, score-0.433]
78 The crosses in figures 1(c) and 1(d) correspond to the locations of the inducing inputs. [sent-199, score-0.146]
79 For the sparse approximation we took M = 100 equally spaced inducing inputs. [sent-206, score-0.242]
80 We see from Figure 2 that the PITC approximation captures the dependencies and closely predicts the behavior of the signal in the missing range. [sent-207, score-0.131]
81 We compare results of an independent GP, the PITC approximation, the full GP and ordinary co-kriging. [sent-215, score-0.121]
82 For the PITC experiments, a k-means procedure is employed first to find the initial locations of the inducing values and then these locations are optimized in the same optimization procedure used for the parameters. [sent-216, score-0.195]
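A possible form of that initialization step, sketched with scikit-learn's KMeans; the paper does not name a specific implementation, and M = 50 here simply mirrors the 50 inducing values mentioned below.

import numpy as np
from sklearn.cluster import KMeans

def init_inducing_locations(X, M=50, seed=0):
    # X: (num_points, input_dim) spatial inputs; returns M cluster centres as initial Z
    return KMeans(n_clusters=M, n_init=10, random_state=seed).fit(X).cluster_centers_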
83 Figure 3 shows results of prediction for cadmium (Cd) and copper (Cu). [sent-221, score-0.203]
84 From figure 3(a), it can be noticed that using 50 inducing values, the approximation exhibits a similar performance to the co-kriging method. [sent-222, score-0.171]
85 Figure 2: Predictive mean and variance using independent GPs and the PITC approximation for the tide height signal in the sensor dataset; (d) Cambermet using PITC, horizontal axis: time (days). [sent-257, score-0.394]
86 The crosses in figures 2(b) and 2(d) correspond to the locations of the inducing inputs. [sent-261, score-0.146]
87 As more inducing values are included, the approximation follows the performance of the full GP, as would be expected. [sent-262, score-0.213]
88 From figure 3(b), it can be observed that, although the approximation is better than the independent GP, it does not obtain results similar to the full GP. [sent-263, score-0.162]
89 The copper dataset shows higher variability than the cadmium dataset, which explains to some extent the different behaviors. [sent-265, score-0.176]
90 6 Conclusions We have presented a sparse approximation for multiple output GPs, capturing the correlated information among outputs and reducing the amount of computational load for prediction and optimization purposes. [sent-276, score-0.342]
91 The reduction in computational complexity for the PITC approximation is from O(N^3 Q^3) to O(N^3 Q). [sent-277, score-0.111]
92 This matches the computational complexity for modeling with independent GPs. [sent-278, score-0.083]
93 However, as we have seen, the predictive power of independent GPs is lower. [sent-279, score-0.087]
94 Linear dynamical system responses can be expressed as the convolution of the impulse response of the system with some input function. [sent-280, score-0.16]
95 This convolution approach is an equivalent way of representing the behavior of a system governed by a linear differential equation. [sent-281, score-0.126]
96 One could optimize with respect to the positions of the values of the latent functions. [sent-283, score-0.156]
97 Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. [sent-330, score-0.182]
98 Learning for larger datasets with the Gaussian process latent variable model. [sent-350, score-0.182]
99 Fast sparse Gaussian process methods: The informative vector machine. [sent-357, score-0.075]
100 A unifying view of sparse approximate Gaussian process regression. [sent-372, score-0.075]
wordName wordTfidf (topN-words)
[('kf', 0.51), ('ur', 0.397), ('pitc', 0.336), ('gp', 0.219), ('fq', 0.185), ('latent', 0.156), ('cov', 0.149), ('yq', 0.146), ('convolution', 0.126), ('fitc', 0.124), ('kur', 0.124), ('tide', 0.124), ('kqr', 0.124), ('covariance', 0.119), ('inducing', 0.097), ('cadmium', 0.088), ('copper', 0.088), ('wq', 0.085), ('outputs', 0.084), ('fs', 0.084), ('gps', 0.082), ('height', 0.08), ('approximation', 0.074), ('output', 0.074), ('bramblemet', 0.071), ('kfq', 0.071), ('sensor', 0.07), ('qr', 0.06), ('storage', 0.057), ('dz', 0.055), ('blockdiag', 0.053), ('cambermet', 0.053), ('fgp', 0.053), ('mtlm', 0.053), ('smse', 0.053), ('cp', 0.053), ('qm', 0.053), ('processes', 0.05), ('sparse', 0.049), ('locations', 0.049), ('cu', 0.046), ('igp', 0.046), ('independent', 0.046), ('qn', 0.043), ('meila', 0.043), ('full', 0.042), ('predictive', 0.041), ('convolutions', 0.04), ('covariances', 0.038), ('ten', 0.037), ('complexity', 0.037), ('shen', 0.036), ('conditional', 0.036), ('smoothing', 0.036), ('ivm', 0.035), ('jura', 0.035), ('nickel', 0.035), ('slfm', 0.035), ('standarized', 0.035), ('days', 0.035), ('missing', 0.035), ('stands', 0.034), ('load', 0.034), ('sensors', 0.034), ('expressed', 0.034), ('ordinary', 0.033), ('matrix', 0.032), ('geostatistics', 0.031), ('dash', 0.031), ('lqr', 0.031), ('zinc', 0.031), ('ksr', 0.031), ('independence', 0.031), ('editors', 0.03), ('cd', 0.029), ('kq', 0.028), ('secs', 0.028), ('isbn', 0.028), ('white', 0.028), ('prediction', 0.027), ('manchester', 0.026), ('semiparametric', 0.026), ('iq', 0.026), ('process', 0.026), ('kernel', 0.026), ('gaussian', 0.026), ('marginal', 0.025), ('ck', 0.025), ('snelson', 0.024), ('lr', 0.024), ('whilst', 0.023), ('ys', 0.023), ('toy', 0.023), ('functions', 0.022), ('training', 0.022), ('correlations', 0.022), ('spaced', 0.022), ('dependencies', 0.022), ('ma', 0.021), ('environmental', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
2 0.15115617 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.11050747 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
4 0.10859045 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai
Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1
5 0.10532285 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
6 0.074936852 138 nips-2008-Modeling human function learning with Gaussian processes
7 0.067255042 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
8 0.067176349 216 nips-2008-Sparse probabilistic projections
9 0.066855296 233 nips-2008-The Gaussian Process Density Sampler
10 0.064360164 72 nips-2008-Empirical performance maximization for linear rank statistics
11 0.063041255 249 nips-2008-Variational Mixture of Gaussian Process Experts
12 0.055529676 171 nips-2008-Online Prediction on Large Diameter Graphs
13 0.053935651 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
14 0.052922558 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
15 0.051600955 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
16 0.050244719 62 nips-2008-Differentiable Sparse Coding
17 0.049628176 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
18 0.047321174 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
19 0.041080922 166 nips-2008-On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor
20 0.038556904 61 nips-2008-Diffeomorphic Dimensionality Reduction
topicId topicWeight
[(0, -0.131), (1, -0.016), (2, 0.031), (3, 0.054), (4, 0.071), (5, -0.054), (6, 0.028), (7, 0.168), (8, 0.01), (9, 0.055), (10, 0.053), (11, 0.019), (12, 0.099), (13, -0.047), (14, 0.108), (15, -0.079), (16, -0.047), (17, 0.1), (18, 0.044), (19, -0.057), (20, 0.002), (21, -0.132), (22, 0.032), (23, 0.075), (24, 0.012), (25, 0.093), (26, -0.036), (27, 0.009), (28, -0.103), (29, -0.043), (30, -0.005), (31, -0.029), (32, -0.034), (33, 0.024), (34, 0.046), (35, -0.0), (36, -0.02), (37, 0.073), (38, -0.029), (39, 0.033), (40, -0.001), (41, 0.056), (42, -0.022), (43, 0.051), (44, -0.096), (45, -0.041), (46, 0.045), (47, -0.009), (48, 0.083), (49, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.93209517 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
2 0.76472121 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence
Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1
3 0.75441706 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai
Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1
4 0.74639452 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias
Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1
5 0.68999016 32 nips-2008-Bayesian Kernel Shaping for Learning Control
Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal
Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1
6 0.6243366 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
7 0.59545302 249 nips-2008-Variational Mixture of Gaussian Process Experts
8 0.56665337 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
9 0.55646574 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
10 0.47184622 138 nips-2008-Modeling human function learning with Gaussian processes
11 0.43335375 216 nips-2008-Sparse probabilistic projections
12 0.4223344 31 nips-2008-Bayesian Exponential Family PCA
13 0.40271428 233 nips-2008-The Gaussian Process Density Sampler
14 0.40021762 61 nips-2008-Diffeomorphic Dimensionality Reduction
15 0.37585935 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
16 0.36412284 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
17 0.36329126 105 nips-2008-Improving on Expectation Propagation
18 0.3501403 126 nips-2008-Localized Sliced Inverse Regression
19 0.32055667 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
20 0.31820339 62 nips-2008-Differentiable Sparse Coding
topicId topicWeight
[(6, 0.066), (7, 0.117), (12, 0.014), (15, 0.019), (27, 0.329), (28, 0.113), (47, 0.015), (57, 0.084), (59, 0.016), (63, 0.021), (71, 0.014), (77, 0.041), (83, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.76291186 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
Author: Mauricio Alvarez, Neil D. Lawrence
Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1
2 0.70198077 137 nips-2008-Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex
Author: Arno Onken, Steffen Grünewälder, Matthias Munk, Klaus Obermayer
Abstract: Correlations between spike counts are often used to analyze neural coding. The noise is typically assumed to be Gaussian. Yet, this assumption is often inappropriate, especially for low spike counts. In this study, we present copulas as an alternative approach. With copulas it is possible to use arbitrary marginal distributions such as Poisson or negative binomial that are better suited for modeling noise distributions of spike counts. Furthermore, copulas place a wide range of dependence structures at the disposal and can be used to analyze higher order interactions. We develop a framework to analyze spike count data by means of copulas. Methods for parameter inference based on maximum likelihood estimates and for computation of mutual information are provided. We apply the method to our data recorded from macaque prefrontal cortex. The data analysis leads to three findings: (1) copula-based distributions provide significantly better fits than discretized multivariate normal distributions; (2) negative binomial margins fit the data significantly better than Poisson margins; and (3) the dependence structure carries 12% of the mutual information between stimuli and responses. 1
3 0.65609831 99 nips-2008-High-dimensional support union recovery in multivariate regression
Author: Guillaume R. Obozinski, Martin J. Wainwright, Michael I. Jordan
Abstract: We study the behavior of block 1 / 2 regularization for multivariate regression, where a K-dimensional response vector is regressed upon a fixed set of p covariates. The problem of support union recovery is to recover the subset of covariates that are active in at least one of the regression problems. Studying this problem under high-dimensional scaling (where the problem parameters as well as sample size n tend to infinity simultaneously), our main result is to show that exact recovery is possible once the order parameter given by θ 1 / 2 (n, p, s) : = n/[2ψ(B ∗ ) log(p − s)] exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the size of the union of supports, and ψ(B ∗ ) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that block 1 / 2 regularization for multivariate regression never harms performance relative to a naive 1 -approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal relative to the design. We complement our theoretical results with simulations that demonstrate the sharpness of the result, even for relatively small problems. 1
4 0.64858472 167 nips-2008-One sketch for all: Theory and Application of Conditional Random Sampling
Author: Ping Li, Kenneth W. Church, Trevor J. Hastie
Abstract: Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise (l2 , l1 ) distances, in static, large-scale, and sparse data. This study modifies the original CRS and extends CRS to handle dynamic or streaming data, which much better reflect the real-world situation than assuming static data. Compared with many other sketching algorithms for dimension reductions such as stable random projections, CRS exhibits a significant advantage in that it is “one-sketch-for-all.” In particular, we demonstrate the effectiveness of CRS in efficiently computing the Hamming norm, the Hamming distance, the lp distance, and the χ2 distance. A generic estimator and an approximate variance formula are also provided, for approximating any type of distances. We recommend CRS as a promising tool for building highly scalable systems, in machine learning, data mining, recommender systems, and information retrieval. 1
5 0.53119296 45 nips-2008-Characterizing neural dependencies with copula models
Author: Pietro Berkes, Frank Wood, Jonathan W. Pillow
Abstract: The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that can simultaneously account for (1) marginal distributions over single-neuron spike counts that are discrete and non-negative; and (2) joint distributions over the responses of multiple neurons that are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in macaque pre-motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons. We find that more than one third of neuron pairs shows dependency concentrated in the lower or upper tails for their firing rate distribution. 1
6 0.50160253 62 nips-2008-Differentiable Sparse Coding
7 0.49939263 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
8 0.49301484 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
9 0.49220461 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
10 0.4902637 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
11 0.48990625 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.48980343 194 nips-2008-Regularized Learning with Networks of Features
13 0.48931217 66 nips-2008-Dynamic visual attention: searching for coding length increments
14 0.48896468 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
15 0.48891991 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
16 0.48836601 200 nips-2008-Robust Kernel Principal Component Analysis
17 0.48551032 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
18 0.48531672 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
19 0.48517251 75 nips-2008-Estimating vector fields using sparse basis field expansions
20 0.48515007 226 nips-2008-Supervised Dictionary Learning