nips nips2003 nips2003-141 knowledge-graph by maker-knowledge-mining

141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression


Source: pdf

Author: Christopher J. Paciorek, Mark J. Schervish

Abstract: We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model. The model readily generalizes to non-Gaussian data. Use of computational methods for speeding GP fitting may allow for implementation of the method on larger datasets. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. [sent-7, score-0.838]

2 Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. [sent-8, score-0.548]

3 The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. [sent-9, score-1.177]

4 1 Introduction Gaussian processes (GPs) have been used successfully for regression and classification tasks. [sent-13, score-0.152]

5 Standard GP models use a stationary covariance, in which the covariance between any two points is a function of Euclidean distance. [sent-14, score-0.397]

6 However, stationary GPs fail to adapt to variable smoothness in the function of interest [1, 2]. [sent-15, score-0.304]

7 This is of particular importance in geophysical and other spatial datasets, in which domain knowledge suggests that the function may vary more quickly in some parts of the input space than in others. [sent-16, score-0.197]

8 Spatial statistics researchers have made some progress in defining nonstationary covariance structures for kriging, a form of GP regression. [sent-18, score-0.77]

9 We extend the nonstationary covariance structure of [3], of which [1] gives a special case, to a class of nonstationary covariance functions. [sent-19, score-1.47]

10 The class includes a Matérn form, which in contrast to most covariance functions has the added flexibility of a parameter that controls the differentiability of sample functions drawn from the GP distribution. [sent-20, score-0.639]

11 We use the nonstationary covariance structure for one, two, and three dimensional input spaces in a standard GP regression model, as done previously only for one-dimensional input spaces [1]. [sent-21, score-0.976]

12 The issue has been addressed in regression spline models by choosing the knot locations during the fitting [6] and in smoothing splines by choosing an adaptive penalizer on the integrated squared derivative [7]. [sent-23, score-0.457]

13 The general approach in spline and other models involves learning the underlying basis functions, either explicitly or implicitly, rather than fixing the functions in advance. [sent-24, score-0.208]

14 One alternative to a nonstationary GP model is mixtures of stationary GPs [8, 9]. [sent-25, score-0.72]

15 Such methods adapt to variable smoothness by using different stationary GPs in different parts of the input space. [sent-26, score-0.343]

16 The main difficulty is that the class membership is a function of the inputs; this involves additional unknown functions in the hierarchy of the model. [sent-27, score-0.174]

17 One possibility is to use stationary GPs for these additional unknown functions [8], while [9] reduce computational complexity by using a local estimate of the class membership, but do not know if the resulting model is well-defined probabilistically. [sent-28, score-0.31]

18 In our model, there are unknown functions in the hierarchy of the model that determine the nonstationary covariance structure. [sent-30, score-0.906]

19 We choose to fully model the functions as Gaussian processes themselves, but recognize the computational cost and suggest that simpler representations are worth investigating. [sent-31, score-0.151]

20 2 Covariance functions and sample function differentiability The covariance function is crucial in GP regression because it controls how much the data are smoothed in estimating the unknown function. [sent-32, score-0.7]

21 GP distributions are distributions over functions; the covariance function determines the properties of sample functions drawn from the distribution. [sent-33, score-0.358]

22 The stochastic process literature gives conditions for determining sample function properties of GPs based on the covariance function of the process, summarized in [10] for several common covariance functions. [sent-34, score-0.526]

23 Stationary, isotropic covariance functions are functions only of Euclidean distance, τ . [sent-35, score-0.393]

24 Of particular note, the squared exponential (also called the Gaussian) covariance function, $C(\tau) = \sigma^2 \exp\left(-(\tau/\kappa)^2\right)$, where $\sigma^2$ is the variance and $\kappa$ is a correlation scale parameter, has sample functions with infinitely many derivatives. [sent-36, score-0.49]

25 In contrast, spline regression models have sample functions that are typically only twice differentiable. [sent-37, score-0.36]

26 In addition to being of theoretical concern from an asymptotic perspective [11], other covariance forms might better fit real data for which it is unlikely that the unknown function is so highly differentiable. [sent-38, score-0.267]

27 In spatial statistics, the exponential covariance, $C(\tau) = \sigma^2 \exp(-\tau/\kappa)$, is commonly used, but this form gives sample functions that, while continuous, are not differentiable. [sent-39, score-0.236]

28 Recent work in spatial statistics has focused on the Matérn form, $C(\tau) = \frac{\sigma^2}{\Gamma(\nu)2^{\nu-1}} \left(2\sqrt{\nu}\,\tau/\kappa\right)^{\nu} K_{\nu}\!\left(2\sqrt{\nu}\,\tau/\kappa\right)$, where $K_{\nu}(\cdot)$ is the modified Bessel function of the second kind, whose order is the differentiability parameter, $\nu > 0$. [sent-40, score-0.311]
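To make the Matérn form above concrete, here is a minimal Python sketch (the function name and default parameter values are illustrative, not from the paper; it assumes the parameterization with argument $2\sqrt{\nu}\,\tau/\kappa$ exactly as written above, so that $C(0) = \sigma^2$):

```python
import numpy as np
from scipy.special import gamma, kv


def matern_cov(tau, sigma2=1.0, kappa=1.0, nu=1.5):
    """Stationary Matern covariance as quoted above:
    C(tau) = sigma^2 / (Gamma(nu) 2^(nu-1)) * (2 sqrt(nu) tau / kappa)^nu
             * K_nu(2 sqrt(nu) tau / kappa).
    tau: scalar or array of Euclidean distances."""
    tau = np.asarray(tau, dtype=float)
    # Clamp the argument away from 0; the tau -> 0 limit of the expression is sigma^2.
    arg = np.maximum(2.0 * np.sqrt(nu) * tau / kappa, 1e-10)
    return sigma2 / (gamma(nu) * 2.0 ** (nu - 1.0)) * arg ** nu * kv(nu, arg)
```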

29 This form has the desirable property that sample functions are $\lceil\nu\rceil - 1$ times differentiable. [sent-41, score-0.129]

30 Standard covariance functions require one to place all of one’s prior probability on a particular degree of differentiability; use of the Matérn allows one to more accurately, yet easily, express prior lack of knowledge about sample function differentiability. [sent-44, score-0.406]

31 [12] suggest using the squared exponential covariance but with the anisotropic distance, $\tau(x_i, x_j) = \sqrt{(x_i - x_j)^{\top} \Delta^{-1} (x_i - x_j)}$, where $\Delta$ is an arbitrary positive definite matrix, rather than the standard diagonal matrix. [sent-46, score-0.527]
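A minimal sketch of that anisotropic squared exponential, assuming the quadratic form $(x_i - x_j)^{\top}\Delta^{-1}(x_i - x_j)$ enters the exponent directly (the scale $\kappa$ is absorbed into $\Delta$ here; conventions differ):

```python
import numpy as np


def sqexp_anisotropic(X, sigma2=1.0, Delta=None):
    """Squared exponential covariance matrix with anisotropic distance.
    X: (n, p) inputs; Delta: p x p positive definite matrix (identity if None)."""
    n, p = X.shape
    Delta = np.eye(p) if Delta is None else Delta
    Dinv = np.linalg.inv(Delta)
    diff = X[:, None, :] - X[None, :, :]                 # (n, n, p) pairwise differences
    q = np.einsum('ijk,kl,ijl->ij', diff, Dinv, diff)    # quadratic form for each pair
    return sigma2 * np.exp(-q)
```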

32 The nonstationary covariance function we introduce next builds on this more general form. [sent-48, score-0.735]

33 3 Nonstationary covariance functions One nonstationary covariance function, introduced by [3], is $C(x_i, x_j) = \int_{\mathbb{R}^2} k_{x_i}(u)\, k_{x_j}(u)\, du$, where $x_i$, $x_j$, and $u$ are locations in $\mathbb{R}^2$, and $k_x(\cdot)$ is a kernel function centered at $x$. [sent-49, score-1.263]
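As an illustration of this kernel-convolution construction, one can evaluate the integral numerically for Gaussian kernels; the paper uses the closed form (1) instead, and the grid extent below is an arbitrary choice assumed to cover most of the kernels' mass:

```python
import numpy as np
from scipy.stats import multivariate_normal


def kernel_conv_cov(xi, xj, Sigma_i, Sigma_j, half_width=6.0, m=200):
    """Grid-quadrature sketch of C(x_i, x_j) = int k_{x_i}(u) k_{x_j}(u) du over R^2,
    with Gaussian kernels k_x(u) = N(u; x, Sigma_x)."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    mid = 0.5 * (xi + xj)
    gx = mid[0] + np.linspace(-half_width, half_width, m)
    gy = mid[1] + np.linspace(-half_width, half_width, m)
    U = np.stack(np.meshgrid(gx, gy), axis=-1).reshape(-1, 2)   # quadrature nodes
    cell = (gx[1] - gx[0]) * (gy[1] - gy[0])                    # cell area
    ki = multivariate_normal(mean=xi, cov=Sigma_i).pdf(U)
    kj = multivariate_normal(mean=xj, cov=Sigma_j).pdf(U)
    return cell * np.sum(ki * kj)
```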

34 The form (1) is a squared exponential correlation function, but in place of a fixed matrix, ∆, in the quadratic form, we average the kernel matrices for the two locations. [sent-55, score-0.286]

35 The evolution of the kernel matrices in space produces nonstationary covariance, with kernels that drop off quickly producing locally short correlation scales. [sent-56, score-0.748]

36 Independently, [1] derived a special case in which the kernel matrices are diagonal. [sent-57, score-0.154]

37 Unfortunately, so long as the kernel matrices vary smoothly in the input space, sample functions from GPs with the covariance (1) are infinitely differentiable [10], just as for the stationary squared exponential. [sent-58, score-0.874]

38 To generalize (1) and introduce functions for which sample path differentiability varies, we extend (1) as proven in [10]: Theorem 1 Let Qij be defined as in (2). [sent-59, score-0.328]

39 If a stationary correlation function, $R^S(\tau)$, is positive definite on $\mathbb{R}^p$ for every $p = 1, 2, \ldots$ [sent-60, score-0.234]

40 , then $R^{NS}(x_i, x_j) = |\Sigma_i|^{1/4}\,|\Sigma_j|^{1/4}\,\left|(\Sigma_i + \Sigma_j)/2\right|^{-1/2} R^S\!\left(\sqrt{Q_{ij}}\right)$ (3) is a nonstationary correlation function, positive definite on $\mathbb{R}^p$, $p = 1, 2, \ldots$ [sent-63, score-0.629]

41 One example of nonstationary covariance functions constructed in this way is a nonstationary version of the Matérn covariance, $C^{NS}(x_i, x_j) = \sigma^2\,|\Sigma_i|^{1/4}\,|\Sigma_j|^{1/4}\,\left|\frac{\Sigma_i + \Sigma_j}{2}\right|^{-1/2} \frac{1}{\Gamma(\nu)2^{\nu-1}} \left(2\sqrt{\nu Q_{ij}}\right)^{\nu} K_{\nu}\!\left(2\sqrt{\nu Q_{ij}}\right)$. (4) [sent-67, score-1.38]

42 Provided the kernel matrices vary smoothly in space, the sample function differentiability of the nonstationary form follows that of the stationary form, so for the nonstationary Matérn, the sample function differentiability increases with ν [10]. [sent-68, score-1.945]
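A sketch of evaluating the nonstationary Matérn (4) for one pair of inputs. Since equation (2) is not reproduced among the extracted sentences, the definition $Q_{ij} = (x_i - x_j)^{\top}\bigl((\Sigma_i + \Sigma_j)/2\bigr)^{-1}(x_i - x_j)$ used below is an assumption:

```python
import numpy as np
from scipy.special import gamma, kv


def nonstationary_matern(xi, xj, Sigma_i, Sigma_j, sigma2=1.0, nu=2.0):
    """Sketch of C_NS(x_i, x_j) from (4), given the two kernel matrices."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    Sbar = 0.5 * (Sigma_i + Sigma_j)                      # averaged kernel matrix
    diff = xi - xj
    Qij = float(diff @ np.linalg.solve(Sbar, diff))       # assumed form of (2)
    prefac = (sigma2 * np.linalg.det(Sigma_i) ** 0.25
              * np.linalg.det(Sigma_j) ** 0.25
              / np.sqrt(np.linalg.det(Sbar)))
    arg = max(2.0 * np.sqrt(nu * Qij), 1e-10)             # avoid K_nu singularity at 0
    return prefac / (gamma(nu) * 2.0 ** (nu - 1.0)) * arg ** nu * kv(nu, arg)
```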

43 4 Bayesian regression model and implementation Assume independent observations, Y1 , . [sent-69, score-0.154]

44 , Yn , indexed by a vector of input or feature values, $x_i \in \mathbb{R}^P$, with $Y_i \sim N(f(x_i), \eta^2)$, where $\eta^2$ is the noise variance. [sent-72, score-0.117]

45 Specify a Gaussian process prior, $f(\cdot) \sim \mathrm{GP}\left(\mu_f,\, C_f^{NS}(\cdot, \cdot)\right)$, where $C_f^{NS}(\cdot, \cdot)$ is the nonstationary Matérn covariance function (4) constructed from a set of Gaussian kernels as described below. [sent-73, score-0.778]
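For reference, the conjugate Gaussian computation implied by this likelihood, for fixed hyperparameters and any covariance function (stationary or not); this is the generic GP regression posterior, not the authors' MCMC scheme:

```python
import numpy as np


def gp_posterior(X, y, Xstar, cov_fn, noise_var, mean=0.0):
    """Posterior mean and covariance of f at Xstar for Y_i ~ N(f(x_i), eta^2),
    f ~ GP(mean, cov_fn); cov_fn(A, B) must return the cross-covariance matrix."""
    K = cov_fn(X, X) + noise_var * np.eye(len(X))
    Ks = cov_fn(Xstar, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    post_mean = mean + Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    post_cov = cov_fn(Xstar, Xstar) - V.T @ V
    return post_mean, post_cov
```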

46 For the differentiability parameter, we use the prior, νf ∼ U(0. [sent-74, score-0.199]

47 The main challenge is to parameterize the kernel matrices, since their evolution determines how quickly the covariance structure changes in the input space and the degree to which the model adapts to variable smoothness in the unknown function. [sent-78, score-0.534]

48 In many problems, it seems natural that the covariance structure would evolve smoothly; if so, the differentiability of the regression function will be determined by νf . [sent-79, score-0.558]

49 We put a prior distribution on the kernel matrices as follows. [sent-80, score-0.178]

50 Any location in the input space, xi , has a Gaussian kernel with mean xi and covariance (kernel) matrix, Σi . [sent-81, score-0.502]

51 When the input space is one-dimensional, each kernel ’matrix’ is just a scalar, the variance of the kernel, and we use a stationary Matérn GP prior on the log variance so that the variances evolve smoothly in the input space. [sent-82, score-0.444]
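A sketch of this one-dimensional hierarchy: a log kernel variance drawn from a stationary Matérn GP, exponentiated so the kernel 'matrices' (scalars) vary smoothly. The zero prior mean and the small jitter added before the Cholesky factorization are numerical conveniences of this sketch, not choices taken from the paper:

```python
import numpy as np


def sample_kernel_variances(x, matern_cov_fn, rng=None):
    """x: (n,) one-dimensional inputs. Returns one draw of smoothly varying
    kernel variances exp(g(x)), where g ~ GP(0, matern_cov_fn)."""
    rng = np.random.default_rng(0) if rng is None else rng
    tau = np.abs(x[:, None] - x[None, :])             # pairwise distances
    K = matern_cov_fn(tau) + 1e-8 * np.eye(len(x))    # jitter for numerical stability
    g = np.linalg.cholesky(K) @ rng.standard_normal(len(x))
    return np.exp(g)
```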

52 Next consider multi-dimensional input spaces; since there are (implicitly) kernel matrices at each location in the input space, we have a multivariate process, the matrix-valued function, Σ(·). [sent-83, score-0.276]

53 We use the spectral decomposition of an individual covariance matrix, Σi , Σi = Γ(γ1 (xi ), . [sent-85, score-0.229]

54 , Q, which are functions on the input space, construct Σ(·). [sent-101, score-0.121]

55 To have the kernel matrices vary smoothly, we ensure that their eigenvalues and eigenvectors vary smoothly by taking each φ(·) to have a GP prior with a single stationary, anisotropic Matérn correlation function, common to all the processes and described later. [sent-110, score-0.549]

56 Parameterizing the eigenvectors of the kernel matrices using Givens angles, with each angle a function on $\mathbb{R}^P$, the input space, is difficult, because the angle functions have range $[0, 2\pi) \equiv S^1$, which is not compatible with the range of a GP. [sent-114, score-0.305]

57 Here, we demonstrate the construction of the eigenvectors for $x_i \in \mathbb{R}^2$ and $x_i \in \mathbb{R}^3$; a similar approach, albeit with more parameters, applies to higher-dimensional spaces, but is probably infeasible in dimensions larger than five or so. [sent-116, score-0.212]

58 In $\mathbb{R}^3$, we construct an eigenvector matrix for an individual location as $\Gamma = \Gamma_3 \Gamma_2$, where $\Gamma_3 = \begin{pmatrix} a/l_{abc} & -b/l_{ab} & -ac/(l_{ab} l_{abc}) \\ b/l_{abc} & a/l_{ab} & -bc/(l_{ab} l_{abc}) \\ c/l_{abc} & 0 & l_{ab}/l_{abc} \end{pmatrix}$ and $\Gamma_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & u/l_{uv} & -v/l_{uv} \\ 0 & v/l_{uv} & u/l_{uv} \end{pmatrix}$. [sent-117, score-0.996]

59 The elements of $\Gamma_3$ are functions of three random variables, $\{A, B, C\}$, where $l_{abc} = \sqrt{a^2 + b^2 + c^2}$ and $l_{ab} = \sqrt{a^2 + b^2}$. [sent-118, score-0.76]
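A sketch of the reconstructed $\Gamma = \Gamma_3 \Gamma_2$ construction above, mainly useful as a check that the product is orthonormal; the exact sign and column conventions of the original paper may differ from this reconstruction:

```python
import numpy as np


def eigvec_matrix_3d(a, b, c, u, v):
    """Gamma = Gamma3 @ Gamma2 with l_abc = sqrt(a^2+b^2+c^2), l_ab = sqrt(a^2+b^2),
    l_uv = sqrt(u^2+v^2); the result satisfies Gamma.T @ Gamma = I up to roundoff."""
    l_abc = np.sqrt(a * a + b * b + c * c)
    l_ab = np.sqrt(a * a + b * b)
    l_uv = np.sqrt(u * u + v * v)
    Gamma3 = np.array([
        [a / l_abc, -b / l_ab, -a * c / (l_ab * l_abc)],
        [b / l_abc,  a / l_ab, -b * c / (l_ab * l_abc)],
        [c / l_abc,  0.0,       l_ab / l_abc],
    ])
    Gamma2 = np.array([
        [1.0, 0.0,        0.0],
        [0.0, u / l_uv,  -v / l_uv],
        [0.0, v / l_uv,   u / l_uv],
    ])
    return Gamma3 @ Gamma2
```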

60 To have the matrices, $\Sigma(\cdot)$, vary smoothly in space, $a$, $b$, $c$, $u$, and $v$ are the values of the processes, $\gamma_1(\cdot), \ldots$ [sent-121, score-0.119]

61 In the stationary GP model, the marginal posterior contains a small number of hyperparameters to either optimize or sample via MCMC. [sent-126, score-0.25]

62 In the nonstationary case, the presence of the additional GPs for the kernel matrices (5) precludes straightforward optimization, leaving MCMC. [sent-127, score-0.66]

63 The parameter vector θ, involving P correlation scale parameters and P (P − 1)/2 Givens angles, is used to construct an anisotropic distance matrix, ∆(θ), shared by the φ vectors, creating a stationary, anisotropic correlation structure common to all the eigenprocesses. [sent-130, score-0.281]

64 L(∆(θ)) is a generalized Cholesky decomposition of the correlation matrix shared by the φ vectors that deals [sent-132, score-0.127]

65 Figure 1: On the left are the three test functions in one dimension, with one simulated set of observations (of the 50 used in the evaluation), while the right shows the test function with two inputs. [sent-156, score-0.138]

66 with numerically singular correlation matrices by setting the ith column of the matrix to all zeroes when φi is numerically a linear combination of φ1, … [sent-157, score-0.176]
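An illustrative stand-in for such a generalized Cholesky: when a pivot is numerically zero (the corresponding column is a linear combination of earlier ones), the column of the factor is set to zero rather than adding jitter. This is a sketch of the idea, not the authors' exact algorithm:

```python
import numpy as np


def generalized_cholesky(R, tol=1e-10):
    """Lower-triangular L with R ~= L @ L.T for a possibly singular
    (positive semidefinite) correlation matrix R."""
    n = R.shape[0]
    L = np.zeros_like(R, dtype=float)
    for i in range(n):
        pivot = R[i, i] - L[i, :i] @ L[i, :i]
        if pivot <= tol:
            L[:, i] = 0.0            # drop the numerically dependent direction
            continue
        L[i, i] = np.sqrt(pivot)
        for j in range(i + 1, n):
            L[j, i] = (R[j, i] - L[j, :i] @ L[i, :i]) / L[i, i]
    return L
```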

67 One never calculates $L(\Delta(\theta))^{-1}$ or $|L(\Delta(\theta))|$, which are not defined, and does not need to introduce jitter, and therefore discontinuity in φ(·), into the covariance structure. [sent-161, score-0.229]

68 We use three test functions [6]: a smoothly-varying function, a spatially inhomogeneous function, and a function with a sharp jump (Figure 1a). [sent-163, score-0.178]

69 For each, we generate 50 sets of noisy data and compare the models using the means, averaged over the 50 sets, of the standardized MSE, $\sum_i (\hat{f}_i - f_i)^2 / \sum_i (f_i - \bar{f})^2$, where $\hat{f}_i$ is the posterior mean at $x_i$, and $\bar{f}$ is the mean of the true values. [sent-164, score-0.171]
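The standardized MSE defined above is straightforward to compute; a minimal sketch (names are illustrative):

```python
import numpy as np


def standardized_mse(f_hat, f_true):
    """sum_i (f_hat_i - f_i)^2 / sum_i (f_i - mean(f))^2, as defined above."""
    f_hat, f_true = np.asarray(f_hat, float), np.asarray(f_true, float)
    return np.sum((f_hat - f_true) ** 2) / np.sum((f_true - f_true.mean()) ** 2)
```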

70 In the non-Bayesian neural network model, $\hat{f}_i$ is the fitted value and, as a simplification, we use a network with the optimal number of hidden units (3, 3, and 8 for the three functions), thereby giving an overly optimistic assessment of the performance. [sent-165, score-0.307]

71 For higher-dimensional inputs, we compare the nonstationary GP to the stationary GP, the neural network models, and two free-knot spline methods, Bayesian multivariate linear splines (BMLS) [14] and Bayesian multivariate automatic regression splines (BMARS) [15], a Bayesian version of MARS [16]. [sent-167, score-1.354]

72 We choose to compare to neural networks and … We implement the stationary GP model by replacing $C_f^{NS}(\cdot, \cdot)$ with the Matérn stationary correlation, still using a differentiability parameter, $\nu_f$, that is allowed to vary. [sent-168, score-0.583]

73 Table 1: Mean (over 50 data samples) and 95% confidence interval for standardized MSE for the five methods on the three test functions with one-dimensional input. [sent-173, score-0.187]

74 The second is a real dataset of air temperature as a function of latitude and longitude [17] that allows assessment on a spatial dataset with distinct variable smoothness. [sent-226, score-0.209]

75 Table 1 shows that the nonstationary GP does as well or better than the stationary GP, but that BARS does as well or better than the other methods on all three datasets with one input. [sent-236, score-0.712]

76 Part of the difficulty for the nonstationary GP with the third function, which has the sharp jump, is that our parameterization forces smoothly-varying kernel matrices, which prevents our particular implementation from picking up sharp jumps. [sent-237, score-0.671]

77 A potential improvement would be to parameterize kernel matrices that do not vary so smoothly. [sent-238, score-0.242]

78 Table 2 shows that for the known function on two dimensions, the GP models outperform both the spline models and the non-Bayesian neural network, but not the Bayesian network. [sent-239, score-0.152]

79 The stationary and nonstationary GPs are very similar, indicative of the relative homogeneity of the function. [sent-240, score-0.674]

80 For the two real datasets, the nonstationary GP model outperforms the other methods, except the Bayesian network on the temperature dataset. [sent-241, score-0.642]

81 Predictive density calculations that assess the fits of the functions drawn during the MCMC are similar to the point estimate MSE calculations in terms of model comparison, although we do not have predictive density values for the non-Bayesian neural network implementation. [sent-242, score-0.193]

82 Take f (·) to have a nonstationary GP prior; it cannot be integrated out of the model because of the lack of conjugacy, which causes slow MCMC mixing. [sent-244, score-0.551]

83 Table 2: For test function with two inputs, mean (over 50 data samples) and 95% confidence interval for standardized MSE at 225 test locations, and for the temperature and ozone datasets, cross-validated standardized MSE, for the six methods. [sent-246, score-0.357]

84 We fit the model to the Tokyo rainfall dataset [19]. [sent-281, score-0.253]

85 The data are the presence of rainfall greater than 1 mm for every calendar day in 1983 and 1984. [sent-282, score-0.238]

86 Assuming independence between years [19], conditional on f (·) = logit(p(·)), the likelihood for a given calendar day, xi , is binomial with two trials and unknown probability of rainfall, p(xi ). [sent-283, score-0.235]
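A sketch of the resulting binomial log-likelihood with a logit link, dropping the binomial coefficient (constant in $f$); the function and argument names are illustrative:

```python
import numpy as np


def binomial_loglik(f, rainy_years, trials=2):
    """Log-likelihood (up to a constant) for the rainfall model above: for each
    calendar day x_i, the number of rainy years is Binomial(trials, p(x_i))
    with p(x_i) = 1 / (1 + exp(-f(x_i))), i.e. f = logit(p)."""
    f = np.asarray(f, float)
    s = np.asarray(rainy_years, float)
    p = 1.0 / (1.0 + np.exp(-f))
    return np.sum(s * np.log(p) + (trials - s) * np.log1p(-p))
```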

87 The model detects inhomogeneity in the function, with more smoothness in the first few months and less smoothness later (Figure 2b). [sent-285, score-0.146]

88 Figure 2 (panels (a) and (b); horizontal axis: calendar day). [sent-290, score-0.123]

89 (a) Posterior mean estimate, from nonstationary GP model, of p(·), the probability of rainfall as a function of calendar day, with 95% pointwise credible intervals. [sent-291, score-0.698]

90 Dots are empirical probabilities of rainfall based on the two binomial trials. [sent-292, score-0.157]

91 (b) Posterior geometric mean kernel size (square root of geometric mean kernel eigenvalue). [sent-293, score-0.156]

92 Discussion We introduce a class of nonstationary covariance functions that can be used in GP regression (and classification) models and allow the model to adapt to variable smoothness in the unknown function. [sent-294, score-1.118]

93 The nonstationary GPs improve on stationary GP models on several test datasets. [sent-295, score-0.702]

94 The nonstationary GP may be of particular interest for data indexed by spatial coordinates, where the low dimensionality keeps the parameter complexity manageable. [sent-297, score-0.583]

95 Unfortunately, the nonstationary GP requires many more parameters than a stationary GP, particularly as the dimension grows, losing the attractive simplicity of the stationary GP model. [sent-298, score-0.842]

96 Use of GP priors in the hierarchy of the model to parameterize the nonstationary covariance results in slow computation, limiting the feasibility of the model to approximately n < 1000, because the Cholesky decomposition is O(n3 ). [sent-299, score-0.871]

97 Also, approaches that use low-rank approximations to the covariance matrix [20, 21] may speed fitting. [sent-301, score-0.263]

98 Bayesian inference for nonstationary spatial covariance structure via spatial deformations. [sent-339, score-0.889]

99 Bayesian mixture of splines for spatially adaptive nonparametric regression. [sent-448, score-0.165]

100 Adaptive Bayesian regression splines in semiparametric generalized linear models. [sent-462, score-0.241]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nonstationary', 0.506), ('gp', 0.495), ('covariance', 0.229), ('differentiability', 0.199), ('mat', 0.182), ('stationary', 0.168), ('splines', 0.136), ('labc', 0.134), ('gps', 0.126), ('spline', 0.126), ('rainfall', 0.115), ('regression', 0.105), ('bayesian', 0.102), ('rn', 0.097), ('ozone', 0.096), ('mse', 0.088), ('qij', 0.085), ('functions', 0.082), ('kernel', 0.078), ('xi', 0.078), ('spatial', 0.077), ('calendar', 0.077), ('luv', 0.077), ('standardized', 0.077), ('matrices', 0.076), ('smoothly', 0.071), ('correlation', 0.066), ('network', 0.063), ('smoothness', 0.062), ('lab', 0.061), ('anisotropic', 0.061), ('bmars', 0.057), ('bmls', 0.057), ('xj', 0.057), ('temperature', 0.051), ('vary', 0.048), ('adapt', 0.048), ('sample', 0.047), ('fi', 0.047), ('processes', 0.047), ('day', 0.046), ('cf', 0.045), ('multivariate', 0.044), ('binomial', 0.042), ('parameterize', 0.04), ('input', 0.039), ('gaussian', 0.039), ('inputs', 0.038), ('unknown', 0.038), ('eigenprocesses', 0.038), ('givens', 0.038), ('paciorek', 0.038), ('radiation', 0.038), ('datasets', 0.038), ('jump', 0.038), ('squared', 0.036), ('statistics', 0.035), ('hyperparameters', 0.035), ('matrix', 0.034), ('schervish', 0.033), ('logit', 0.033), ('geophysical', 0.033), ('eigenvectors', 0.03), ('assessment', 0.03), ('exponential', 0.03), ('sharp', 0.03), ('xing', 0.03), ('adaptive', 0.029), ('spaces', 0.029), ('hierarchy', 0.029), ('cholesky', 0.028), ('eigenvector', 0.028), ('test', 0.028), ('shared', 0.027), ('dietterich', 0.027), ('editors', 0.027), ('implementation', 0.027), ('neural', 0.026), ('dimensions', 0.026), ('variable', 0.026), ('membership', 0.025), ('air', 0.025), ('evolve', 0.025), ('parameterizing', 0.025), ('locations', 0.025), ('prior', 0.024), ('volker', 0.024), ('biometrika', 0.024), ('bars', 0.024), ('mixtures', 0.024), ('varies', 0.023), ('slow', 0.023), ('angles', 0.022), ('mcmc', 0.022), ('model', 0.022), ('kernels', 0.022), ('massachusetts', 0.022), ('tresp', 0.022), ('competitive', 0.022), ('process', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression

Author: Christopher J. Paciorek, Mark J. Schervish

Abstract: We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model. The model readily generalizes to non-Gaussian data. Use of computational methods for speeding GP fitting may allow for implementation of the method on larger datasets. 1

2 0.39119184 194 nips-2003-Warped Gaussian Processes

Author: Edward Snelson, Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation. 1

3 0.29148892 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

4 0.13006499 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks

Author: Anton Schwaighofer, Marian Grigoras, Volker Tresp, Clemens Hoffmann

Abstract: In this article, we present a novel approach to solving the localization problem in cellular networks. The goal is to estimate a mobile user’s position, based on measurements of the signal strengths received from network base stations. Our solution works by building Gaussian process models for the distribution of signal strengths, as obtained in a series of calibration measurements. In the localization stage, the user’s position can be estimated by maximizing the likelihood of received signal strengths with respect to the position. We investigate the accuracy of the proposed approach on data obtained within a large indoor cellular network. 1

5 0.11431948 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen

Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1

6 0.10124949 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

7 0.079466678 79 nips-2003-Gene Expression Clustering with Functional Mixture Models

8 0.073854014 31 nips-2003-Approximate Analytical Bootstrap Averages for Support Vector Classifiers

9 0.069228783 117 nips-2003-Linear Response for Approximate Inference

10 0.067563176 112 nips-2003-Learning to Find Pre-Images

11 0.066634759 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

12 0.063555844 176 nips-2003-Sequential Bayesian Kernel Regression

13 0.062484626 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

14 0.05970883 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines

15 0.058539577 92 nips-2003-Information Bottleneck for Gaussian Variables

16 0.056788385 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

17 0.055388886 115 nips-2003-Linear Dependent Dimensionality Reduction

18 0.055206724 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning

19 0.054485846 73 nips-2003-Feature Selection in Clustering Problems

20 0.052146181 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.206), (1, 0.005), (2, -0.034), (3, 0.01), (4, 0.013), (5, 0.227), (6, 0.078), (7, -0.217), (8, 0.202), (9, 0.151), (10, -0.276), (11, 0.32), (12, 0.001), (13, -0.239), (14, 0.012), (15, -0.109), (16, -0.14), (17, 0.091), (18, -0.098), (19, -0.044), (20, -0.012), (21, -0.135), (22, -0.075), (23, -0.025), (24, 0.04), (25, 0.031), (26, -0.052), (27, 0.009), (28, -0.014), (29, 0.093), (30, 0.085), (31, -0.009), (32, -0.01), (33, -0.07), (34, 0.048), (35, -0.051), (36, -0.008), (37, -0.077), (38, -0.013), (39, 0.046), (40, 0.062), (41, 0.024), (42, -0.002), (43, 0.022), (44, -0.019), (45, 0.071), (46, 0.039), (47, -0.036), (48, -0.011), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94636256 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression

Author: Christopher J. Paciorek, Mark J. Schervish

Abstract: We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model. The model readily generalizes to non-Gaussian data. Use of computational methods for speeding GP fitting may allow for implementation of the method on larger datasets. 1

2 0.93773043 194 nips-2003-Warped Gaussian Processes

Author: Edward Snelson, Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation. 1

3 0.65069765 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks

Author: Anton Schwaighofer, Marian Grigoras, Volker Tresp, Clemens Hoffmann

Abstract: In this article, we present a novel approach to solving the localization problem in cellular networks. The goal is to estimate a mobile user’s position, based on measurements of the signal strengths received from network base stations. Our solution works by building Gaussian process models for the distribution of signal strengths, as obtained in a series of calibration measurements. In the localization stage, the user’s position can be estimated by maximizing the likelihood of received signal strengths with respect to the position. We investigate the accuracy of the proposed approach on data obtained within a large indoor cellular network. 1

4 0.62393445 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

5 0.32322457 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen

Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1

6 0.28939873 176 nips-2003-Sequential Bayesian Kernel Regression

7 0.26746738 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification

8 0.26691815 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

9 0.26151168 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning

10 0.25470391 112 nips-2003-Learning to Find Pre-Images

11 0.24432071 196 nips-2003-Wormholes Improve Contrastive Divergence

12 0.24272364 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

13 0.24198028 79 nips-2003-Gene Expression Clustering with Functional Mixture Models

14 0.22957693 31 nips-2003-Approximate Analytical Bootstrap Averages for Support Vector Classifiers

15 0.22487926 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines

16 0.22197196 115 nips-2003-Linear Dependent Dimensionality Reduction

17 0.22153156 92 nips-2003-Information Bottleneck for Gaussian Variables

18 0.2213883 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

19 0.21589802 126 nips-2003-Measure Based Regularization

20 0.21526542 66 nips-2003-Extreme Components Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.038), (11, 0.016), (27, 0.307), (29, 0.022), (30, 0.013), (35, 0.049), (48, 0.015), (53, 0.133), (69, 0.027), (71, 0.068), (76, 0.071), (85, 0.064), (91, 0.084), (99, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79198134 26 nips-2003-An MDP-Based Approach to Online Mechanism Design

Author: David C. Parkes, Satinder P. Singh

Abstract: Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium. 1

same-paper 2 0.78461719 141 nips-2003-Nonstationary Covariance Functions for Gaussian Process Regression

Author: Christopher J. Paciorek, Mark J. Schervish

Abstract: We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model. The model readily generalizes to non-Gaussian data. Use of computational methods for speeding GP fitting may allow for implementation of the method on larger datasets. 1

3 0.74408025 100 nips-2003-Laplace Propagation

Author: Eleazar Eskin, Alex J. Smola, S.v.n. Vishwanathan

Abstract: We present a novel method for approximate inference in Bayesian models and regularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of conditional probabilities in factorizing distributions, much akin to Minka’s Expectation Propagation. In the jointly normal case, it coincides with the latter and belief propagation, whereas in the general case, it provides an optimization strategy containing Support Vector chunking, the Bayes Committee Machine, and Gaussian Process chunking as special cases. 1

4 0.5413546 126 nips-2003-Measure Based Regularization

Author: Olivier Bousquet, Olivier Chapelle, Matthias Hein

Abstract: We address in this paper the question of how the knowledge of the marginal distribution P (x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations. 1

5 0.53853244 107 nips-2003-Learning Spectral Clustering

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors. 1

6 0.53772342 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

7 0.53714579 112 nips-2003-Learning to Find Pre-Images

8 0.53700978 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons

9 0.53678793 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

10 0.53631389 113 nips-2003-Learning with Local and Global Consistency

11 0.5352841 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

12 0.53506118 66 nips-2003-Extreme Components Analysis

13 0.53456724 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method

14 0.53391701 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation

15 0.53362232 120 nips-2003-Locality Preserving Projections

16 0.53340495 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

17 0.5329318 78 nips-2003-Gaussian Processes in Reinforcement Learning

18 0.5326997 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints

19 0.53199798 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

20 0.53119415 115 nips-2003-Linear Dependent Dimensionality Reduction