nips nips2000 nips2000-122 knowledge-graph by maker-knowledge-mining

122 nips-2000-Sparse Representation for Gaussian Process Models


Source: pdf

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. [sent-3, score-0.441]

2 The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. [sent-4, score-0.618]

3 Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach. [sent-5, score-0.101]

4 Gaussian processes (GP) [1; 15] provide promising non-parametric tools for modelling real-world statistical problems. [sent-6, score-0.219]

5 Like Support Vector Machines (SVMs) [13], they combine a high flexibility of the model by working in high (often infinite) dimensional feature spaces with the simplicity that all operations are "kernelized", i.e. [sent-9, score-0.128]

6 they are performed in the (lower dimensional) input space using positive definite kernels. [sent-11, score-0.098]

7 An important advantage of GPs over other non-Bayesian models is the explicit probabilistic formulation of the model. [sent-12, score-0.128]

8 This not only provides the modeller with (Bayesian) confidence intervals (for regression) or posterior class probabilities (for classification), but also immediately opens the possibility to treat other nonstandard data models (e.g. [sent-13, score-0.545]

9 Unfortunately the drawback of GP models (which was originally apparent in SVMs as well, but has now been overcome [6]) lies in the huge increase of the computational cost with the number of training data. [sent-16, score-0.505]

10 This seems to preclude applications of GPs to large datasets. [sent-17, score-0.137]

11 This paper presents an approach to overcome this problem. [sent-18, score-0.197]

12 It is based on a combination of an online learning approach requiring only a single sweep through the data and a method to reduce the number of parameters representing the model. [sent-19, score-0.477]

13 Making use of the proposed parametrisation, the method extracts a subset of the examples, and the prediction relies only on these basis vectors (BV). [sent-20, score-0.179]
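As an illustration of why memory scales only with this set (next sentence), here is a minimal NumPy sketch of a prediction that touches nothing but the basis vectors. The kernel-expansion form, the coefficient names `alpha` and `C`, and the RBF prior kernel are assumptions made for the sketch, not the paper's exact parametrisation:

```python
import numpy as np

def rbf_kernel(X, Z, length_scale=1.0):
    """Assumed prior covariance K_0: squared-exponential kernel; X, Z of shape (n, d)."""
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def predict_from_bv(X_new, X_bv, alpha, C, length_scale=1.0):
    """Predictive mean and variance that use only the basis-vector set X_bv.

    Assumed parametrisation:
        mean(x) = sum_i alpha[i] * K_0(x, x_i)
        var(x)  = K_0(x, x) + sum_ij K_0(x, x_i) * C[i, j] * K_0(x_j, x)
    with x_i running over the BV set only, so storage is O(|BV|^2).
    """
    k = rbf_kernel(X_new, X_bv, length_scale)            # (n_new, n_bv)
    mean = k @ alpha
    prior_var = np.ones(len(X_new))                      # K_0(x, x) = 1 for this RBF kernel
    var = prior_var + np.einsum('ij,jk,ik->i', k, C, k)  # k(x)^T C k(x) per test point
    return mean, var
```

Nothing outside `X_bv`, `alpha`, and `C` needs to be stored, which is the point of the sparse representation.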

14 The memory requirement of the algorithm thus scales only with the size of this set. [sent-21, score-0.054]

15 Experiments with real-world datasets confirm the good performance of the proposed method. [sent-22, score-0.161]

16 A different approach for dealing with large datasets was suggested by V. Tresp. [sent-23, score-0.216]

17 GPs belong to Bayesian non-parametric models where likelihoods are parametrised by a Gaussian stochastic process (random field) a(x), which is indexed by the continuous input variable x. [sent-25, score-0.556]

18 The prior knowledge about a is expressed in the prior mean and the covariance given by the kernel K_0(x, x') = Cov(a(x), a(x')) [14; 15]. [sent-26, score-0.276]

19 In supervised learning the process a(x) is used as a latent variable in the likelihood P(y|a(x)), which denotes the probability of output y given the input x. [sent-28, score-0.333]

20 Based on a set of input-output pairs (x_n, y_n) with x_n ∈ R^m and y_n ∈ R (n = 1, ..., N), the Bayesian learning method computes the posterior distribution of the process a(x) using the prior and likelihood [14; 15; 3]. [sent-29, score-0.556]
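Written out (a standard Bayes-rule statement of what this sentence describes, in the notation introduced above):

```latex
% Posterior over the latent process a, given the data D = {(x_n, y_n)}_{n=1}^N
\[
  p\{a \mid \mathcal{D}\} \;\propto\; p_0\{a\}\,\prod_{n=1}^{N} P\bigl(y_n \mid a(x_n)\bigr),
\]
% where p_0 is the GP prior specified by its mean and covariance K_0(x, x').
```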

21 Although the prior is a Gaussian process, the posterior process usually is not Gaussian (except for the special case of regression with Gaussian noise). [sent-30, score-0.587]
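For that tractable special case the posterior is available in closed form; the following NumPy sketch is illustrative only (it is not the paper's algorithm, and the RBF kernel and the `noise_var` value are assumptions):

```python
import numpy as np

def rbf(X, Z, length_scale=1.0):
    """Assumed prior covariance K_0 (squared-exponential); X, Z of shape (n, d)."""
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_regression_posterior(X, y, X_star, noise_var=0.1):
    """Exact Gaussian posterior at test inputs X_star for regression with Gaussian noise."""
    K = rbf(X, X)                          # prior covariance on training inputs
    K_s = rbf(X_star, X)                   # cross-covariance test/train
    K_ss = rbf(X_star, X_star)             # prior covariance on test inputs
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha                     # posterior mean
    V = np.linalg.solve(L, K_s.T)
    cov = K_ss - V.T @ V                   # posterior covariance
    return mean, cov
```

For a non-Gaussian likelihood (e.g. classification) no such closed form exists, which is what motivates the approximations discussed next.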

22 Nevertheless, various approaches have been introduced recently to approximate the posterior averages [11; 9]. [sent-31, score-0.257]

23 Our approach is based on the idea of approximating the true posterior process p{a} by a Gaussian process q{a} which is fully specified by a covariance kernel K_t(x, x') and posterior mean ⟨a(x)⟩_t, where t is the number of training data processed by the algorithm so far. [sent-32, score-1.215]

24 Such an approximation could be formulated within the variational approach, where q is chosen such that the relative entropy D(q, p) = E_q ln(q/p) is minimal [9]. [sent-33, score-0.11]

25 However, in this formulation, the expectation is over the approximate process q rather than over p. [sent-34, score-0.293]

26 It seems intuitively better to minimise the other KL divergence, given by D(p, q) = E_p ln(p/q), because the expectation is over the true distribution. [sent-35, score-0.266]
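Written out, the two objectives being contrasted (the moment-matching consequence of the second choice is stated a few sentences further on):

```latex
\[
  D(q \,\|\, p) \;=\; \mathbb{E}_q \ln\frac{q}{p}
  \qquad \text{(variational choice: expectation under the approximation $q$)}
\]
\[
  D(p \,\|\, q) \;=\; \mathbb{E}_p \ln\frac{p}{q}
  \qquad \text{(choice used here: expectation under the true posterior $p$)}
\]
% For a Gaussian q, minimising D(p || q) reduces to matching the mean and
% covariance of q to those of p.
```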

27 The following online approach can be understood as an approximation to this task. [sent-37, score-0.331]

28 3 Online learning for Gaussian Processes In this section we briefly review the main idea of the Bayesian online approach (see e. [sent-38, score-0.269]

29 We process the training data sequentially one after the other. [sent-41, score-0.29]

30 Assume we have a Gaussian approximation to the posterior process at time t. [sent-42, score-0.506]

31 We use the next example t+1 to update the posterior using Bayes rule via p̂{a} = P(y_{t+1} | a(x_{t+1})) p_t{a} / ⟨P(y_{t+1} | a(x_{t+1}))⟩_t. Since the resulting posterior p̂{a} is non-Gaussian, we project it to the closest Gaussian process q which minimises the KL divergence D(p̂, q). [sent-43, score-0.881]
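A minimal NumPy sketch of one such step for the Gaussian-noise regression case, where the Bayes update followed by the Gaussian projection is exact. The representation (coefficients `alpha`, `C` over the processed inputs) and `noise_var` are assumptions for the sketch; for a general likelihood the coefficients `q` and `r` below would instead come from derivatives of the logarithm of the averaged likelihood, which is not shown here:

```python
import numpy as np

def k0(x, z, length_scale=1.0):
    """Assumed prior covariance K_0 between two inputs (squared-exponential)."""
    return float(np.exp(-0.5 * np.sum((np.asarray(x) - np.asarray(z)) ** 2) / length_scale**2))

def online_update(alpha, C, X_seen, x_new, y_new, noise_var=0.1):
    """One sequential Bayes-update-plus-projection step for Gaussian-noise regression.

    The posterior is kept in the assumed form
        mean(x)    = sum_i alpha[i] * K_0(x, X_seen[i])
        cov(x, x') = K_0(x, x') + sum_ij K_0(x, X_seen[i]) * C[i, j] * K_0(X_seen[j], x').
    """
    k = np.array([k0(x, x_new) for x in X_seen])    # K_0(X_seen, x_new)
    m = k @ alpha                                   # current mean at x_new
    s2 = k0(x_new, x_new) + k @ C @ k               # current variance at x_new
    q = (y_new - m) / (s2 + noise_var)              # first-order update coefficient
    r = -1.0 / (s2 + noise_var)                     # second-order update coefficient
    s = np.append(C @ k, 1.0)                       # update direction over the extended set
    alpha = np.append(alpha, 0.0) + q * s
    C = np.pad(C, ((0, 1), (0, 1))) + r * np.outer(s, s)
    return alpha, C, X_seen + [x_new]

# Usage (illustrative): start with alpha = np.zeros(0), C = np.zeros((0, 0)), X_seen = []
# and call online_update once per (x, y) pair -- a single sweep through the data.
```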

32 Note that the new approximation q is now on the "correct" side of the KL divergence. [sent-44, score-0.062]

33 The minimisation can be performed exactly, leading to a match of the means and covariances of p and q. [sent-45, score-0.175]

34 His method is based on splitting the data-set into smaller subsets and training individual GP predictors on each of them. [sent-47, score-0.269]

35 The final prediction is achieved by a specific weighting of the individual predictors. [sent-48, score-0.168]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gp', 0.351), ('gps', 0.277), ('process', 0.232), ('kt', 0.223), ('posterior', 0.212), ('online', 0.207), ('gaussian', 0.166), ('predictors', 0.165), ('xt', 0.15), ('kl', 0.14), ('overcome', 0.135), ('yt', 0.112), ('bayesian', 0.11), ('datasets', 0.096), ('yn', 0.093), ('svms', 0.09), ('divergence', 0.085), ('kernel', 0.083), ('minimises', 0.082), ('preclude', 0.082), ('yla', 0.082), ('bv', 0.082), ('huge', 0.082), ('prediction', 0.077), ('regression', 0.075), ('ko', 0.075), ('birmingham', 0.075), ('quantum', 0.075), ('manfred', 0.075), ('tresp', 0.075), ('subsample', 0.075), ('dimensional', 0.074), ('fully', 0.071), ('kingdom', 0.069), ('opens', 0.069), ('cov', 0.069), ('unfortunately', 0.068), ('prior', 0.068), ('models', 0.067), ('sparse', 0.067), ('minimise', 0.065), ('confirm', 0.065), ('eq', 0.065), ('processes', 0.062), ('approximation', 0.062), ('approach', 0.062), ('originally', 0.061), ('minimisation', 0.061), ('united', 0.061), ('opper', 0.061), ('sweep', 0.061), ('covariances', 0.061), ('ep', 0.061), ('formulation', 0.061), ('expectation', 0.061), ('xn', 0.059), ('splitting', 0.058), ('sequentially', 0.058), ('closest', 0.058), ('extracts', 0.058), ('immediately', 0.058), ('dealing', 0.058), ('covariance', 0.057), ('apparent', 0.056), ('limitations', 0.056), ('tools', 0.056), ('moments', 0.056), ('toy', 0.056), ('likelihoods', 0.056), ('seems', 0.055), ('processed', 0.054), ('drawback', 0.054), ('flexibility', 0.054), ('promising', 0.054), ('caused', 0.054), ('requirement', 0.054), ('requiring', 0.054), ('performed', 0.053), ('variable', 0.053), ('belong', 0.052), ('indexed', 0.052), ('nevertheless', 0.052), ('pt', 0.05), ('lies', 0.05), ('specifies', 0.05), ('combination', 0.049), ('formulated', 0.048), ('latent', 0.048), ('confidence', 0.047), ('modelling', 0.047), ('scalar', 0.047), ('treat', 0.047), ('individual', 0.046), ('sequential', 0.045), ('efficiency', 0.045), ('averages', 0.045), ('definite', 0.045), ('weighting', 0.045), ('intervals', 0.045), ('method', 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

2 0.2718972 35 nips-2000-Computing with Finite and Infinite Networks

Author: Ole Winther

Abstract: Using statistical mechanics results, I calculate learning curves (average generalization error) for Gaussian processes (GPs) and Bayesian neural networks (NNs) used for regression. Applying the results to learning a teacher defined by a two-layer network, I can directly compare GP and Bayesian NN learning. I find that a GP in general requires O(d^s) training examples to learn input features of order s (d is the input dimension), whereas a NN can learn the task with order the number of adjustable weights training examples. Since a GP can be considered as an infinite NN, the results show that even in the Bayesian approach, it is important to limit the complexity of the learning machine. The theoretical findings are confirmed in simulations with analytical GP learning and a NN mean field algorithm.

3 0.19099458 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

Author: Dörthe Malzahn, Manfred Opper

Abstract: Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1

4 0.18070056 120 nips-2000-Sparse Greedy Gaussian Process Regression

Author: Alex J. Smola, Peter L. Bartlett

Abstract: We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m. In particular, computational requirements are O(n^2 m), storage is O(nm), the cost for prediction is O(n) and the cost to compute confidence bounds is O(nm), where n ≪ m. We show how to compute a stopping criterion, give bounds on the approximation error, and show applications to large scale problems. 1

5 0.14134574 75 nips-2000-Large Scale Bayes Point Machines

Author: Ralf Herbrich, Thore Graepel

Abstract: The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently been demonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set) - also known as the Bayes point - exhibits excellent generalisation abilities. However, the billiard algorithm as presented in [4] is restricted to small sample size because it requires O(m^2) of memory and O(N · m^2) computational steps where m is the number of training patterns and N is the number of random draws from the posterior distribution. In this paper we present a method based on the simple perceptron learning algorithm which allows to overcome this algorithmic drawback. The method is algorithmically simple and is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digits which show that Bayes point machines (BPMs) are competitive with the current world champion, the support vector machine. In addition, the computational complexity of BPMs can be tuned by varying the number of samples from the posterior. Finally, rejecting test points on the basis of their (approximative) posterior probability leads to a rapid decrease in generalisation error, e.g. 0.1% generalisation error for a given rejection rate of 10%. 1

6 0.13801908 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

7 0.1122117 133 nips-2000-The Kernel Gibbs Sampler

8 0.10645082 121 nips-2000-Sparse Kernel Principal Component Analysis

9 0.096735582 85 nips-2000-Mixtures of Gaussian Processes

10 0.093308382 92 nips-2000-Occam's Razor

11 0.089327425 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

12 0.080139771 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

13 0.077767558 54 nips-2000-Feature Selection for SVMs

14 0.076866433 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

15 0.073539421 130 nips-2000-Text Classification using String Kernels

16 0.070679806 137 nips-2000-The Unscented Particle Filter

17 0.07020513 134 nips-2000-The Kernel Trick for Distances

18 0.070021547 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

19 0.069804311 74 nips-2000-Kernel Expansions with Unlabeled Examples

20 0.067625515 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.27), (1, 0.099), (2, 0.075), (3, 0.009), (4, 0.142), (5, 0.142), (6, -0.097), (7, 0.096), (8, -0.011), (9, -0.271), (10, 0.153), (11, 0.114), (12, 0.012), (13, -0.09), (14, 0.018), (15, 0.1), (16, -0.159), (17, 0.103), (18, 0.061), (19, -0.066), (20, 0.003), (21, -0.176), (22, 0.282), (23, 0.187), (24, -0.013), (25, -0.034), (26, -0.066), (27, -0.006), (28, -0.053), (29, 0.018), (30, -0.035), (31, 0.042), (32, 0.036), (33, -0.022), (34, 0.097), (35, 0.14), (36, -0.017), (37, -0.026), (38, 0.031), (39, -0.08), (40, 0.006), (41, 0.067), (42, -0.045), (43, 0.041), (44, -0.043), (45, -0.014), (46, -0.017), (47, -0.116), (48, -0.026), (49, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97592378 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

2 0.81771523 35 nips-2000-Computing with Finite and Infinite Networks

Author: Ole Winther

Abstract: Using statistical mechanics results, I calculate learning curves (average generalization error) for Gaussian processes (GPs) and Bayesian neural networks (NNs) used for regression. Applying the results to learning a teacher defined by a two-layer network, I can directly compare GP and Bayesian NN learning. I find that a GP in general requires O(d^s) training examples to learn input features of order s (d is the input dimension), whereas a NN can learn the task with order the number of adjustable weights training examples. Since a GP can be considered as an infinite NN, the results show that even in the Bayesian approach, it is important to limit the complexity of the learning machine. The theoretical findings are confirmed in simulations with analytical GP learning and a NN mean field algorithm.

3 0.69499052 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

Author: Dörthe Malzahn, Manfred Opper

Abstract: Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1

4 0.62464499 120 nips-2000-Sparse Greedy Gaussian Process Regression

Author: Alex J. Smola, Peter L. Bartlett

Abstract: We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m. In particular, computational requirements are O(n^2 m), storage is O(nm), the cost for prediction is O(n) and the cost to compute confidence bounds is O(nm), where n ≪ m. We show how to compute a stopping criterion, give bounds on the approximation error, and show applications to large scale problems. 1

5 0.56923932 85 nips-2000-Mixtures of Gaussian Processes

Author: Volker Tresp

Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes - in particular in form of Gaussian process classification, the support vector machine and the MGP model - can be used for quantifying the dependencies in graphical models.

6 0.40637574 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

7 0.40280211 133 nips-2000-The Kernel Gibbs Sampler

8 0.39336696 75 nips-2000-Large Scale Bayes Point Machines

9 0.37727976 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

10 0.34930748 92 nips-2000-Occam's Razor

11 0.32446381 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA

12 0.31628934 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm

13 0.30984917 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

14 0.30652404 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

15 0.29793626 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

16 0.29703459 54 nips-2000-Feature Selection for SVMs

17 0.28023604 121 nips-2000-Sparse Kernel Principal Component Analysis

18 0.27869755 74 nips-2000-Kernel Expansions with Unlabeled Examples

19 0.274831 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

20 0.27042878 51 nips-2000-Factored Semi-Tied Covariance Matrices


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.042), (17, 0.144), (33, 0.031), (54, 0.081), (55, 0.045), (62, 0.041), (67, 0.067), (75, 0.025), (76, 0.089), (81, 0.043), (88, 0.212), (90, 0.055), (91, 0.011), (97, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83124763 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

2 0.67801028 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

3 0.67119902 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

Author: Shimon Edelman, Nathan Intrator

Abstract: We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common

4 0.65754431 35 nips-2000-Computing with Finite and Infinite Networks

Author: Ole Winther

Abstract: Using statistical mechanics results, I calculate learning curves (average generalization error) for Gaussian processes (GPs) and Bayesian neural networks (NNs) used for regression. Applying the results to learning a teacher defined by a two-layer network, I can directly compare GP and Bayesian NN learning. I find that a GP in general requires O(d^s) training examples to learn input features of order s (d is the input dimension), whereas a NN can learn the task with order the number of adjustable weights training examples. Since a GP can be considered as an infinite NN, the results show that even in the Bayesian approach, it is important to limit the complexity of the learning machine. The theoretical findings are confirmed in simulations with analytical GP learning and a NN mean field algorithm.

5 0.63024974 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

Author: Christopher K. I. Williams

Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1

6 0.62731606 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

7 0.62476498 74 nips-2000-Kernel Expansions with Unlabeled Examples

8 0.61965472 4 nips-2000-A Linear Programming Approach to Novelty Detection

9 0.61961347 60 nips-2000-Gaussianization

10 0.61936665 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

11 0.61498725 146 nips-2000-What Can a Single Neuron Compute?

12 0.6146723 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

13 0.61161339 79 nips-2000-Learning Segmentation by Random Walks

14 0.61104631 21 nips-2000-Algorithmic Stability and Generalization Performance

15 0.60991418 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

16 0.60947102 51 nips-2000-Factored Semi-Tied Covariance Matrices

17 0.60944724 37 nips-2000-Convergence of Large Margin Separable Linear Classification

18 0.60922718 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

19 0.60759318 130 nips-2000-Text Classification using String Kernels

20 0.60692686 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks