nips nips2002 nips2002-41 knowledge-graph by maker-knowledge-mining

41 nips-2002-Bayesian Monte Carlo


Source: pdf

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. [sent-8, score-0.096]

2 Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. [sent-9, score-0.072]

3 In a simple problem we show that this outperforms any classical importance sampling method. [sent-10, score-0.4]

4 We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). [sent-11, score-0.222]

5 One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. [sent-16, score-0.185]

6 This allows for the possibility of active design of sample points so as to maximise information gain. [sent-17, score-0.164]

7 This leads to several inconsistencies which we review below, outlined in a paper by O’Hagan [1987] with the title “Monte Carlo is Fundamentally Unsound”. [sent-22, score-0.075]

8 We then investigate Bayesian counterparts to classical Monte Carlo. [sent-23, score-0.096]

9 For example, p(x) could be the posterior distribution and f(x) the predictions made by a model with parameters x, or p(x) could be the parameter prior and f(x) the likelihood, so that equation (1) evaluates the marginal likelihood (evidence) for a model. [sent-26, score-0.343]

10 As O'Hagan [1987] points out, there are two important objections to these procedures. [sent-29, score-0.091]

11 First, the estimator depends not only on the values of the integrand but also on the entirely arbitrary choice of the sampling distribution. [sent-30, score-0.165]

12 Thus, if the same set of samples, conveying exactly the same information about the integrand, were obtained from two different sampling distributions, two different estimates of the integral would be obtained. [sent-31, score-0.337]

13 The second objection is that classical Monte Carlo procedures entirely ignore the locations of the samples when forming the estimate. [sent-33, score-0.127]

14 Consider the simple example of three points sampled from the sampling distribution, where the third happens to fall on the same point as the second, conveying no extra information about the integrand. [sent-34, score-0.135]

15 Simply averaging the integrand at these three points, which is the classical Monte Carlo estimate, is clearly inappropriate; it would make much more sense to average the first two (or the first and third). [sent-35, score-0.216]

16 In practice points are unlikely to fall on top of each other in continuous spaces; however, a procedure that weights points equally regardless of their spatial distribution is ignoring relevant information. [sent-36, score-0.155]

17 To summarize the objections, classical Monte Carlo bases its estimate on irrelevant information and throws away relevant information. [sent-37, score-0.154]

18 We seek to turn the problem of evaluating the integral (1) into a Bayesian inference problem which, as we will see, avoids the inconsistencies of classical Monte Carlo and can result in better estimates. [sent-39, score-0.31]

19 Although this interpretation is not the most usual one, it is entirely consistent with the Bayesian view that all forms of uncertainty are represented using probabilities: in this case uncertainty arises because we cannot afford to compute the function at every location. [sent-41, score-0.113]

20 Since the desired integral is a function of the integrand (which is unknown until we evaluate it), we proceed by putting a prior on the function, combining it with the observations to obtain a posterior over the function, which in turn implies a distribution over the desired integral. [sent-42, score-0.118]

21 A very convenient way of putting priors over functions is through Gaussian Processes (GP). [sent-43, score-0.084]

22 The covariance matrix is given by the covariance function, a convenient choice being (5): $\mathrm{Cov}\big(f(x_i), f(x_j)\big) = w_0 \exp\big(-\tfrac{1}{2}\sum_{d=1}^{D} (x_i^{(d)} - x_j^{(d)})^2 / \lambda_d^2\big)$, where $w_0$ and the length scales $\lambda_1, \dots, \lambda_D$ are hyperparameters. [sent-45, score-0.164]

23 (Footnote 1) Although the function values obtained are assumed to be noise-free, we added a tiny constant to the diagonal of the covariance matrix to improve numerical conditioning. [sent-47, score-0.092]

24 2 The Bayesian Monte Carlo Method. The Bayesian Monte Carlo method starts with a prior over the function and makes inferences about it from a set of samples, giving a posterior distribution over the function. [sent-48, score-0.269]
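
The closed-form estimate that this section builds up to can be sketched in a few lines of code. The sketch below is an illustration rather than the paper's implementation: it assumes a one-dimensional squared-exponential covariance with fixed (not optimised) hyperparameters `w0` and `ell`, a Gaussian density N(b, B), and a toy integrand of our own choosing; under those assumptions the kernel integrals z_i = ∫ k(x, x_i) p(x) dx are available in closed form and the BMC posterior mean is zᵀK⁻¹f.

```python
# Minimal Bayesian Monte Carlo sketch in one dimension, assuming the
# squared-exponential kernel k(x, x') = w0 * exp(-(x - x')^2 / (2 ell^2))
# and a Gaussian density p(x) = N(x; b, B). Names (w0, ell, b, B) are
# illustrative, not the paper's notation.
import numpy as np

def bmc_estimate(x, f, ell=0.5, w0=1.0, b=0.0, B=1.0, jitter=1e-8):
    """Posterior mean and variance of the integral of f(x) N(x; b, B) dx under a GP prior on f."""
    n = len(x)
    # Gram matrix of the squared-exponential kernel; tiny jitter on the
    # diagonal for numerical conditioning, as mentioned in the text.
    K = w0 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
    K += jitter * np.eye(n)
    # z_i = integral of k(x, x_i) N(x; b, B) dx  (Gaussian convolution, closed form).
    z = w0 * np.sqrt(ell**2 / (ell**2 + B)) * np.exp(-0.5 * (x - b) ** 2 / (ell**2 + B))
    Kinv_f = np.linalg.solve(K, f)
    mean = z @ Kinv_f
    # Prior variance of the integral: double integral of the kernel against p,
    # which equals w0 * sqrt(ell^2 / (ell^2 + 2 B)) for this kernel/density pair.
    prior_var = w0 * np.sqrt(ell**2 / (ell**2 + 2.0 * B))
    var = prior_var - z @ np.linalg.solve(K, z)
    return mean, var

# Toy comparison against simple Monte Carlo on the same samples.
rng = np.random.default_rng(0)

def func(x):
    return np.sin(3 * x) + 0.5 * x**2   # illustrative integrand; true value is 0.5

x = rng.normal(0.0, 1.0, size=20)       # samples from p(x) = N(0, 1)
f = func(x)
smc = f.mean()                          # classical (Simple) Monte Carlo estimate
bmc_mean, bmc_var = bmc_estimate(x, f)
print(f"SMC: {smc:.4f}   BMC: {bmc_mean:.4f} +/- {np.sqrt(max(bmc_var, 0.0)):.4f}")
```

On the same samples the Simple Monte Carlo estimate is just the average of the function values, which makes the comparison of figure 1 easy to reproduce in spirit (with optimised rather than fixed hyperparameters in the paper's experiment).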

25 Under a GP prior the posterior is (an infinite-dimensional joint) Gaussian; since the integral in eq. (1) is a linear projection of the function, its posterior is also Gaussian. [sent-49, score-0.219]

26 If the density and the covariance function eq. (5) are both Gaussian, the required kernel integrals can be computed in closed form. [sent-62, score-0.118]

27 2.1 A Simple Example. To illustrate the method we evaluated the integral of a one-dimensional function under a Gaussian density (figure 1, left). [sent-70, score-0.18]

28 We generated samples independently from the Gaussian density, evaluated the function at those points, and optimised the hyperparameters of our Gaussian process fit to the function. [sent-71, score-0.228]

29 Figure 1 (middle) compares the error in the Bayesian Monte Carlo (BMC) estimate of the integral (1) to the Simple Monte Carlo (SMC) estimate using the same samples. [sent-72, score-0.199]

30 As we would expect, the squared error in the Simple Monte Carlo estimate decreases as 1/n, where n is the sample size. [sent-73, score-0.092]

31 This is achieved because the prior on the function allows the method to interpolate between sample points. [sent-75, score-0.104]

32 Moreover, whereas the SMC estimate is invariant to permutations of the values on the x-axis, BMC makes use of the smoothness of the function. [sent-76, score-0.084]

33 In SMC, if two samples happen to fall close to each other, the function value there will be counted with double weight. [sent-78, score-0.199]

34 This effect means that large numbers of samples are needed to adequately represent the density. [sent-79, score-0.155]

35 In figure 1, right, the negative log density of the true value of the integral under the predictive distribution is compared for BMC and SMC. [sent-84, score-0.261]

36 For not too small sample sizes, BMC outperforms SMC. [sent-85, score-0.057]

37 Notice, however, that for very small sample sizes BMC occasionally has very bad performance. [sent-86, score-0.134]

38 This problem is to a large extent caused by the optimization of the length scale hyperparameters of the covariance function; we ought instead to have integrated over all possible length scales. [sent-88, score-0.14]

39 This integration would effectively “blend in” distributions with much larger variance (since the data is also consistent with a shorter length scale), thus alleviating the problem, but unfortunately this is not possible in closed form. [sent-89, score-0.107]

40 The problem disappears for sample sizes of around 16 or greater. [sent-90, score-0.102]

41 2.2 Optimal Importance Sampler. For the simple example discussed above, it is also interesting to ask whether the efficiency of SMC could be improved by generating independent samples from more cleverly designed distributions. [sent-98, score-0.155]

42 As we have seen in equation (3), importance sampling gives an unbiased estimate of $\bar{f} = \int f(x)\,p(x)\,dx$ by sampling from $q(x)$ and computing: $\hat{f} = \frac{1}{n}\sum_{i=1}^{n} f(x_i)\,p(x_i)/q(x_i)$ (13). [sent-99, score-0.499]

43 The variance of this estimator is given by: $\mathrm{Var}(\hat{f}) = \frac{1}{n}\big[\int f(x)^2\,p(x)^2/q(x)\,dx - \bar{f}^2\big]$. [sent-100, score-0.068]

44 Using calculus of variations it is simple to show that the optimal (minimum variance) importance sampling distribution is: $q^*(x) = |f(x)|\,p(x) \big/ \int |f(x')|\,p(x')\,dx'$ (14). [sent-102, score-0.304]
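
Equations (13) and (14) can be checked numerically. The snippet below is a hedged illustration on a toy integrand and densities of our own choosing (not the paper's example); it evaluates the per-sample variance by quadrature for an arbitrary q and for q*, confirming that the variance collapses when f does not change sign.

```python
# Numerical check of the importance-sampling estimator (13), its variance, and
# the minimum-variance importance distribution q*(x) proportional to |f(x)| p(x)
# from (14). Integrand and densities are illustrative choices.
import numpy as np
from scipy import stats

def f(x):
    return np.exp(-0.5 * (x - 1.0) ** 2)   # illustrative integrand, f >= 0

p = stats.norm(0.0, 1.0)                   # density we integrate against
grid = np.linspace(-8.0, 8.0, 40_001)
dx = grid[1] - grid[0]

def integrate(y):
    return float(np.sum(y) * dx)           # simple Riemann sum on the fine grid

truth = integrate(f(grid) * p.pdf(grid))   # "true" value by quadrature

def is_variance(q_pdf):
    """Per-sample IS variance: integral of f^2 p^2 / q minus the squared true value."""
    return integrate(f(grid) ** 2 * p.pdf(grid) ** 2 / q_pdf(grid)) - truth ** 2

q = stats.norm(0.0, 2.0)                   # an arbitrary importance density
print("variance with q = N(0, 2^2):", is_variance(q.pdf))

# Optimal q*(x) = |f(x)| p(x) / Z from eq. (14): since f >= 0 here, the weighted
# integrand f p / q* is constant, so the variance is (numerically) zero.
Z = integrate(np.abs(f(grid)) * p.pdf(grid))

def q_star(x):
    return np.abs(f(x)) * p.pdf(x) / Z

print("variance with optimal q*:", is_variance(q_star))

# Unbiasedness check of eq. (13) with the arbitrary q.
rng = np.random.default_rng(1)
x = q.rvs(10_000, random_state=rng)
print("IS estimate:", np.mean(f(x) * p.pdf(x) / q.pdf(x)), "  truth:", truth)
```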

45 [Figure 1 plot residue: legend entries Bayesian inference, Simple Monte Carlo, Optimal importance; axes labelled sample size and minus log density of correct value; panel labels function f(x) and measure p(x).]

47 Figure 1: Left: a simple one-dimensional function (full) and Gaussian density (dashed) with respect to which we wish to integrate. [sent-117, score-0.214]

48 Middle: average squared error for simple Monte Carlo sampling from the Gaussian density (dashed), the optimal achievable bound for importance sampling (dot-dashed), and the Bayesian Monte Carlo estimates. [sent-118, score-0.438]

49 Right: Minus the log of the Gaussian predictive density with mean eq. (7), evaluated at the true value of the integral (found by numerical integration), 'x'. [sent-120, score-0.106] [sent-122, score-0.18]

51 Similarly for the Simple Monte Carlo procedure, where the mean and variance of the predictive distribution are computed from the samples, ’o’. [sent-123, score-0.093]

52 […] negative values, which is a constant times the variance of a Bernoulli random variable (the sign of f). [sent-125, score-0.068]

53 The lower bound from this optimal importance sampler, as a function of the number of samples, is shown in figure 1, middle. [sent-126, score-0.396]

54 As we can see, Bayesian Monte Carlo improves on the optimal importance sampler considerably. [sent-127, score-0.241]

55 We stress that the optimal importance sampler is not practically achievable since it requires knowledge of the quantity we are trying to estimate. [sent-128, score-0.241]

56 3 Computing Marginal Likelihoods. We now consider the problem of estimating the marginal likelihood of a statistical model. [sent-129, score-0.156]

57 Here we compare the Bayesian Monte Carlo method to two other techniques: Simple Monte Carlo sampling (SMC) and Annealed Importance Sampling (AIS). [sent-132, score-0.134]

58 Simple Monte Carlo, sampling from the prior, is generally considered inadequate for this problem, because the likelihood is typically sharply peaked and samples from the prior are unlikely to fall in these confined areas, leading to huge variance in the estimates (although they are unbiased). [sent-133, score-0.539]

59 A family of promising "thermodynamic integration" techniques for computing marginal likelihoods is discussed under the name of Bridge and Path sampling in [Gelman and Meng, 1998] and Annealed Importance Sampling (AIS) in [Neal, 2001]. [sent-134, score-0.346]

60 The central idea is to divide one difficult integral into a series of easier ones, parameterised by an (inverse) temperature. [sent-135, score-0.129]

61 Each of the intermediate ratios is much easier to compute than the original ratio, since the likelihood function raised to the power of a small number is much better behaved than the likelihood itself. [sent-138, score-0.154]

62 Often elaborate non-linear cooling schedules are used, but for simplicity we will just take a linear schedule for the inverse temperature. [sent-139, score-0.057]

63 The samples at each temperature are drawn using a single Metropolis proposal, where the proposal width is chosen to get a fairly high fraction of acceptances. [sent-140, score-0.295]
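
A minimal sketch of the AIS procedure just described (linear inverse-temperature schedule, one Metropolis update per temperature) is given below. The prior/likelihood pair is a toy one-dimensional conjugate Gaussian model chosen so that the true marginal likelihood is known in closed form; the number of temperatures, proposal width, and number of runs are illustrative settings, not those of the paper's experiment.

```python
# Annealed Importance Sampling sketch with a linear inverse-temperature schedule
# and a single Metropolis update per temperature, on a toy conjugate model.
import numpy as np

rng = np.random.default_rng(0)

def log_prior(x):        # x ~ N(0, 1)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_lik(x):          # one observation y = 1.5, noise N(x, 0.5^2)
    return -0.5 * (1.5 - x) ** 2 / 0.25 - 0.5 * np.log(2 * np.pi * 0.25)

def ais_run(n_temps=200, prop_width=0.5):
    betas = np.linspace(0.0, 1.0, n_temps)       # linear inverse temperatures
    x = rng.normal()                             # exact draw from the prior
    log_w = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * log_lik(x)       # importance-weight increment
        # One Metropolis step leaving prior(x) * lik(x)^b invariant.
        x_new = x + prop_width * rng.normal()
        log_alpha = (log_prior(x_new) + b * log_lik(x_new)
                     - log_prior(x) - b * log_lik(x))
        if np.log(rng.uniform()) < log_alpha:
            x = x_new
    return log_w

log_weights = np.array([ais_run() for _ in range(500)])
# Marginal likelihood estimate: average of the importance weights (log-sum-exp).
log_Z = np.log(np.mean(np.exp(log_weights - log_weights.max()))) + log_weights.max()
# Analytic check for this conjugate toy model: y ~ N(0, 1 + 0.25).
true_log_Z = -0.5 * 1.5**2 / 1.25 - 0.5 * np.log(2 * np.pi * 1.25)
print(f"AIS log Z: {log_Z:.3f}   true log Z: {true_log_Z:.3f}")
```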

64 The model in question, for which we attempt to compute the marginal likelihood, was itself a Gaussian process regression fit to an artificial dataset suggested by [Friedman, 1988]. [sent-141, score-0.18]

65 We had five length-scale hyperparameters, a signal variance, and an explicit noise variance parameter. [sent-142, score-0.136]

66 Thus the marginal likelihood is an integral over a 7-dimensional space of hyperparameters. [sent-143, score-0.285]

67 The logs of the hyperparameters are given Gaussian priors. [sent-145, score-0.103]
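
To make the setup concrete, the sketch below evaluates the integrand of this experiment, the GP-regression likelihood p(D | θ) as a function of seven log hyperparameters, and applies the Simple Monte Carlo baseline by drawing θ from a prior over the log hyperparameters. The dataset, the prior covariance, and the sample count are stand-ins (the paper's Friedman data and exact priors are not reproduced); the same (θ, f(θ)) pairs could equally be passed to AIS or to the BMC estimator sketched earlier.

```python
# Integrand for the marginal-likelihood experiment: the GP-regression likelihood
# of the data given theta = (5 log length scales, log signal variance, log noise
# variance), integrated against a Gaussian prior on the log hyperparameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 5))                 # 100 points in 5 dimensions
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=100)   # stand-in data

def gp_log_likelihood(log_theta, X=X, y=y):
    """log p(y | X, theta) for a squared-exponential GP with additive noise."""
    ell = np.exp(log_theta[:5])                      # length scales
    sf2 = np.exp(log_theta[5])                       # signal variance
    sn2 = np.exp(log_theta[6])                       # noise variance
    d = (X[:, None, :] - X[None, :, :]) / ell        # scaled pairwise differences
    K = sf2 * np.exp(-0.5 * np.sum(d**2, axis=-1)) + sn2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
                 - 0.5 * len(y) * np.log(2 * np.pi))

log_theta_prior = stats.multivariate_normal(mean=np.zeros(7), cov=4.0 * np.eye(7))

# Simple Monte Carlo from the prior (the baseline the text calls inadequate):
thetas = log_theta_prior.rvs(size=200, random_state=rng)
log_f = np.array([gp_log_likelihood(t) for t in thetas])
log_Z_smc = np.log(np.mean(np.exp(log_f - log_f.max()))) + log_f.max()
print("SMC estimate of the log marginal likelihood:", log_Z_smc)
```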

68 Further, the difference between AIS and SMC would be more dramatic in higher dimensions and for more highly peaked likelihood functions. [sent-148, score-0.091]

69 The Bayesian Monte Carlo method was run on the same samples as were generated by the AIS procedure. [sent-151, score-0.155]

70 Note that BMC can use samples from any distribution, as long as the integrand can be evaluated at them. [sent-152, score-0.155]

71 Another obvious choice for generating samples for BMC would be to use an MCMC method to draw samples from the posterior. [sent-153, score-0.335]

72 Because BMC needs to model the integrand using a GP, we need to limit the number of samples, since the computation (for fitting hyperparameters and computing the required kernel integrals) scales as O(n^3) in the number of samples. [sent-154, score-0.377]

73 Thus, for sample sizes above a threshold, we limit the number of samples used by BMC to a fixed maximum, chosen equally spaced from the AIS Markov chain. [sent-155, score-0.237]

74 Despite this thinning of the samples we see a generally superior performance of BMC, especially for smaller sample sizes. [sent-156, score-0.212]

75 In fact, BMC seems to perform equally well for almost any of the investigated sample sizes. [sent-157, score-0.082]

76 Even for this fairly large number of samples, the generation of points from the AIS still dominates compute time. [sent-158, score-0.067]

77 4 Discussion. An important aspect which we have not explored in this paper is the idea that the GP model used to fit the integrand gives error bars (uncertainties) on the integrand. [sent-159, score-0.12]

78 These error bars could be used to guide the choice of where to evaluate the function next, as discussed below. (Footnote 2: The data was 100 samples generated from the 5-dimensional function of [Friedman, 1988], where the noise is zero-mean, unit-variance Gaussian and the inputs are sampled independently from a uniform [0, 1] distribution.) [sent-160, score-0.223]

79 The true value (solid straight line) is estimated from a single long run of AIS. [sent-162, score-0.083]

80 For comparison, the maximum log likelihood provides an upper bound on the true value. [sent-163, score-0.121]

81 A simple approach would be to evaluate the function at points where the GP has large uncertainty and where the density is not too small, so that the expected contribution to the uncertainty in the estimate of the integral is large. [sent-167, score-0.136]
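
This heuristic can be sketched as a simple acquisition rule: score candidate locations by the GP's predictive variance weighted by the density, and evaluate the integrand where the score is largest. The scoring rule below is an illustrative stand-in, not the criterion derived in the paper, and the kernel hyperparameters and candidate grid are arbitrary.

```python
# Active-design heuristic sketch: pick the next evaluation point where the GP
# predictive variance times the density p(x) is largest (illustrative rule).
import numpy as np

def se_kernel(a, b, ell=0.5, w0=1.0):
    return w0 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def next_design_point(x_obs, candidates, p_pdf, jitter=1e-8):
    """Return the candidate x maximising GP predictive variance * p(x)."""
    K = se_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_star = se_kernel(candidates, x_obs)                       # shape (m, n)
    # Predictive variance at each candidate (noise-free observations).
    var = se_kernel(candidates, candidates).diagonal() - np.einsum(
        "ij,ij->i", k_star, np.linalg.solve(K, k_star.T).T)
    score = var * p_pdf(candidates)
    return candidates[np.argmax(score)]

def p_pdf(x):                                                   # p(x) = N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# Example: start from 3 observations and choose where to evaluate next.
x_obs = np.array([-1.0, 0.2, 1.5])
candidates = np.linspace(-4, 4, 401)
print("next evaluation at x =", next_design_point(x_obs, candidates, p_pdf))
```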

82 For a fixed Gaussian Process covariance function these design points can often be pre-computed; see, e.g., [Minka, 2000]. [sent-168, score-0.266]

83 However, as we are adapting the covariance function depending on the observed function values, active learning would have to be an integral part of the procedure. [sent-171, score-0.233]

84 Classical Monte Carlo approaches cannot make use of active learning since the samples need to be drawn from a given distribution. [sent-172, score-0.222]

85 When using BMC to compute marginal likelihoods, the Gaussian covariance function used here (equation 5) is not ideally suited to modeling the likelihood. [sent-175, score-0.182]

86 Firstly, likelihoods are non-negative, whereas the GP prior does not restrict the values the function can take. [sent-176, score-0.117]

87 Secondly, the likelihood tends to have some regions of high magnitude and variability and other regions which are low and flat; this is not well-modelled by a stationary covariance function. [sent-177, score-0.132]

88 In practice this misfit between the GP prior and the function modelled has even occasionally led to negative values for the estimate of the marginal likelihood! [sent-178, score-0.205]

89 An importance distribution such as one computed from a Laplace approximation or a mixture of Gaussians can be used to dampen the variability in the integrand [Kennedy, 1998]. [sent-180, score-0.29]

90 The GP could be used to model the log of the likelihood [Rasmussen, 2002]; however this makes integration more difficult. [sent-181, score-0.158]

91 Although the choice of Gaussian process priors is computationally convenient in certain circumstances, in general other function approximation priors can be used to model the integrand. [sent-183, score-0.105]

92 For discrete (or mixed) variables the GP model could still be used with appropriate choice of covariance function. [sent-184, score-0.067]

93 In such cases a large number of samples may be required to obtain good estimates of the function. [sent-189, score-0.155]

94 Inference using a Gaussian Process prior is at present limited computationally to a few thousand samples. [sent-190, score-0.07]

95 This contrasts with classical MC, where many methods only require that samples can be drawn from some distribution whose normalising constant is not necessarily known (such as in equation 16). [sent-194, score-0.281]

96 Unfortunately, this limitation makes it difficult, for example, to design a Bayesian analogue to Annealed Importance Sampling. [sent-195, score-0.051]

97 We believe that the problem of computing an integral using a limited number of function evaluations should be treated as an inference problem and that all prior knowledge about the function being integrated should be incorporated into the inference. [sent-196, score-0.279]

98 Kennedy, M. (1998) Bayesian quadrature with non-normal approximating functions, Statistics and Computing, 8. [sent-206, score-0.071]

99 Gelman, A. and Meng, X.-L. (1998) Simulating normalizing constants: From importance sampling to bridge sampling to path sampling, Statistical Science, vol. 13. [sent-219, score-0.471]

100 Minka, T. (2000) Deriving quadrature rules from Gaussian processes, Technical Report, Statistics Department, Carnegie Mellon University. [sent-224, score-0.071]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bmc', 0.527), ('carlo', 0.338), ('monte', 0.323), ('smc', 0.24), ('ais', 0.208), ('importance', 0.17), ('samples', 0.155), ('hagan', 0.146), ('sampling', 0.134), ('gp', 0.13), ('integral', 0.129), ('bayesian', 0.126), ('integrand', 0.12), ('annealed', 0.106), ('classical', 0.096), ('marginal', 0.091), ('rasmussen', 0.084), ('hyperparameters', 0.073), ('kennedy', 0.072), ('sampler', 0.071), ('quadrature', 0.071), ('likelihoods', 0.07), ('variance', 0.068), ('covariance', 0.067), ('gaussian', 0.066), ('likelihood', 0.065), ('sample', 0.057), ('schedule', 0.057), ('temperature', 0.053), ('density', 0.051), ('conveying', 0.048), ('gelman', 0.048), ('meng', 0.048), ('objections', 0.048), ('inference', 0.047), ('prior', 0.047), ('sizes', 0.045), ('fall', 0.044), ('points', 0.043), ('posterior', 0.043), ('unsound', 0.042), ('neal', 0.041), ('integration', 0.039), ('dif', 0.038), ('berger', 0.038), ('bernardo', 0.038), ('inconsistencies', 0.038), ('outlined', 0.037), ('active', 0.037), ('processes', 0.036), ('metropolis', 0.035), ('eds', 0.035), ('minka', 0.035), ('estimate', 0.035), ('bridge', 0.033), ('dawid', 0.033), ('proposal', 0.032), ('occasionally', 0.032), ('evaluates', 0.032), ('integrals', 0.032), ('draws', 0.032), ('entirely', 0.031), ('cult', 0.031), ('convenient', 0.03), ('log', 0.03), ('drawn', 0.03), ('computing', 0.029), ('smith', 0.029), ('laplace', 0.029), ('mackay', 0.029), ('uncertainty', 0.029), ('dashed', 0.029), ('statistics', 0.029), ('fundamentally', 0.028), ('putting', 0.028), ('detail', 0.028), ('annealing', 0.027), ('evaluations', 0.027), ('design', 0.027), ('priors', 0.026), ('true', 0.026), ('unbiased', 0.026), ('peaked', 0.026), ('minus', 0.026), ('wish', 0.026), ('predictive', 0.025), ('equally', 0.025), ('draw', 0.025), ('inferred', 0.025), ('smoothness', 0.025), ('numerical', 0.025), ('fraction', 0.025), ('makes', 0.024), ('compute', 0.024), ('friedman', 0.024), ('williams', 0.024), ('irrelevant', 0.023), ('integrate', 0.023), ('computationally', 0.023), ('name', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 41 nips-2002-Bayesian Monte Carlo

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

2 0.1804474 174 nips-2002-Regularized Greedy Importance Sampling

Author: Finnegan Southey, Dale Schuurmans, Ali Ghodsi

Abstract: Greedy importance sampling is an unbiased estimation technique that reduces the variance of standard importance sampling by explicitly searching for modes in the estimation objective. Previous work has demonstrated the feasibility of implementing this method and proved that the technique is unbiased in both discrete and continuous domains. In this paper we present a reformulation of greedy importance sampling that eliminates the free parameters from the original estimator, and introduces a new regularization strategy that further reduces variance without compromising unbiasedness. The resulting estimator is shown to be effective for difficult estimation problems arising in Markov random field inference. In particular, improvements are achieved over standard MCMC estimators when the distribution has multiple peaked modes.

3 0.17184967 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

Author: Patrik O. Hoyer, Aapo Hyvärinen

Abstract: The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as ‘noise’, purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.

4 0.10676968 95 nips-2002-Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting

Author: Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, Roderick Murray-Smith

Abstract: We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model, the prediction of the output at a given time is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

5 0.10173774 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

Author: Ralf Herbrich, Neil D. Lawrence, Matthias Seeger

Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, with n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.

6 0.097390942 110 nips-2002-Incremental Gaussian Processes

7 0.094457604 38 nips-2002-Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

8 0.092680342 168 nips-2002-Real-Time Monitoring of Complex Industrial Processes with Particle Filters

9 0.089595273 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

10 0.088911057 169 nips-2002-Real-Time Particle Filters

11 0.083815604 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

12 0.080314659 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization

13 0.07017611 124 nips-2002-Learning Graphical Models with Mercer Kernels

14 0.067239031 181 nips-2002-Self Supervised Boosting

15 0.064340368 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods

16 0.059496887 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions

17 0.058464844 114 nips-2002-Information Regularization with Partially Labeled Data

18 0.056309782 39 nips-2002-Bayesian Image Super-Resolution

19 0.052945632 17 nips-2002-A Statistical Mechanics Approach to Approximate Analytical Bootstrap Averages

20 0.052494217 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.174), (1, -0.005), (2, -0.033), (3, 0.062), (4, -0.027), (5, 0.032), (6, -0.177), (7, 0.096), (8, 0.023), (9, 0.038), (10, 0.057), (11, -0.075), (12, 0.218), (13, 0.064), (14, 0.025), (15, -0.036), (16, -0.126), (17, -0.043), (18, 0.028), (19, 0.006), (20, 0.08), (21, 0.016), (22, -0.056), (23, 0.154), (24, -0.081), (25, -0.215), (26, 0.003), (27, 0.089), (28, -0.081), (29, 0.108), (30, 0.278), (31, 0.008), (32, -0.074), (33, -0.031), (34, 0.152), (35, -0.015), (36, -0.068), (37, -0.084), (38, 0.129), (39, 0.109), (40, -0.09), (41, -0.025), (42, -0.01), (43, 0.013), (44, 0.014), (45, -0.054), (46, -0.11), (47, -0.041), (48, -0.012), (49, -0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96642107 41 nips-2002-Bayesian Monte Carlo

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

2 0.85586739 174 nips-2002-Regularized Greedy Importance Sampling

Author: Finnegan Southey, Dale Schuurmans, Ali Ghodsi

Abstract: Greedy importance sampling is an unbiased estimation technique that reduces the variance of standard importance sampling by explicitly searching for modes in the estimation objective. Previous work has demonstrated the feasibility of implementing this method and proved that the technique is unbiased in both discrete and continuous domains. In this paper we present a reformulation of greedy importance sampling that eliminates the free parameters from the original estimator, and introduces a new regularization strategy that further reduces variance without compromising unbiasedness. The resulting estimator is shown to be effective for difficult estimation problems arising in Markov random field inference. In particular, improvements are achieved over standard MCMC estimators when the distribution has multiple peaked modes.

3 0.63489741 168 nips-2002-Real-Time Monitoring of Complex Industrial Processes with Particle Filters

Author: Rubén Morales-menéndez, Nando D. Freitas, David Poole

Abstract: This paper discusses the application of particle filtering algorithms to fault diagnosis in complex industrial processes. We consider two ubiquitous processes: an industrial dryer and a level tank. For these applications, we compared three particle filtering variants: standard particle filtering, Rao-Blackwellised particle filtering and a version of RaoBlackwellised particle filtering that does one-step look-ahead to select good sampling regions. We show that the overhead of the extra processing per particle of the more sophisticated methods is more than compensated by the decrease in error and variance.

4 0.58674788 95 nips-2002-Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting

Author: Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, Roderick Murray-Smith

Abstract: We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model, the prediction of the output at a given time is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

5 0.53454542 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

Author: Patrik O. Hoyer, Aapo Hyvärinen

Abstract: The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as ‘noise’, purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.

6 0.4401806 169 nips-2002-Real-Time Particle Filters

7 0.43952072 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

8 0.43708253 107 nips-2002-Identity Uncertainty and Citation Matching

9 0.42358708 201 nips-2002-Transductive and Inductive Methods for Approximate Gaussian Process Regression

10 0.41186833 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

11 0.39768654 110 nips-2002-Incremental Gaussian Processes

12 0.38400763 124 nips-2002-Learning Graphical Models with Mercer Kernels

13 0.37917382 38 nips-2002-Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

14 0.35202888 114 nips-2002-Information Regularization with Partially Labeled Data

15 0.33340555 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization

16 0.31068498 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

17 0.29036096 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

18 0.28949863 181 nips-2002-Self Supervised Boosting

19 0.28345591 178 nips-2002-Robust Novelty Detection with Single-Class MPM

20 0.2782096 138 nips-2002-Manifold Parzen Windows


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.107), (23, 0.026), (42, 0.097), (54, 0.103), (55, 0.047), (67, 0.023), (68, 0.034), (74, 0.079), (76, 0.209), (92, 0.033), (98, 0.145)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84229302 41 nips-2002-Bayesian Monte Carlo

Author: Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

2 0.72382993 174 nips-2002-Regularized Greedy Importance Sampling

Author: Finnegan Southey, Dale Schuurmans, Ali Ghodsi

Abstract: Greedy importance sampling is an unbiased estimation technique that reduces the variance of standard importance sampling by explicitly searching for modes in the estimation objective. Previous work has demonstrated the feasibility of implementing this method and proved that the technique is unbiased in both discrete and continuous domains. In this paper we present a reformulation of greedy importance sampling that eliminates the free parameters from the original estimator, and introduces a new regularization strategy that further reduces variance without compromising unbiasedness. The resulting estimator is shown to be effective for difficult estimation problems arising in Markov random field inference. In particular, improvements are achieved over standard MCMC estimators when the distribution has multiple peaked modes.

3 0.70967889 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

Author: Max Welling, Simon Osindero, Geoffrey E. Hinton

Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.

4 0.70428491 163 nips-2002-Prediction and Semantic Association

Author: Thomas L. Griffiths, Mark Steyvers

Abstract: We explore the consequences of viewing semantic association as the result of attempting to predict the concepts likely to arise in a particular context. We argue that the success of existing accounts of semantic representation comes as a result of indirectly addressing this problem, and show that a closer correspondence to human data can be obtained by taking a probabilistic approach that explicitly models the generative structure of language. 1

5 0.69605881 158 nips-2002-One-Class LP Classifiers for Dissimilarity Representations

Author: Elzbieta Pekalska, David Tax, Robert Duin

Abstract: Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical examples. These applications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any problem where little knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimilarities and (2) cheaper to compute by using a reduced representation set. Finally, a comparison to a comparable one-class classifier by Campbell and Bennett is given.

6 0.69573534 125 nips-2002-Learning Semantic Similarity

7 0.69267482 170 nips-2002-Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

8 0.68658394 46 nips-2002-Boosting Density Estimation

9 0.68304884 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

10 0.67989194 3 nips-2002-A Convergent Form of Approximate Policy Iteration

11 0.67979985 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

12 0.67956007 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits

13 0.67955434 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule

14 0.67919219 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

15 0.67818773 169 nips-2002-Real-Time Particle Filters

16 0.67731357 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks

17 0.67679888 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

18 0.67593515 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

19 0.67578053 147 nips-2002-Monaural Speech Separation

20 0.67521608 10 nips-2002-A Model for Learning Variance Components of Natural Images