
11 nips-2012-A Marginalized Particle Gaussian Process Regression


Source: pdf

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimation performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. [sent-4, score-0.185]

2 Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. [sent-5, score-0.063]

3 Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. [sent-6, score-0.08]

4 The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods. [sent-8, score-0.032]

5 However, the O(n^3) computational load for training the GP model would severely limit its applicability in practice when the number of training points n is larger than a few thousand [1]. [sent-10, score-0.072]

6 One typical method is a sparse pseudo-input Gaussian process (SPGP) [2] that uses a pseudo-input data set with m inputs (m ≪ n) to parameterize the GP predictive distribution to reduce the computational burden. [sent-12, score-0.033]

7 Then a sparse spectrum Gaussian process (SSGP) [3] was proposed to further improve the performance of SPGP while retaining the computational efficiency by using a stationary trigonometric Bayesian model with m basis functions. [sent-13, score-0.033]

8 However, both SPGP and SSGP learn hyperparameters offline by maximizing the marginal likelihood before making the inference. [sent-14, score-0.051]

9 Another recent model is a Kalman filter Gaussian process (KFGP) [4], which reduces the computational load by correlating the function values of data subsets at each Kalman filter iteration. [sent-16, score-0.059]

10 But it still causes underfitting or overfitting if the hyperparameters are badly learned offline. [sent-17, score-0.042]

11 On the contrary, we propose in this paper an online marginalized particle filter to simultaneously learn the hyperparameters and hidden function values. [sent-18, score-0.192]

12 By collecting small data subsets sequentially, we establish a novel state space model which allows us to estimate the marginal posterior distribution (not the marginal likelihood) of hyperparameters online with a number of weighted particles. [sent-19, score-0.116]

13 For each particle, a Kalman filter is applied to estimate the posterior distribution of hidden function values. [sent-20, score-0.034]

14 2 Data Construction In practice, the whole training data set is usually constructed by gathering small subsets several times. [sent-22, score-0.031]

15 For the tth collection, the training subset (Xt, yt) consists of n_t input-output pairs: {(x_t^1, y_t^1), ..., (x_t^{n_t}, y_t^{n_t})}. [sent-23, score-0.2]

16 Each scalar output y_t^i is generated from a nonlinear function f(x_t^i) of a d-dimensional input vector x_t^i with additive Gaussian noise N(0, a_0^2). [sent-24, score-0.054]

17 All the pairs are separately organized as an input matrix Xt and an output vector yt. [sent-25, score-0.054]

18 The goal is a regression problem: estimating the function values f(X_*) at m test inputs X_* = [x_*^1, ..., x_*^m] given (X1:T, y1:T). [sent-27, score-0.066]

19 Similar to a Gaussian distribution specified by a mean vector and covariance matrix, a GP is fully defined by a mean function m(x) = E[f(x)] and a covariance function k(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]. [sent-30, score-0.038]

20 Moreover, due to spatial nonstationary phenomena in the real world, we choose k(x, x') as kSE(x, x') + kNN(x, x'), where kSE(x, x') = a_1^2 exp[−0.5 a_2^{−2} (x − x')^T (x − x')] is the stationary squared exponential covariance function. [sent-32, score-0.041]

21 kNN(x, x') = a_3^2 sin^{−1}[a_4^{−2} \tilde{x}^T \tilde{x}' ((1 + a_4^{−2} \tilde{x}^T \tilde{x})(1 + a_4^{−2} \tilde{x}'^T \tilde{x}'))^{−0.5}] is the nonstationary neural network covariance function. [sent-33, score-0.028]

22 Here \tilde{x} = [1 x^T]^T denotes the augmented input. [sent-34, score-0.051]
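
A minimal sketch of this composite covariance, under the hyperparameter naming a1-a4 used above (a1, a2 for the SE amplitude and length scale; a3, a4 for the NN amplitude and input scale); the exact parameterization in the paper may differ slightly:

    import numpy as np

    def k_se(x, xp, a1, a2):
        # Stationary squared exponential covariance.
        d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
        return a1**2 * np.exp(-0.5 * a2**-2 * np.dot(d, d))

    def k_nn(x, xp, a3, a4):
        # Nonstationary neural network covariance on augmented inputs [1, x^T]^T.
        xa, xpa = np.append(1.0, x), np.append(1.0, xp)
        num = a4**-2 * np.dot(xa, xpa)
        den = np.sqrt((1 + a4**-2 * np.dot(xa, xa)) * (1 + a4**-2 * np.dot(xpa, xpa)))
        return a3**2 * np.arcsin(num / den)

    def k_senn(x, xp, a1, a2, a3, a4):
        # Composite covariance kSE + kNN used by the SENN variants.
        return k_se(x, xp, a1, a2) + k_nn(x, xp, a3, a4)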

23 The regression problem can be solved by the standard GP in the following two steps: first, learning θ given (X1:T, y1:T); second, inferring f(X_*) given the learned θ. [sent-36, score-0.032]
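
A reference point for the second step: given learned hyperparameters θ, a standard GP uses the usual Cholesky-based predictive equations, and the n x n factorization below is exactly the O(n^3) bottleneck mentioned in the introduction. The kernel callable and noise_var argument are generic placeholders, not names from the paper:

    import numpy as np

    def gp_predict(X, y, Xstar, kernel, noise_var):
        # Standard GP predictive mean and variance for f(Xstar).
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))   # O(n^3) step
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        Ks = np.array([[kernel(xs, xi) for xi in X] for xs in Xstar])
        mean = Ks @ alpha
        V = np.linalg.solve(L, Ks.T)
        var = np.array([kernel(xs, xs) for xs in Xstar]) - np.sum(V**2, axis=0)
        return mean, var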

24 In order to derive a computationally tractable GP model which preserves the estimation accuracy, we first explore a state space model arising from the data construction procedure, and then propose a marginalized particle filter to estimate the hidden f(X_*) and θ in an online Bayesian filtering framework. [sent-44, score-0.235]

25 1 State Space Model The standard state space model (SSM) consists of the state equation and observation equation. [sent-46, score-0.031]

26 The state equation reflects the Markovian evolution of the hidden states (the hyperparameters and function values). [sent-47, score-0.052]

27 For the hidden function values, we explore the relation between the (t − 1)th and the tth data subsets. [sent-51, score-0.039]

28 For simplicity, we denote X_t^c = Xt ∪ X_* and f_t^c = f(X_t^c). [sent-52, score-0.29]

29 2 Bayesian Inference by Marginalized Particle Filter In contrast to the GP regression with a two-step offline inference in section 3, we propose an online filtering framework to simultaneously learn hyperparameters and estimate hidden function values. [sent-59, score-0.098]

30 According to the SSM above, the inference problem is to compute the posterior distribution p(f_t^c, θ1:t | X1:t, X_*, y1:t). [sent-60, score-0.034]

31 Hence we choose another popular technique: the particle filter. [sent-62, score-0.107]

32 However, for our SSM, the traditional sampling importance resampling (SIR) particle filter would introduce unnecessary computational load, because equation (5) in the SSM has a linear structure given θt. [sent-63, score-0.162]

33 This inspires us to apply a more efficient marginalized particle filter (also called a Rao-Blackwellised particle filter) [9, 11, 12, 13], which handles the estimation problem by combining a Kalman filter with a particle filter. [sent-64, score-0.399]

34 Using Bayes' rule, the posterior can be factorized as p(f_t^c, θ1:t | X1:t, X_*, y1:t) = p(θ1:t | X1:t, X_*, y1:t) p(f_t^c | θ1:t, X1:t, X_*, y1:t), where the first factor p(θ1:t | X1:t, X_*, y1:t) is a marginal posterior that can be estimated by the particle filter. [sent-65, score-0.205]

35 After estimating θ1:t, the second factor p(f_t^c | θ1:t, X1:t, X_*, y1:t) can be computed by a Kalman filter, since f_t^c is the hidden state in the linear substructure (equation (5)) of the SSM. [sent-66, score-0.368]
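
The sketch below illustrates this structure for a single filtering step: a particle filter over the hyperparameters, with a Kalman filter attached to each particle for the linear-Gaussian substructure. It is a generic Rao-Blackwellised stand-in rather than the paper's exact update equations; the random-walk move on θ, the placeholder transition, and the choice of exp(θ[0]) as the noise standard deviation are all illustrative assumptions:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Particle:
        theta: np.ndarray  # log hyperparameters carried by this particle
        m: np.ndarray      # Kalman mean of the hidden function values
        P: np.ndarray      # Kalman covariance of the hidden function values

    def mpgp_step(particles, H, y, q_theta=0.05, q_f=0.01):
        # One step: move theta, run each particle's Kalman filter on the new
        # observations y (observation matrix H), reweight, and resample.
        log_w = np.empty(len(particles))
        for i, p in enumerate(particles):
            p.theta = p.theta + q_theta * np.random.randn(*p.theta.shape)
            m_pred, P_pred = p.m, p.P + q_f * np.eye(len(p.m))   # placeholder transition
            noise = np.exp(p.theta[0])                            # assumed log-noise parameter
            S = H @ P_pred @ H.T + noise**2 * np.eye(len(y))      # innovation covariance
            r = y - H @ m_pred                                    # innovation
            log_w[i] = -0.5 * (r @ np.linalg.solve(S, r) + np.linalg.slogdet(S)[1])
            K = P_pred @ H.T @ np.linalg.inv(S)                   # Kalman gain
            p.m, p.P = m_pred + K @ r, P_pred - K @ S @ K.T
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = np.random.choice(len(particles), size=len(particles), p=w)
        return [Particle(particles[i].theta.copy(), particles[i].m.copy(),
                         particles[i].P.copy()) for i in idx], w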

36 The computational cost of the marginalized particle filter is governed by O(N T S^3) [10], where N is the number of particles, T is the number of data collections, and S is the size of each collection. [sent-86, score-0.153]

37 Moreover, the MPGP propagates the previous estimation to improve the current accuracy in the recursive filtering framework. [sent-88, score-0.048]

38 From the algorithm above, we also find that f(X_*) is estimated as a Gaussian mixture at each iteration, since each hyperparameter particle is accompanied by a Kalman filter for f(X_*). [sent-89, score-0.137]
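
Concretely, with normalized particle weights w_i and per-particle Gaussian posteriors N(mu_i, var_i) over f(X_*), the mixture can be collapsed to an overall mean and per-test-point variance with the standard mixture-moment identity; this is a generic sketch, not code from the paper:

    import numpy as np

    def mixture_moments(weights, means, variances):
        # means, variances: arrays of shape (num_particles, num_test_points).
        w = np.asarray(weights, dtype=float)
        w /= w.sum()
        mu = np.sum(w[:, None] * means, axis=0)
        # Law of total variance: within-particle plus between-particle spread.
        var = np.sum(w[:, None] * (variances + (means - mu) ** 2), axis=0)
        return mu, var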

39 (a-b) show the estimation for f1 at t = 10 by SE-KFGP (blue line with blue dashed interval in (a)), SE-MPGP (red line with red dashed interval in (a)), SENN-KFGP (blue line with blue dashed interval in (b)), SENN-MPGP (red line with red dashed interval in (b)). [sent-125, score-0.12]

40 The black crosses are the training outputs at t = 10; the black line is the true f(X_*). [sent-126, score-0.031]

41 Panels (i-m) and (n-r) show the estimates of the log hyperparameters (log(a_0) to log(a_4)) for f1 and f2 over time. [sent-128, score-0.09]

42 But the offline learning procedure in KFGP will either take a long time when using a large extra training set, or fall into an unsatisfactory local optimum when using a small one. [sent-131, score-0.047]

43 In our MPGP, that local optimum can be used as the initial setting of the hyperparameters; the underlying θ is then learned online by the marginalized particle filter to improve the performance. [sent-132, score-0.211]

44 For f1 (x), we gather the training data with 100 collections. [sent-137, score-0.037]

45 For each collection, we randomly select 30 inputs from [-2, 2], then calculate their outputs by adding a Gaussian noise N (0, 0. [sent-138, score-0.02]

46 For f2 (x), we gather the training data with 50 collections. [sent-153, score-0.037]

47 For each collection, we randomly select 60 inputs from [0, 1], then calculate their outputs by adding a Gaussian noise N (0, 0. [sent-154, score-0.02]
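
A small sketch of this sequential data construction; the target functions f1, f2 and the noise standard deviation are placeholders (the exact noise level is truncated in the extracted text above):

    import numpy as np

    def make_collections(f, n_collections, n_per, lo, hi, noise_std, seed=0):
        # Generate T sequential training subsets (X_t, y_t), t = 1, ..., T.
        rng = np.random.default_rng(seed)
        data = []
        for _ in range(n_collections):
            X = rng.uniform(lo, hi, size=n_per)
            y = f(X) + noise_std * rng.normal(size=n_per)
            data.append((X, y))
        return data

    # e.g. the f1 setup: 100 collections of 30 inputs drawn from [-2, 2]
    # (np.sin and noise_std=0.1 stand in for the unspecified f1 and noise level)
    collections_f1 = make_collections(np.sin, 100, 30, -2.0, 2.0, noise_std=0.1)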

48 The first experiment aims to evaluate the estimation performance in comparison with the KFGP of [4]. [sent-158, score-0.053]

49 We denote by SE-KFGP and SENN-KFGP the KFGP with covariance function kSE and the KFGP with covariance function kSE + kNN, respectively. [sent-159, score-0.038]

50 The first row is for f1 , the second row is for f2 . [sent-196, score-0.024]

51 First, Figure 1 shows that the estimation performance of both KFGP and MPGP improves and tends to converge over time (panels (a-h)), since the previous estimate is incorporated into the current one by the recursive Bayesian filtering. [sent-200, score-0.112]

52 Second, for both f1 and f2, the estimation of MPGP is better than that of KFGP, as shown by the NMSE and MNLP comparison in Figure 2. [sent-201, score-0.032]
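
NMSE and MNLP are taken here in their common GP-literature senses, normalized mean squared error and mean negative log predictive probability; the paper's exact normalization may differ, so the definitions below are an assumption. Lower is better for both:

    import numpy as np

    def nmse(y_true, mu):
        # Mean squared error of the predictive mean, scaled by the target variance.
        return np.mean((y_true - mu) ** 2) / np.var(y_true)

    def mnlp(y_true, mu, var):
        # Mean negative log predictive probability under Gaussian predictions N(mu, var).
        return 0.5 * np.mean((y_true - mu) ** 2 / var + np.log(2 * np.pi * var))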

53 The KFGP uses offline-learned hyperparameters at all times. [sent-202, score-0.042]

54 In contrast, MPGP initializes its hyperparameters with the ones learned by KFGP, and then learns the underlying hyperparameters online (panels (i-r) in Figure 1). [sent-203, score-0.103]

55 Finally, focusing only on our MPGP, SENN-MPGP is better than SE-MPGP since it takes the spatial nonstationarity into account. [sent-205, score-0.056]

56 The second experiment aims to illustrate the average performance of SE-MPGP and SENN-MPGP when the number of particles increases. [sent-206, score-0.126]

57 The reason is that the estimation accuracy and computational load of particle filters will increase when the number of particles increases. [sent-209, score-0.278]

58 Second, the average performance of SENN-MPGP is better than that of SE-MPGP since it captures the spatial nonstationarity, but SENN-MPGP needs more running time since the hyperparameter vector to be inferred is larger. [sent-210, score-0.048]

59 The third experiment aims to compare our MPGP with the benchmarks. [sent-211, score-0.021]

60 The state-of-the-art sparse GP methods we choose are: the sparse pseudo-input Gaussian process (SPGP) [2] and the sparse spectrum Gaussian process (SSGP) [3]. [sent-212, score-0.037]

61 Moreover, we also want to examine the robustness of our MPGP, i.e., to clarify whether its good estimation heavily depends on the order in which the training data are collected. [sent-213, score-0.062]

62 Hence, we randomly shuffle the order of the training subsets used before, then run SPGP with 5 pseudo-inputs (5-SPGP), SSGP with 10 basis functions (10-SSGP), SE-MPGP with 5 particles (5-SE-MPGP), and SENN-MPGP with 5 particles (5-SENN-MPGP). [sent-214, score-0.279]

63 The reason is that the synthetic functions are nonstationary while SE-MPGP uses a stationary SE kernel. [sent-268, score-0.041]

64 Hence we run 5-SENN-MPGP with a nonstationary kernel to show that our MPGP is competitive with SSGP and much better than SPGP, with a shorter running time. [sent-269, score-0.06]

65 We first gather the training data with 100 collections. [sent-274, score-0.037]

66 For each collection, we randomly select 90 data points, where the input vector is the longitude and latitude location and the output is the temperature (°C). [sent-275, score-0.161]

67 There are two test data sets: the first one is a grid test input set (Longitude: -180:40:180, Latitude: -90:20:90) that is used to show the estimated surface temperature. [sent-276, score-0.033]
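
The ranges -180:40:180 and -90:20:90 are MATLAB-style start:step:stop specifications; a minimal sketch of building the corresponding grid of test inputs:

    import numpy as np

    lon = np.arange(-180, 181, 40)   # -180:40:180 -> 10 longitude values
    lat = np.arange(-90, 91, 20)     # -90:20:90   -> 10 latitude values
    LON, LAT = np.meshgrid(lon, lat)
    X_grid = np.column_stack([LON.ravel(), LAT.ravel()])  # 100 (longitude, latitude) test inputs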

68 The second test input set (100 points) is randomly selected from the data website after obtaining all the training data. [sent-277, score-0.028]

69 The first experiment aims to show the predicted surface temperature at the grid test inputs. [sent-278, score-0.082]

70 We set the number of particles in SE-MPGP and SENN-MPGP to 20. [sent-279, score-0.105]

71 From Figure 4, the KFGP methods get stuck in local optima: SE-KFGP appears to underfit since it does not model the cold region around the location (100, 50), while SENN-KFGP appears to overfit since it unexpectedly models a cold region around (-100, -50). [sent-280, score-0.036]

72 In contrast, SE-MPGP and SENN-MPGP fit the data set well via online hyperparameter learning. [sent-281, score-0.04]

73 The second experiment is to evaluate the estimation error of our MPGP using the second test data. [sent-282, score-0.05]

74 Moreover, the error of SENN-MPGP is much lower than that of SE-MPGP, which shows that SENN-MPGP successfully models the spatial nonstationarity of the temperature data. [sent-285, score-0.064]

75 We also randomly shuffle the order of the training subsets as a robustness check. [sent-291, score-0.049]

76 From Table 2, the comparison results show that our MPGP achieves better estimation performance with a shorter running time than SPGP and SSGP. [sent-292, score-0.06]

77 We first collect the training data 9 times, with 35 training points in each collection. [sent-296, score-0.038]

78 From Table 2, our SENN-MPGP obtains its estimates with the fastest speed and the smallest NMSE among all the methods, and its MNLP is competitive with that of SPGP. [sent-298, score-0.032]

79 7 90 90 90 90 90 50 8 50 50 50 50 0 0 0 latitude 0 latitude latitude 2 latitude 4 latitude 6 0 0 −2 −4 −50 −50 −50 −50 −50 −6 −90 −180 −8 −100 −90 100 180 −180 0 longitude 0. [sent-299, score-0.372]

80 Figure 4: The temperature estimation at t = 100. [sent-336, score-0.069]

81 The first row (from left to right): the temperature value bar, the full training observation plot, and the grid test output estimation by SE-KFGP, SENN-KFGP, SE-MPGP, and SENN-MPGP. [sent-337, score-0.109]

82 The second row (from left to right) shows the estimates of the log hyperparameters (log(a_0) to log(a_4)). [sent-339, score-0.102]

83 Our MPGP framework not only estimates the function values successfully, but also provides a new technique for learning the unknown static hyperparameters by estimating their marginal posterior online. [sent-368, score-0.084]

84 The small training set at each iteration largely reduces the computational load, while the estimation performance improves over iterations because recursive filtering propagates the previous estimate to enhance the current one. [sent-369, score-0.151]

85 In comparison with other benchmarks, we have shown that our MPGP can provide robust estimation at a competitive computational speed. [sent-370, score-0.047]

86 In the future, it would be interesting to explore the time-varying function estimation with our MPGP. [sent-371, score-0.032]

87 Ghahramani, Sparse Gaussian processes using pseudo-inputs, in: NIPS, 2006, pp. [sent-380, score-0.043]

88 -Vidal, Sparse spectrum Gaussian process regression, Journal of Machine Learning Research 11 (2010) 1865–1881. [sent-391, score-0.058]

89 Roberts, An introduction to Gaussian processes for the Kalman filter expert, in: FUSION, 2010. [sent-394, score-0.119]

90 Neal, Monte Carlo implementation of Gaussian process models for Bayesian regression and classification, Tech. [sent-397, score-0.103]

91 MacKay, Introduction to Gaussian processes, in: Neural Networks and Machine Learning, 1998, pp. [sent-403, score-0.034]

92 West, Combined parameter and state estimation in simulation-based filtering, in: Sequential Monte Carlo Methods in Practice, 2001, pp. [sent-412, score-0.043]

93 Kadirkamanathan, Estimation of parameters in a linear state space model using a Rao-Blackwellised particle filter, IEE Proceedings on Control Theory and Applications 151 (2004) 727–738. [sent-417, score-0.118]

94 Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state space models, in: 15th IFAC Symposium on System Identification, 2009. [sent-424, score-0.043]

95 Russell, Rao-Blackwellised particle filtering for dynamic Bayesian networks, in: UAI, 2000, pp. [sent-429, score-0.107]

96 de Freitas, Rao-Blackwellised particle filtering for fault diagnosis, in: IEEE Aerospace Conference Proceedings, 2002, pp. [sent-432, score-0.107]

97 Nordlund, Marginalized particle filters for mixed linear/nonlinear state-space models, IEEE Transactions on Signal Processing 53 (2005) 2279–2289. [sent-438, score-0.107]

98 Fox, GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models, in: IROS, 2008, pp. [sent-441, score-0.047]

99 Rasmussen, Robust filtering and smoothing with Gaussian processes, IEEE Transactions on Automatic Control. [sent-452, score-0.034]

100 Wood, Bayesian mixture of splines for spatially adaptive nonparametric regression, Biometrika 89 (2002) 513–528. [sent-461, score-0.025]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mpgp', 0.757), ('ftc', 0.29), ('senn', 0.258), ('kfgp', 0.22), ('mnlp', 0.194), ('ft', 0.175), ('nmse', 0.141), ('se', 0.139), ('particle', 0.107), ('particles', 0.105), ('pt', 0.104), ('gp', 0.095), ('xt', 0.083), ('ht', 0.078), ('kalman', 0.076), ('latitude', 0.062), ('longitude', 0.062), ('spgp', 0.062), ('yt', 0.054), ('ssgp', 0.053), ('lter', 0.053), ('kse', 0.047), ('marginalized', 0.046), ('ssm', 0.043), ('hyperparameters', 0.042), ('ltering', 0.042), ('temperature', 0.037), ('gaussian', 0.034), ('load', 0.034), ('estimation', 0.032), ('nonstationary', 0.032), ('rstly', 0.027), ('dftc', 0.026), ('lters', 0.025), ('bayesian', 0.024), ('ine', 0.022), ('wt', 0.022), ('dft', 0.021), ('hyperparameter', 0.021), ('kn', 0.02), ('hidden', 0.02), ('pendulum', 0.02), ('inputs', 0.02), ('refers', 0.02), ('tth', 0.019), ('online', 0.019), ('training', 0.019), ('covariance', 0.019), ('running', 0.018), ('gather', 0.018), ('contrary', 0.018), ('cold', 0.018), ('interrupt', 0.018), ('nonstationarity', 0.018), ('quebec', 0.018), ('rtime', 0.018), ('regression', 0.017), ('vt', 0.016), ('log', 0.016), ('recursive', 0.016), ('splines', 0.016), ('could', 0.015), ('surface', 0.015), ('carlo', 0.015), ('monte', 0.015), ('deisenroth', 0.014), ('benchmarks', 0.014), ('posterior', 0.014), ('rasmussen', 0.014), ('process', 0.013), ('tting', 0.013), ('aims', 0.012), ('importance', 0.012), ('dashed', 0.012), ('evolution', 0.012), ('subsets', 0.012), ('crosses', 0.012), ('row', 0.012), ('state', 0.011), ('freitas', 0.011), ('factorized', 0.011), ('collection', 0.011), ('spectrum', 0.011), ('clarify', 0.011), ('doucet', 0.01), ('interval', 0.01), ('shorter', 0.01), ('spatial', 0.009), ('resampling', 0.009), ('test', 0.009), ('sin', 0.009), ('experiment', 0.009), ('equation', 0.009), ('processes', 0.009), ('optimum', 0.009), ('marginal', 0.009), ('mcmc', 0.009), ('iteration', 0.009), ('spatially', 0.009), ('stationary', 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimation performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods. 1

2 0.1018164 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1

3 0.094227508 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

4 0.088783607 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

Author: James Scott, Jonathan W. Pillow

Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1

5 0.085398115 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

Author: Jasper Snoek, Hugo Larochelle, Ryan P. Adams

Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a “black art” requiring expert experience, rules of thumb, or sometimes bruteforce search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm’s generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expertlevel performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks. 1

6 0.085387304 41 nips-2012-Ancestor Sampling for Particle Gibbs

7 0.083768912 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

8 0.078499265 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

9 0.076897971 293 nips-2012-Relax and Randomize : From Value to Algorithms

10 0.076683111 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

11 0.074755549 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

12 0.070807345 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

13 0.070120707 195 nips-2012-Learning visual motion in recurrent neural networks

14 0.067584768 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

15 0.059599545 324 nips-2012-Stochastic Gradient Descent with Only One Projection

16 0.059103701 187 nips-2012-Learning curves for multi-task Gaussian process regression

17 0.051126763 233 nips-2012-Multiresolution Gaussian Processes

18 0.048144698 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

19 0.046072096 80 nips-2012-Confusion-Based Online Learning and a Passive-Aggressive Scheme

20 0.040857945 55 nips-2012-Bayesian Warped Gaussian Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.092), (1, 0.007), (2, 0.055), (3, 0.141), (4, -0.008), (5, -0.054), (6, -0.004), (7, -0.033), (8, 0.005), (9, -0.089), (10, -0.086), (11, -0.031), (12, -0.038), (13, 0.025), (14, -0.051), (15, 0.053), (16, -0.008), (17, 0.053), (18, 0.019), (19, -0.032), (20, -0.029), (21, 0.004), (22, 0.007), (23, -0.068), (24, -0.051), (25, 0.04), (26, -0.081), (27, 0.013), (28, 0.036), (29, -0.125), (30, 0.008), (31, 0.009), (32, 0.038), (33, 0.052), (34, -0.001), (35, 0.074), (36, -0.01), (37, 0.107), (38, -0.067), (39, 0.007), (40, -0.004), (41, 0.03), (42, -0.026), (43, 0.051), (44, -0.009), (45, 0.108), (46, 0.028), (47, -0.083), (48, -0.026), (49, -0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90081722 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimation performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods. 1

2 0.5629077 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

3 0.51093221 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1

4 0.47122321 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

5 0.43036196 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

Author: Jasper Snoek, Hugo Larochelle, Ryan P. Adams

Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a “black art” requiring expert experience, rules of thumb, or sometimes bruteforce search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm’s generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expertlevel performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks. 1

6 0.42845261 293 nips-2012-Relax and Randomize : From Value to Algorithms

7 0.41924223 233 nips-2012-Multiresolution Gaussian Processes

8 0.40301701 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

9 0.39722449 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

10 0.3970806 55 nips-2012-Bayesian Warped Gaussian Processes

11 0.39686537 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies

12 0.38701817 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

13 0.38064376 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

14 0.37784222 102 nips-2012-Distributed Non-Stochastic Experts

15 0.36660287 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

16 0.36433429 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

17 0.35584262 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

18 0.33994481 187 nips-2012-Learning curves for multi-task Gaussian process regression

19 0.33857086 80 nips-2012-Confusion-Based Online Learning and a Passive-Aggressive Scheme

20 0.32608992 205 nips-2012-MCMC for continuous-time discrete-state systems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.021), (10, 0.389), (21, 0.024), (36, 0.011), (38, 0.052), (39, 0.013), (42, 0.014), (54, 0.024), (55, 0.014), (74, 0.018), (76, 0.136), (80, 0.11), (92, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.6993311 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimation performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods. 1

2 0.58942759 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

Author: Shinsuke Koyama

Abstract: Statistical features of neuronal spike trains are known to be non-Poisson. Here, we investigate the extent to which the non-Poissonian feature affects the efficiency of transmitting information on fluctuating firing rates. For this purpose, we introduce the Kullback-Leibler (KL) divergence as a measure of the efficiency of information encoding, and assume that spike trains are generated by time-rescaled renewal processes. We show that the KL divergence determines the lower bound of the degree of rate fluctuations below which the temporal variation of the firing rates is undetectable from sparse data. We also show that the KL divergence, as well as the lower bound, depends not only on the variability of spikes in terms of the coefficient of variation, but also significantly on the higher-order moments of interspike interval (ISI) distributions. We examine three specific models that are commonly used for describing the stochastic nature of spikes (the gamma, inverse Gaussian (IG) and lognormal ISI distributions), and find that the time-rescaled renewal process with the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions.

3 0.58600301 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

Author: Jiarong Jiang, Adam Teichert, Jason Eisner, Hal Daume

Abstract: Users want inference to be both fast and accurate, but quality often comes at the cost of speed. The field has experimented with approximate inference algorithms that make different speed-accuracy tradeoffs (for particular problems and datasets). We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing [12]. Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is simply too large to explore naively. An attempt to counteract this by applying imitation learning algorithms also fails: the “teacher” follows a far better policy than anything in our learner’s policy space, free of the speed-accuracy tradeoff that arises when oracle information is unavailable, and thus largely insensitive to the known reward function. We propose a hybrid reinforcement/apprenticeship learning algorithm that learns to speed up an initial policy, trading off accuracy for speed according to various settings of a speed term in the loss function. 1

4 0.46553314 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1

5 0.44337374 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

Author: Jeremy Weiss, Sriraam Natarajan, David Page

Abstract: Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

6 0.43382761 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

7 0.43277586 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

8 0.43098134 197 nips-2012-Learning with Recursive Perceptual Representations

9 0.4307974 279 nips-2012-Projection Retrieval for Classification

10 0.42545018 280 nips-2012-Proper losses for learning from partial labels

11 0.42468256 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

12 0.424173 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

13 0.42329943 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

14 0.42219225 200 nips-2012-Local Supervised Learning through Space Partitioning

15 0.42152867 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

16 0.42047983 41 nips-2012-Ancestor Sampling for Particle Gibbs

17 0.42001379 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

18 0.41982022 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

19 0.41925302 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

20 0.41905633 188 nips-2012-Learning from Distributions via Support Measure Machines