nips nips2012 nips2012-336 knowledge-graph by maker-knowledge-mining

336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes


Source: pdf

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxiliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. [sent-13, score-0.34]

2 We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. [sent-16, score-0.581]

3 We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. [sent-17, score-0.626]

4 The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. [sent-20, score-0.536]

5 We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. [sent-21, score-0.202]

6 1 Introduction Diffusion processes are a flexible and useful tool in stochastic modelling. [sent-23, score-0.176]

7 Many important real world systems are currently modelled and best understood in terms of stochastic differential equations in general and diffusions in particular. [sent-24, score-0.428]

8 The analysis of diffusions dates back to Feller and Kolmogorov, who studied them as the scaling limits of certain Markov processes (see [8]). [sent-26, score-0.185]

9 The theory of diffusion processes was revolutionised by Itô, who interpreted a diffusion process as the solution to a stochastic differential equation [9, 10]. [sent-27, score-1.328]

10 This viewpoint allows one to see a diffusion process as the randomised counterpart of an ordinary differential equation. [sent-28, score-0.712]

11 One can argue that stochastic differential equations are the natural tool for modelling continuously evolving systems of real valued quantities that are subject to noise or stochastic influences. [sent-29, score-0.472]

12 These equations are governed by a set of input parameters (for example particle masses, reaction rates, or more general constants of proportionality) that determine the behaviour of the system. [sent-31, score-0.233]

13 We would like to know what the nature of this diffusion is. [sent-35, score-0.467]

14 In practice, this is computationally prohibitive: it is necessary to solve a partial differential equation known as the Fokker-Planck equation (see [11]) in order to find the transition density of the diffusion of interest. [sent-38, score-0.693]

15 In this paper, we propose a novel approximation for a nonlinear diffusion process X. [sent-40, score-0.692]

16 One heuristic way of thinking about a diffusion is as an ordinary differential equation that is perturbed by white noise. [sent-41, score-0.757]

17 We demonstrate that one can replace the white noise by a ‘coloured’ approximation without inducing much error. [sent-42, score-0.207]

18 The nature of the coloured noise expansion method enables us to control the behaviour of the diffusion over various length-scales. [sent-43, score-1.004]

19 This allows us to produce samples from the diffusion process that are consistent with observed data. [sent-44, score-0.536]

20 The main contributions of this paper are: • Novel development of a method for sampling from the time-t marginal distribution of a diffusion process based on a ‘coloured’ approximation of white noise. [sent-46, score-0.653]

21 • Demonstration that this approximation is a powerful and scalable tool for making parameter estimation feasible for general diffusions at minimal cost. [sent-47, score-0.235]

22 In Section 4, we discuss the coloured noise expansion and its use in controlling the behaviour of a diffusion process. [sent-50, score-1.004]

23 2 Parametric Diffusion Processes In this section we develop the basic notation and formalism for the diffusion processes used in this work. [sent-53, score-0.528]

24 First, we assume our data are generated by observing a k-dimensional diffusion process with dynamics dXt = aθ(Xt) dt + Bθ dWt, X0 ∼ p(x0), (1) where the initial condition is drawn from some known distribution. [sent-54, score-0.528]
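
As a point of reference for equation (1), here is a minimal sketch of simulating one path with the standard Euler-Maruyama scheme; the names `a_theta` and `B_theta` and the example usage at the end are illustrative assumptions, not code from the paper.

```python
import numpy as np

def euler_maruyama(a_theta, B_theta, x0, T, n_steps, seed=0):
    """Simulate one path of dX_t = a_theta(X_t) dt + B_theta dW_t on [0, T]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    k, d = B_theta.shape                           # state dimension k, driving-noise dimension d
    X = np.empty((n_steps + 1, k))
    X[0] = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=d)  # Brownian increment ~ N(0, dt * I)
        X[n + 1] = X[n] + a_theta(X[n]) * dt + B_theta @ dW
    return X

# Example (assumed): a simple mean-reverting drift in one dimension.
path = euler_maruyama(lambda x: -x, np.eye(1), [0.5], T=10.0, n_steps=1000)
```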

25 The driving noise W is a d-dimensional Brownian motion, and the equation is interpreted in the Itô sense. [sent-62, score-0.166]

26 That is, Yti = Xti + εti, εti ∼ N(0, Σi). (2) We use the notation X to refer to the entire sample path of the diffusion, and Xt to denote the value of the process at time t. [sent-64, score-0.355]

27 In situations where this is not explicitly the case, one can often hope to reduce a diffusion to this form via the Lamperti transform. [sent-71, score-0.467]

28 Aït-Sahalia [12] characterises the set of multivariate diffusions to which this transform can be applied. [sent-73, score-0.15]

29 2 3 Background Work Most approaches to parameter estimation of diffusion processes rely on the Monte-Carlo approximation. [sent-74, score-0.559]

30 [13] [14] employ a method based on rejection sampling to estimate parameters without introducing any discretisation error. [sent-76, score-0.213]

31 Roughly speaking, Gibbs samplers that exist in the literature alternate between drawing samples from some representation of the diffusion process X conditional on parameters θ, and samples from θ conditional on the current sample path of X. [sent-79, score-0.646]

32 The usual approach to the consistency issue is to make a proposal by conditioning a linear diffusion to hit some neighbourhood of the observation Yk , then to make a correction via a rejection sampling [18] or a Metropolis-Hastings [16] step. [sent-81, score-0.767]

33 However, as the inter-observation time grows, the qualitative difference between linear and nonlinear diffusions gets progressively more pronounced, and the rate of rejection grows accordingly. [sent-82, score-0.314]

34 Figure 1 shows the disparity between a sample from a nonlinear process and a sample from the linear proposal. [sent-83, score-0.181]

35 [Figure 1: two panels, 'Nonlinear sample path and proposal' and 'Sample path with noisy observations', each plotting X(t) against t.] Figure 1: (a) Sample path of a double well process (see equation (18)) with α = 2, γ = 2. [sent-87, score-0.483]

36 Current Gibbs samplers use linear proposals (dashed red line) with a rejection step to draw conditioned nonlinear paths. [sent-89, score-0.34]

37 In this case, the behaviour of the proposal is very different to that of the target, and the rate of rejection is high. [sent-90, score-0.269]

38 (b) Sample path of a double well process (solid blue line) with noisy observations (red dots). [sent-91, score-0.207]

39 Other approaches include variational methods [23, 24] that can compute continuous time Gaussian process approximations to more general stochastic differential systems, as well as various non-linear Kalman filtering and smoothing based approximations [25, 26, 27]. [sent-100, score-0.292]

40 4 Coloured Noise Expansions and Brownian Motion We now introduce a method of approximating a nonlinear diffusion that allows us to gain a considerable amount of control over the behaviour of the process. [sent-101, score-0.655]

41 Similar methods have been used for stratified sampling of diffusion processes [28] and the solution of stochastic partial differential equations [29]. [sent-102, score-0.795]

42 One of the major challenges of using MCMC methods for parameter estimation in the present context is that it is typically very difficult to draw samples from a diffusion process conditional on observed data. [sent-103, score-0.6]

43 Our approximation separates the diffusion process X into the sum of a linear and nonlinear component. [sent-106, score-0.692]

44 Heuristically, one can think of the random process that drives the process defined in equation (1) as white noise. [sent-110, score-0.252]

45 In our approximation, we project this white noise into an N -dimensional subspace of L2 [0, T ], the Hilbert space of square-integrable functions defined on the interval [0, T ]. [sent-111, score-0.163]

46 This gives a ‘coloured noise’ process that approaches white noise asymptotically as N → ∞. [sent-112, score-0.232]

47 The coloured noise process is then used to drive an approximation of (1). [sent-113, score-0.485]

48 We can choose the space into which to project the white noise in such a way that we will gain some control over its behaviour. [sent-114, score-0.163]

49 This is analogous to the way that Fourier analysis allows us to manipulate properties of signals. Recall that a standard Brownian motion on the interval [0, T] is a one-dimensional Gaussian process with zero mean and covariance function k(s, t) = min{s, t}. [sent-115, score-0.152]
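
To make the projection concrete, the sketch below builds a coloured-noise approximation of white noise on [0, T] from the first N elements of an orthonormal basis of L2[0, T], and integrates it to get an approximate Brownian path; the cosine basis is an illustrative choice and may differ from the basis used in the paper.

```python
import numpy as np

def coloured_noise(T, N, n_grid=500, seed=1):
    """Project white noise onto N orthonormal basis functions of L2[0, T].

    Returns the time grid, the coloured noise xi_N(t) = sum_i Z_i phi_i(t), and the
    corresponding Brownian-motion approximation W_N(t) = int_0^t xi_N(s) ds.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, T, n_grid)
    Z = rng.standard_normal(N)                     # i.i.d. N(0, 1) coefficients
    phi = np.empty((N, n_grid))
    phi[0] = 1.0 / np.sqrt(T)                      # orthonormal cosine basis (assumed choice)
    for i in range(1, N):
        phi[i] = np.sqrt(2.0 / T) * np.cos(i * np.pi * t / T)
    xi_N = Z @ phi                                 # smooth 'coloured' noise path
    W_N = np.cumsum(xi_N) * (t[1] - t[0])          # crude quadrature of the running integral
    return t, xi_N, W_N
```

As N grows, W_N converges to a standard Brownian motion, which is the sense in which the coloured noise approaches white noise.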

50 We can substitute this approximation into equation (1), which gives dXNL_t = [aθ(XNL_t) + Bθ Σ_{i=1}^N Φi Zi] dt, XNL_0 ∼ p(x0), (7) where Φi is the diagonal d × d matrix with entries (φi1 , . . . [sent-129, score-0.17]

51 This derivation is useful because equation (7) gives us an alternative to the Euler-Maruyama discretisation for sampling approximately from the time-t marginal distribution of a diffusion process. [sent-136, score-0.602]

52 We draw coefficients Zij from a standard normal distribution, and solve the appropriate vector-valued ordinary differential equation. [sent-137, score-0.209]
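
A rough one-dimensional illustration of that recipe: draw the coefficients, then hand the resulting ODE (7) to an off-the-shelf solver. The basis functions, the SciPy solver choice, and the scalar setup are assumptions made only for this sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sample_marginal_coloured(a_theta, b_theta, x0, T, N, seed=2):
    """Approximate draw of X_T by solving dx/dt = a_theta(x) + b_theta * sum_i Z_i phi_i(t)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(N)

    def phi(i, t):                                  # orthonormal cosine basis on [0, T] (assumed)
        return 1.0 / np.sqrt(T) if i == 0 else np.sqrt(2.0 / T) * np.cos(i * np.pi * t / T)

    def rhs(t, x):
        coloured = sum(Z[i] * phi(i, t) for i in range(N))
        return a_theta(x) + b_theta * coloured

    sol = solve_ivp(rhs, (0.0, T), [x0], rtol=1e-6, atol=1e-8)
    return sol.y[0, -1]                             # approximate sample from the time-T marginal
```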

53 While the Euler discretisation is the de facto standard method for numerical approximation of SDE, other methods do exist. [sent-138, score-0.172]

54 One must typically employ a fine discretisation to get a good approximation to the true diffusion process. [sent-141, score-0.646]

55 Empirically, we find that one needs far fewer Gaussian inputs Zi for an accurate representation of XT using the coloured noise approximation. [sent-142, score-0.408]

56 For example, Corlay and Pages [28] employ related ideas to conduct stratified sampling of a diffusion process. [sent-144, score-0.508]

57 The separation of behaviours across coefficients gives us a means to obtain fine-grained control over the behaviour of a diffusion process within a Metropolis-Hastings algorithm. [sent-149, score-0.612]

58 Taylor expanding the drift term around XNL_t, we see that to first order, dXt ≈ [aθ(XNL_t) + Ja(XNL_t) XC_t] dt + Bθ dWt = [aθ(XNL_t) dt + Bθ dŴt] + Ja(XNL_t) XC_t dt + Bθ (dWt − dŴt). [sent-154, score-0.282]

59 The dynamics of XL satisfy dXL_t = Ja(XNL_t) XL_t dt + Bθ dRt, XL_0 = 0, (11) where the driving noise is the ‘residual’ term R = W − Ŵ. [sent-158, score-0.21]

60 First, we compute a numerical approximation to the solution of the homogeneous matrix-valued equation dΨ(t)/dt = Ja(XNL_t) Ψ(t), Ψ(0) = In. [sent-160, score-0.204]
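
A hedged sketch of that first step: integrate the matrix-valued ODE for Ψ along a stored XNL trajectory. The piecewise-linear interpolation of the trajectory and the SciPy solver are assumptions of the example, not the authors' implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fundamental_matrix(J_a, x_nl_path, t_grid):
    """Solve dPsi/dt = J_a(X_NL(t)) Psi(t), Psi(0) = I_n, along a stored X_NL trajectory."""
    n = x_nl_path.shape[1]

    def x_nl(t):                                    # piecewise-linear interpolation of the path
        return np.array([np.interp(t, t_grid, x_nl_path[:, j]) for j in range(n)])

    def rhs(t, psi_flat):
        Psi = psi_flat.reshape(n, n)
        return (J_a(x_nl(t)) @ Psi).ravel()

    sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), np.eye(n).ravel(), t_eval=t_grid)
    return sol.y.T.reshape(-1, n, n)                # Psi(t) evaluated at each grid time
```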

61 The process XNL is designed to capture the most significant nonlinear features of the original diffusion X, while the linear process XL corrects for the truncation of the sum (6), and can be understood using tools from the theory of Gaussian processes. [sent-164, score-0.717]

62 One can think of the linear term as the result of a ‘small-noise’ expansion about the nonlinear trajectory. [sent-165, score-0.201]

63 5 Parameter Estimation In this section, we describe a novel modification of the Gibbs sampler that does not suffer the drawbacks of the linear proposal strategy. [sent-169, score-0.147]

64 In Section 6, we demonstrate that for highly nonlinear problems it will perform significantly better than standard methods because of the nonlinear component of our approximation. [sent-170, score-0.224]

65 Suppose for now that we make a single noiseless observation at time t1 = T (for ease of notation, we will assume that observations are uniformly spaced through time with ti+1 − ti = T , though this is not necessary). [sent-171, score-0.15]

66 We draw Z(i)∗ from the proposal distribution, and compute XNL∗_i with initial condition Yi−1. [sent-180, score-0.148]

67 The acceptance probability for this move is α = 1 ∧ ∏_{i=1}^n [N(Yi | XNL∗_i, k∗_i(T, T)) p(θ∗) p(θ∗ → θ)] / [N(Yi | XNL_i, ki(T, T)) p(θ) p(θ → θ∗)], (17) where XNL∗_i and k∗_i(T, T) are computed using the proposed value of θ∗. [sent-183, score-0.168]
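
The accept/reject step can be sketched generically as below, working in log space for numerical stability; the argument names (observation variances, prior and proposal log-densities) are placeholders rather than the paper's exact interface.

```python
import numpy as np
from scipy.stats import norm

def accept_theta(y, x_nl_prop, var_prop, x_nl_curr, var_curr,
                 log_prior_prop, log_prior_curr, log_q_rev, log_q_fwd, rng):
    """Metropolis-Hastings accept/reject for a proposed parameter value (cf. equation (17))."""
    log_alpha = (np.sum(norm.logpdf(y, loc=x_nl_prop, scale=np.sqrt(var_prop)))
                 + log_prior_prop + log_q_rev
                 - np.sum(norm.logpdf(y, loc=x_nl_curr, scale=np.sqrt(var_curr)))
                 - log_prior_curr - log_q_fwd)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```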

68 We noted earlier that when j is large, Zj governs the small-time oscillations of the diffusion process. [sent-184, score-0.505]

69 For this reason, we employ a Gaussian random walk proposal in Z1 with stepsize σRW = . [sent-187, score-0.196]

70 If the variance of the observation noise is high, it may be more efficient to target the joint posterior distribution p(θ, {Zi, XL_i}1:N | Y1:n). [sent-197, score-0.164]

71 6 Numerical Experiments The double-well diffusion is a widely-used benchmark for nonlinear inference problems [24, 32, 33, 34]. [sent-198, score-0.624]

72 It possesses nonlinear features that are sufficient to demonstrate the shortcomings of some existing inference methods, and how our approach overcomes these issues. [sent-200, score-0.157]

73 The dynamics of the process are given by dXt = αXt(γ² − Xt²) dt + B dWt. [sent-201, score-0.154]
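
For concreteness, a small sketch of this double-well drift together with one Euler-Maruyama trajectory; α and γ follow the values quoted for Figure 1, while the diffusion coefficient B, the step size, and the initial condition are illustrative assumptions.

```python
import numpy as np

alpha, gamma, B = 2.0, 2.0, 1.0               # alpha, gamma as in Figure 1(a); B is assumed

def drift(x):
    """Double-well drift a(x) = alpha * x * (gamma**2 - x**2)."""
    return alpha * x * (gamma ** 2 - x ** 2)

rng = np.random.default_rng(4)
dt, n_steps = 0.01, 2000                      # 20 time units, as in Figure 1(b)
x = np.empty(n_steps + 1)
x[0] = gamma                                  # start in the right-hand well (assumed)
for n in range(n_steps):
    x[n + 1] = x[n] + drift(x[n]) * dt + B * np.sqrt(dt) * rng.standard_normal()
```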

74 Figure 1(b) shows a trajectory of a double-well diffusion over 20 units of time, with observations at times {1, 2, . [sent-205, score-0.504]

75 As we mentioned earlier, particle MCMC performs well in low-dimensional inference problems. [sent-212, score-0.158]

76 For this reason, the results of a particle MCMC inference algorithm (with N = 1,000 particles) are used as ‘ground truth’. [sent-213, score-0.185]

77 We compare our Gibbs sampler to that of Golightly and Wilkinson [15], for which we use an Euler discretisation with stepsize ∆t = . [sent-216, score-0.166]

78 The broken red line is the output of the linear proposal method, and the broken and dotted blue line is the density estimate from the coloured noise expansion method. [sent-238, score-0.636]

79 Gibbs samplers that have been used in the past rely on making proposals by conditioning a linear diffusion to hit a target, and subsequently accepting or rejecting those proposals. [sent-240, score-0.617]

80 We make noisy observations with ti − ti−1 = 3 and Σ = . [sent-245, score-0.15]

81 From our previous discussion, one might expect the linear proposal strategy to perform poorly in this more nonlinear setting. [sent-248, score-0.227]

82 As in the previous experiment, we used a linear proposal Gibbs sampler with Euler stepsize dt = 0. [sent-250, score-0.272]

83 On the other hand, the coloured noise expansion method used N = 7 Gaussian inputs with a linear correction and was able to approximate the posterior accurately. [sent-254, score-0.617]

84 (b) Coloured noise expansion method. [sent-269, score-0.179]

85 (c) Linear proposal method. Figure 3: p(γ | Y1:10, B, α) after ten observations with a relatively large inter-observation time. [sent-271, score-0.152]

86 The coloured noise expansion method matches the ground truth, whereas the linear proposal method is inconsistent with the data. [sent-274, score-0.603]

87 Our inference method avoids the linear correction step, instead targeting the posterior over input variables directly. [sent-276, score-0.165]

88 We used a Taylor expansion to compute the covariance of the correction term. [sent-285, score-0.19]

89 In this paper, we restricted our attention to processes with a state-independent diffusion coefficient so that the covariance of the correction term could be computed. [sent-287, score-0.629]

90 We may be able to extend this methodology to processes with state-dependent noise; certainly one could achieve this by taking a 0-th order Taylor expansion about XNL. [sent-288, score-0.248]

91 Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. [sent-325, score-0.191]

92 Monte-Carlo maximum likelihood estimation for discretely observed diffusion processes. [sent-362, score-0.554]

93 Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). [sent-370, score-0.615]

94 Bayesian inference for nonlinear multivariate diffusion models observed with error. [sent-376, score-0.65]

95 Numerical techniques for maximum likelihood estimation of continuoustime diffusion processes (with comments). [sent-396, score-0.559]

96 Retrospective exact simulation of diffusion sample paths with applications. [sent-403, score-0.467]

97 Particle filters for stochastic differential equations of nonlinear diffusions. [sent-410, score-0.379]

98 Wiener chaos expansion and numerical solutions of stochastic partial differential equations. [sent-462, score-0.346]

99 A new adaptive Runge-Kutta method for stochastic differential equations. [sent-475, score-0.223]

100 Parameter estimation of nonlinear stochastic differential equations: simulated maximum likelihood versus extended Kalman filter and Itô-Taylor expansion. [sent-488, score-0.366]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xnl', 0.512), ('diffusion', 0.467), ('coloured', 0.282), ('differential', 0.144), ('xl', 0.124), ('diffusions', 0.124), ('brownian', 0.119), ('proposal', 0.115), ('particle', 0.113), ('ti', 0.113), ('dwt', 0.113), ('nonlinear', 0.112), ('discretisation', 0.094), ('noise', 0.09), ('expansion', 0.089), ('ja', 0.087), ('dws', 0.085), ('dt', 0.085), ('stochastic', 0.079), ('rejection', 0.078), ('behaviour', 0.076), ('beskos', 0.075), ('correction', 0.074), ('white', 0.073), ('zi', 0.071), ('dxt', 0.069), ('process', 0.069), ('proposals', 0.067), ('cornford', 0.064), ('xc', 0.063), ('processes', 0.061), ('path', 0.06), ('gibbs', 0.059), ('mcmc', 0.057), ('papaspiliopoulos', 0.056), ('golightly', 0.056), ('discretely', 0.056), ('sde', 0.056), ('motion', 0.056), ('ki', 0.056), ('opper', 0.055), ('archambeau', 0.052), ('euler', 0.052), ('edinburgh', 0.051), ('samplers', 0.05), ('chemical', 0.049), ('du', 0.047), ('posterior', 0.046), ('inference', 0.045), ('strati', 0.045), ('approximation', 0.044), ('equations', 0.044), ('expansions', 0.043), ('corlay', 0.043), ('crichton', 0.043), ('kloeden', 0.043), ('lyons', 0.043), ('wti', 0.043), ('double', 0.041), ('equation', 0.041), ('employ', 0.041), ('stepsize', 0.04), ('rkk', 0.038), ('oscillations', 0.038), ('wilkinson', 0.038), ('observations', 0.037), ('modelled', 0.037), ('kalman', 0.037), ('gaussian', 0.036), ('inputs', 0.036), ('tool', 0.036), ('physics', 0.036), ('yi', 0.036), ('chib', 0.035), ('driving', 0.035), ('numerical', 0.034), ('draw', 0.033), ('street', 0.033), ('weather', 0.033), ('hit', 0.033), ('ordinary', 0.032), ('sampler', 0.032), ('durham', 0.031), ('estimation', 0.031), ('fourier', 0.03), ('zj', 0.03), ('wt', 0.03), ('drew', 0.03), ('broken', 0.03), ('increments', 0.03), ('orthonormal', 0.03), ('target', 0.028), ('proportionality', 0.028), ('covariance', 0.027), ('ltering', 0.027), ('particles', 0.027), ('drift', 0.027), ('ground', 0.027), ('multivariate', 0.026), ('taylor', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxiliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

2 0.20507306 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Author: Yung-kyun Noh, Frank Park, Daniel D. Lee

Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1

3 0.16732825 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1

4 0.13553545 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

Author: Ben Calderhead, Mátyás A. Sustik

Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of ℓ1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1

5 0.11794846 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

6 0.11594381 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

7 0.10896727 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.0972303 118 nips-2012-Entangled Monte Carlo

9 0.092739053 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

10 0.090033263 126 nips-2012-FastEx: Hash Clustering with Exponential Families

11 0.085617028 205 nips-2012-MCMC for continuous-time discrete-state systems

12 0.082440302 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

13 0.081913643 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

14 0.077640764 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking

15 0.077296957 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

16 0.076778434 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

17 0.075551398 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

18 0.072928049 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

19 0.065471947 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

20 0.064327218 305 nips-2012-Selective Labeling via Error Bound Minimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, 0.024), (2, 0.02), (3, 0.065), (4, -0.115), (5, 0.009), (6, -0.003), (7, 0.022), (8, 0.003), (9, -0.048), (10, -0.067), (11, -0.089), (12, 0.029), (13, -0.082), (14, -0.093), (15, -0.015), (16, -0.105), (17, -0.146), (18, 0.082), (19, -0.034), (20, 0.131), (21, 0.085), (22, -0.028), (23, 0.004), (24, -0.049), (25, -0.114), (26, -0.058), (27, -0.004), (28, -0.1), (29, -0.14), (30, -0.035), (31, -0.007), (32, 0.045), (33, -0.054), (34, -0.017), (35, 0.035), (36, -0.012), (37, 0.063), (38, 0.054), (39, 0.223), (40, -0.035), (41, 0.02), (42, 0.026), (43, 0.061), (44, 0.051), (45, -0.054), (46, 0.055), (47, -0.031), (48, -0.033), (49, -0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93126637 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxiliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

2 0.7347396 205 nips-2012-MCMC for continuous-time discrete-state systems

Author: Vinayak Rao, Yee W. Teh

Abstract: We propose a simple and novel framework for MCMC inference in continuous-time discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1

3 0.64115953 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Author: Yung-kyun Noh, Frank Park, Daniel D. Lee

Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in knearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1

4 0.63214386 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1

5 0.62877899 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1

6 0.57536942 41 nips-2012-Ancestor Sampling for Particle Gibbs

7 0.53441012 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

8 0.4937295 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

9 0.48980087 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking

10 0.47431099 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

11 0.46631172 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

12 0.43425283 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

13 0.42967024 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

14 0.42145476 128 nips-2012-Fast Resampling Weighted v-Statistics

15 0.42031687 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

16 0.41389534 37 nips-2012-Affine Independent Variational Inference

17 0.4121044 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

18 0.41164973 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

19 0.41105905 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

20 0.40915608 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.035), (17, 0.012), (21, 0.044), (38, 0.096), (39, 0.01), (42, 0.04), (54, 0.027), (55, 0.02), (65, 0.243), (74, 0.05), (76, 0.172), (80, 0.095), (83, 0.011), (92, 0.069)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81501204 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxiliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

2 0.75863636 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

Author: Xavier Bresson, Thomas Laurent, David Uminsky, James V. Brecht

Abstract: This paper provides both theoretical and algorithmic results for the ℓ1-relaxation of the Cheeger cut problem. The ℓ2-relaxation, known as spectral clustering, only loosely relates to the Cheeger cut; however, it is convex and leads to a simple optimization problem. The ℓ1-relaxation, in contrast, is non-convex but is provably equivalent to the original problem. The ℓ1-relaxation therefore trades convexity for exactness, yielding improved clustering results at the cost of a more challenging optimization. The first challenge is understanding convergence of algorithms. This paper provides the first complete proof of convergence for algorithms that minimize the ℓ1-relaxation. The second challenge entails comprehending the ℓ1 energy landscape, i.e. the set of possible points to which an algorithm might converge. We show that ℓ1-algorithms can get trapped in local minima that are not globally optimal and we provide a classification theorem to interpret these local minima. This classification gives meaning to these suboptimal solutions and helps to explain, in terms of graph structure, when the ℓ1-relaxation provides the solution of the original Cheeger cut problem. 1

3 0.73650795 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

Author: Emtiyaz Khan, Shakir Mohamed, Kevin P. Murphy

Abstract: We present a new variational inference algorithm for Gaussian process regression with non-conjugate likelihood functions, with application to a wide array of problems including binary and multi-class classification, and ordinal regression. Our method constructs a concave lower bound that is optimized using an efficient fixed-point updating algorithm. We show that the new algorithm has highly competitive computational complexity, matching that of alternative approximate inference methods. We also prove that the use of concave variational bounds provides stable and guaranteed convergence – a property not available to other approaches. We show empirically for both binary and multi-class classification that our new algorithm converges much faster than existing variational methods, and without any degradation in performance. 1

4 0.71078336 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

5 0.69059157 197 nips-2012-Learning with Recursive Perceptual Representations

Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell

Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous –often more complicated– methods on several vision and speech benchmarks. 1

6 0.6880917 188 nips-2012-Learning from Distributions via Support Measure Machines

7 0.68667549 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

8 0.68540889 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

9 0.68492615 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

10 0.68480861 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

11 0.6847592 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

12 0.68368196 329 nips-2012-Super-Bit Locality-Sensitive Hashing

13 0.68366998 279 nips-2012-Projection Retrieval for Classification

14 0.6832695 41 nips-2012-Ancestor Sampling for Particle Gibbs

15 0.68299913 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

16 0.68293309 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

17 0.68177116 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

18 0.68070197 163 nips-2012-Isotropic Hashing

19 0.67994207 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

20 0.67987621 126 nips-2012-FastEx: Hash Clustering with Exponential Families