nips nips2012 nips2012-205 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vinayak Rao, Yee W. Teh
Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. [sent-8, score-0.643]
2 Examples include point processes [1], Markov processes [2], structured Markov processes [3], infinite state Markov processes [4], semi-Markov processes [5] etc. [sent-12, score-0.321]
3 Here, we construct an exact MCMC sampler for pure jump processes in continuous time, using a workhorse of the discrete-time domain, the forward-filtering backward-sampling algorithm [7, 8], to make efficient updates. [sent-19, score-0.382]
4 While the forward-backward algorithm was developed originally for finite state hidden Markov models and linear Gaussian systems, it also forms the core of samplers for more complicated systems like nonlinear/non-Gaussian [9], infinite state [10], and non-Markovian [11] time series. [sent-26, score-0.247]
5 We compare our sampler with two other continuous-time MCMC samplers, a particle MCMC sampler [12], and a uniformizationbased sampler [13]. [sent-29, score-0.75]
6 Then, the sMJP is parametrized by π0 , an (arbitrary) initial distribution over states, as well as an N ×N matrix of hazard functions, Ass (·) ∀s, s ∈ S. [sent-34, score-0.308]
7 For any τ , Ass (τ ) gives the rate of transitioning to state s , τ time units after entering state s (we allow self-transitions, so s can equal s). [sent-35, score-0.273]
8 [16]): rss (τs ) = Ass (τs )e(− τ s 0 Ass (u)du) τs , Ass (τs ) = rss (τs )/ 1 − rss (u)du (1) 0 Sampling an sMJP trajectory proceeds as follows: on entering state s, sample waiting times τs ∼ Ass (·) ∀s ∈ S. [sent-38, score-0.563]
9 An equivalent characterization of many continuous-time processes is to first sample the waiting time τhold , and then draw a new state s . [sent-45, score-0.232]
10 A special sMJP is the Markov jump process (MJP) where the hazard functions are constant (giving exponential waiting times). [sent-47, score-0.541]
11 We represent an sMJP trajectory on an interval [tstart , tend ] as (S, T ), where T = (t0 , · · · , t|T | ) is the sequence of jump times (including the endpoints) and S = (s0 , · · · , s|S| ) is the corresponding sequence of state values. [sent-50, score-0.528]
12 The filled circles in figure 1(c) represent (S, T ); since the process is right-continuous, si gives the state after the jump at ti . [sent-52, score-0.355]
13 1 Sampling by dependent thinning We now describe an alternate thinning-based approach to sampling an sMJP trajectory. [sent-54, score-0.232]
14 Our approach will produce candidate event times at a rate higher that the actual event rates in the system. [sent-55, score-0.283]
15 Define W as the sequence of actual event times T , together with the thinned event times (which we call U , these are the empty circles in figure 1(c)). [sent-57, score-0.42]
16 W = (w0 , · · · , w|W | ) forms a random discretization of time (with |W | = |T | + |U |); define V = (v0 , · · · , v|W | ) as a sequence of state assignments to the times W . [sent-58, score-0.249]
17 At any wi , let li represent the time since the last sMJP transition (so that, li = wi − maxt∈T,t≤wi t), and let L = l1 , · · · , l|W | . [sent-59, score-0.676]
18 (V, L, W ) forms an equivalent representation of (S, T ) that includes a redundant set of thinned events U . [sent-61, score-0.327]
19 Note that if the ith event is thinned, vi = vi−1 , however this is not a self-transition. [sent-62, score-0.225]
20 L helps distinguish self-transitions (having associated l’s equal to 0) from thinned events. [sent-63, score-0.216]
21 For each hazard function As (τ ), define another dominating hazard function Bs (τ ), so that Bs (τ ) ≥ As (τ ) ∀s, τ . [sent-65, score-0.697]
22 Suppose we have instantiated the system trajectory until time wi , with the sMJP having just entered state vi ∈ S (so that li = 0). [sent-66, score-0.757]
23 We sample the next candidate event time wi+1 , with ∆wi = (wi+1 − wi ) drawn from the hazard function Bvi (·). [sent-67, score-0.638]
24 If this is the case, we vi i i sample a new state vi+1 with probability proportional to Avi vi+1 (∆wi + li ), and set li+1 = 0. [sent-70, score-0.351]
25 On the other hand, if the event is rejected, we set vi+1 to vi , and li+1 = (∆wi + li ). [sent-71, score-0.35]
26 We now sample 2 Figure 1: a) Instantaneous hazard rates given a trajectory b) State holding times, L(t) c) sMJP state values S(t) d) Graphical model for the randomized time-discretization e) Resampling the sMJP trajectory. [sent-72, score-0.596]
27 In b) and c), the filled and empty circles represent actual and thinned events respectively. [sent-73, score-0.327]
28 Dominating hazard functions Bs (τ ) ≥ As (τ ) ∀τ, s, where As (τ ) = s Ass (τ ). [sent-82, score-0.308]
29 Output: A piecewise constant path (V, L, W ) ≡ ((vi , li , wi )) on the interval [tstart , tend ]. [sent-83, score-0.481]
30 2: while wi < tend do 3: Sample τhold ∼ Bvi (·), with τhold > li . [sent-86, score-0.382]
31 2 Posterior inference via MCMC We now define an auxiliary variable Gibbs sampler, setting up a Markov chain that converges to the posterior distribution over the thinned representation (V, L, W ) given observations X of the sMJP trajectory. [sent-93, score-0.295]
32 By construction, the sMJP stays in a single state vi over this interval; let P (xi |vi ) be the corresponding likelihood vector. [sent-95, score-0.226]
33 Given a time discretization W ≡ (U ∪ T ) and the observations X, we discard the old state labels (V, L), and sample a new path ˜ ˜ ˜ ˜ ˜ (V , L, W ) ≡ (S, T ) using the forward-backward algorithm. [sent-96, score-0.304]
34 We then discard the thinned events U , ˜ T ), resample new thinned events Unew , resulting in a new time discretization ˜ and given the path (S, ˜ Wnew ≡ (T ∪ Unew ). [sent-97, score-0.856]
35 Resampling the sMJP trajectory given the set of times W : Given W (and thus all ∆wi ), this involves assigning each element wi ∈ W , a label (vi , li ) (see figure 1(d)). [sent-99, score-0.491]
36 Observe from this figure that the joint distribution factorizes as: |W |−1 P (xi |vi )P (∆wi |vi , li )P (vi+1 , li+1 |vi , li , ∆wi ) P (V, L, W, X) = P (v0 , l0 ) (3) i=0 − (li +∆wi ) B (t)dt vi li . [sent-101, score-0.525]
37 From equation 2, (with B instead of A), P (∆wi |vi , li ) = Bvi (li + ∆wi )e The term P (vi+1 , li+1 |vi , li , ∆wi ) is the thinning/state-transition probability from steps 4 and 5 of algorithm 1. [sent-102, score-0.25]
38 Observe that li can take (i + 1) values (in the set {0, wi − wi−1 , · · · , wi − w0 }), with the value of li affecting P (vi+1 , li+1 |vi , li , ∆wi+1 ). [sent-105, score-0.771]
39 [11] use a slice sampler to limit the maximum holding time of a state, and thus limit li ). [sent-108, score-0.373]
40 Resampling the thinned events given the sMJP trajectory: Having obtained a new sMJP trajectory (V, L, W ), we discard all thinned events U , so that the ˜ current state of the sampler is now (S, T ). [sent-109, score-1.078]
41 We then resample the thinned events U , recovering a new ˜ , L, W ), and with it, a new discretization of time. [sent-110, score-0.478]
42 To simplify notation, ˜ ˜ thinned representation (V we define the instantaneous hazard functions A(t) and B(t) (see figure 1(a)): A(t) = AS(t) (L(t)), and B(t) = BS(t) (L(t)) (4) These were the event rates relevant at any time t during the generative process. [sent-111, score-0.688]
43 The events W (whether thinned or not) were generated from a rate B(·) process, while the probability that an event wi was thinned is 1 − A(wi )/B(wi ). [sent-113, score-0.856]
44 The Poisson thinning theorem [17] then suggests that the thinned events U are distributed as a Poisson process with intensity (B(t) − A(t)). [sent-114, score-0.605]
45 Conditioned on a trajectory (S, T ) of the sMJP, the thinned events U are distributed as a Poisson process with intensity (B(t) − A(t)). [sent-117, score-0.543]
46 Our sampler reduces to this when a constant dominating rate B > maxs,τ As (τ ) is used to bound all event rates. [sent-127, score-0.382]
47 Moreover, with a single rate B, the average number of candidate events |W |, (and thus the computational cost of the algorithm), scales with the leaving rate of the most unstable state. [sent-129, score-0.262]
48 This allows our sampler to adapt the granularity of time-discretization to that required by the posterior trajectories, moreover this granularity can vary over the time interval. [sent-133, score-0.29]
49 4 Experiments In this section, we evaluate our sampler on a 3-state sMJP with Weibull hazard rates. [sent-138, score-0.494]
50 Here αss −1 α αss τ αss τ rss (τ |αss , λss ) = e(−(τ /λss ) ss ) , Ass (τ |αss , λss ) = λss λss λss λss αss −1 where λss is the scale parameter, and the shape parameter αss controls the stability of a state s. [sent-139, score-0.419]
51 When αss < 1, on entering state s, the system is likely to quickly jump to state s . [sent-140, score-0.347]
52 Note that for αss < 1, the hazard function tends to infinity as τ → 0. [sent-142, score-0.308]
53 We use the following simple upper bound Bss (τ ): Bss (τ ) = ΩAss (τ |αss , λss ) = Ωαss λss τ λss αss −1 = αss ˜ λss τ ˜ ss λ αss −1 (5) √ ˜ Here, λ = λ/ α Ω for any λ and α. [sent-144, score-0.273]
54 Thus, sampling from the dominating hazard function Bss (·) ˜ reduces to straightforward sampling from a Weibull with a smaller scale parameter λss . [sent-145, score-0.447]
55 Sampling thinned events on an interval (ti , ti+1 ) (where the sMJP is in state si ) involves sampling from a Poisson process with intensity (B(t) − A(t)) = (Ω − 1)A(t) = (Ω − 1) s Asi s (t − ti ). [sent-148, score-0.709]
56 As before, A(·) is a Weibull hazard function √ α obtained by correcting the scale parameter λ of A(·) by Ω − 1. [sent-150, score-0.308]
57 A simple way to sample such a Poisson process is by first drawing the number of events from a Poisson distribution with mean (ti+1 −ti ) ˆ ˆ Asi n (u)du, and then drawing that many events i. [sent-151, score-0.252]
58 Then Wi ≡ T + ti is the set of resampled thinned events on the interval (ti , ti+1 ). [sent-157, score-0.517]
59 In the following experiments, the shape parameters for each Weibull hazard (αss ) was randomly drawn from the interval [0. [sent-159, score-0.407]
60 The unbounded hazards associated with αss < 1 meant that uniformization is not applicable to this problem, and we only compared our sampler with pMCMC. [sent-162, score-0.388]
61 Our MCMC sampler was set up with Ω = 2, so that the dominating hazard rate at any instant equalled twice the true hazard rate (i. [sent-164, score-0.963]
62 For pMCMC, we implemented the particle independent Metropolis-Hastings sampler from [12]. [sent-168, score-0.378]
63 After any MCMC run, given a sequence of piecewise constant trajectories, we calculated the empirical distribution of 5 Thinning particle MCMC10 particle MCMC20 10 8 6 4 2 0 40 Thinning particle MCMC10 30 20 10 0 2 5 10 20 50 Effective samples per second Effective samples per second 0. [sent-171, score-0.702]
64 30 1 Thinning particle MCMC10 25 20 15 10 5 0 2 5 10 20 50 Figure 3: ESS per second for increasing interval lengths. [sent-190, score-0.335]
65 We see (as one might expect), that when the effect of the observations is weak, particle MCMC (which uses the prior distribution to make local proposals), outperforms our thinning-based sampler. [sent-200, score-0.226]
66 By contrast, our sampler is fairly insensitive to the effect of the likelihood, eventually outperforming the particle MCMC sampler. [sent-203, score-0.378]
67 While there exist techniques to generate more data-driven proposals for the particle MCMC [12, 20], these compromise the appealing simplicity of the original particle MCMC sampler. [sent-204, score-0.406]
68 The right plot in figure 2 shows the ESS per unit time for both samplers, now with the observation interval set to a smaller length of 2. [sent-206, score-0.23]
69 First, more observations per unit time requires rapid switching between states, a deviation from the prior that particle filtering is unlikely to propose. [sent-209, score-0.33]
70 Effect of the observation interval length In the next experiment, we more carefully compare the two samplers as the interval length varies. [sent-211, score-0.317]
71 9), we calculated the number of effective samples produced per unit time as the length of the observation interval increased from 2 to 50. [sent-215, score-0.281]
72 3 Markov jump processes In this section, we look at the Markov jump process (MJP), which we saw has constant hazard functions Ass . [sent-222, score-0.639]
73 If we use constant dominating hazard rates Bs , we see from algorithm 1 that all probabilities at time wi depend only on the current state si , and are independent of the holding time li . [sent-224, score-0.951]
74 The forward message at time wi needs only to represent the probability of vi taking different values in S; this completely specifies the state of the MJP. [sent-226, score-0.454]
75 In the next experiment, we compare Matlab implementations of our thinning-based sampler and the particle MCMC sampler with the uniformization-based sampler described in section 2. [sent-228, score-0.75]
76 Recall that the latter samples candidate event times W from a homogeneous Poisson process with a stateindependent rate B > maxs As . [sent-230, score-0.282]
77 Observe that for uniformization, the rate B is determined by the leaving rate of the most unstable state; often this is the state the system spends the least time in. [sent-235, score-0.267]
78 We distributed 5 observation times (again, favouring a random state by a factor of 100) over the interval [0, 10]. [sent-240, score-0.222]
79 Further, we see that while both the uniformization and our sampler perform comparably for low values of γ, our sampler starts to outperform uniformization for γ’s greater than 2. [sent-246, score-0.755]
80 In fact, for weak observations and large γs, even particle MCMC outperforms uniformization. [sent-247, score-0.226]
81 1 The M/M/∞ queue We finally apply our ideas to an infinite state MJP from queuing theory, the M/M/∞ queue (also called an immigration-death process [21]). [sent-250, score-0.242]
82 Then, under the M/M/∞ queue, the stochastic process S(t) evolves according to a simple birth-death Markov jump process on the space S = {1, · · · , ∞}, with rates As,s+1 = α and As,s−1 = sβ. [sent-255, score-0.225]
83 For some tend , the state of the system was observed perfectly at three times 0, tend /10 and tend , with values 10, 2 and 15 respectively. [sent-262, score-0.337]
84 Conditioned on these, we sought the posterior distribution over the 7 queue: a) ESS per unit time b) ESS per unit time scaled by interval length. [sent-263, score-0.329]
85 Since the state of the system at time 0 is perfectly observed to be 10, given any time-discretization, the maximum value of si at step i of the Markov chain is (10 + i). [sent-265, score-0.218]
86 We compared our sampler with uniformization; for this, we approximated the M/M/∞ system with an M/M/50/50 system. [sent-268, score-0.223]
87 Figure 5(a) shows the ESS per unit time for all three samplers as we varied the interval length tend from 1 to 20. [sent-272, score-0.378]
88 Sampling a trajectory over a long interval will take more time than over a short one, and to more clearly distinguish performance for large values of tend , we scale each ESS from the left plot with tend , the length of the interval, in the right subplot of figure 5. [sent-273, score-0.415]
89 Interestingly, running our thinning-based sampler on the truncated system offers no significant computational benefit over running it on the full model. [sent-275, score-0.246]
90 As the observation interval becomes longer and longer, the MJP trajectory can make larger and larger excursions (especially over the interval [tend /10, tend ]). [sent-276, score-0.398]
91 Thus as tend increases, event rates witnessed in posterior trajectories starts to increase. [sent-277, score-0.241]
92 As our sampler adapts to this, the number of thinned events in all three samplers start to become comparable, causing the uniformization-based sampler to approach the performance of the other two samplers. [sent-278, score-0.764]
93 At the same time, we see that the difference between our truncated and our untruncated sampler starts to widen. [sent-279, score-0.228]
94 Each MCMC iteration first samples a random discretization of time given the trajectory of the system. [sent-282, score-0.306]
95 Recall that an sMJP trajectory defines an instantaneous hazard function B(t); our scheme requires the discretization sampled from the old hazard function be compatible with the new hazard function. [sent-286, score-1.228]
96 Thus, the forward-backward algorithm is unlikely to return a trajectory associated with a hazard function that differ significantly from the old one. [sent-287, score-0.476]
97 By contrast, for uniformization, the hazard function is a constant B, independent of the system state. [sent-288, score-0.345]
98 An interesting direction for future work is too see how different choices of the dominating hazard function can help trade-off these factors. [sent-290, score-0.389]
99 For sMJPs whose hazard functions are constant beyond a ‘window of memory’, inference scales quadratically with the memory length, and only linearly with |W |. [sent-296, score-0.333]
100 Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks. [sent-374, score-0.234]
wordName wordTfidf (topN-words)
[('smjp', 0.461), ('hazard', 0.308), ('ss', 0.273), ('thinned', 0.216), ('ass', 0.209), ('thinning', 0.203), ('wi', 0.198), ('particle', 0.192), ('sampler', 0.186), ('uniformization', 0.182), ('vi', 0.15), ('trajectory', 0.141), ('mcmc', 0.13), ('jump', 0.126), ('li', 0.125), ('ess', 0.117), ('discretization', 0.116), ('events', 0.111), ('poisson', 0.105), ('interval', 0.099), ('pmcmc', 0.099), ('bvi', 0.098), ('ti', 0.091), ('mjp', 0.086), ('dominating', 0.081), ('waiting', 0.077), ('state', 0.076), ('event', 0.075), ('bs', 0.071), ('rss', 0.07), ('smjps', 0.07), ('weibull', 0.07), ('queue', 0.068), ('samplers', 0.065), ('asi', 0.062), ('tend', 0.059), ('bss', 0.056), ('markov', 0.049), ('processes', 0.049), ('leaving', 0.044), ('per', 0.044), ('mjps', 0.042), ('snew', 0.042), ('trunc', 0.042), ('tstart', 0.042), ('gure', 0.041), ('rate', 0.04), ('rates', 0.039), ('hold', 0.038), ('system', 0.037), ('temperature', 0.037), ('maxs', 0.037), ('resample', 0.035), ('avi', 0.034), ('observations', 0.034), ('holding', 0.032), ('effective', 0.032), ('entering', 0.032), ('si', 0.032), ('du', 0.031), ('process', 0.03), ('unit', 0.03), ('time', 0.03), ('sampling', 0.029), ('unew', 0.028), ('population', 0.028), ('old', 0.027), ('trajectories', 0.027), ('candidate', 0.027), ('length', 0.027), ('homogeneous', 0.027), ('times', 0.027), ('granularity', 0.026), ('intensity', 0.025), ('quadratically', 0.025), ('ameliorate', 0.025), ('busy', 0.025), ('avail', 0.025), ('jobs', 0.025), ('servers', 0.025), ('varied', 0.024), ('chain', 0.023), ('truncated', 0.023), ('resampling', 0.022), ('posterior', 0.022), ('ltering', 0.022), ('proposals', 0.022), ('pure', 0.021), ('discard', 0.021), ('distributed', 0.02), ('instantaneous', 0.02), ('perfectly', 0.02), ('unbounded', 0.02), ('arnaud', 0.019), ('monte', 0.019), ('samples', 0.019), ('transitioning', 0.019), ('inferences', 0.019), ('looked', 0.019), ('starts', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 205 nips-2012-MCMC for continuous-time discrete-state systems
Author: Vinayak Rao, Yee W. Teh
Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1
2 0.16650841 41 nips-2012-Ancestor Sampling for Particle Gibbs
Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan
Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1
3 0.13749321 182 nips-2012-Learning Networks of Heterogeneous Influence
Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola
Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1
4 0.11909653 118 nips-2012-Entangled Monte Carlo
Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté
Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1
5 0.10071028 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
Author: Nicholas Foti, Sinead Williamson
Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1
6 0.088638119 126 nips-2012-FastEx: Hash Clustering with Exponential Families
7 0.085617028 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
8 0.084598228 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
9 0.079220161 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
10 0.078796506 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
11 0.078245655 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
12 0.072663195 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes
13 0.069335245 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
14 0.063907862 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio
15 0.063849851 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease
16 0.062297281 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
17 0.060115092 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
18 0.059321854 145 nips-2012-Gradient Weights help Nonparametric Regressors
19 0.05898333 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries
20 0.05842055 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization
topicId topicWeight
[(0, 0.136), (1, -0.017), (2, -0.009), (3, 0.043), (4, -0.125), (5, -0.014), (6, -0.006), (7, 0.009), (8, 0.02), (9, -0.0), (10, -0.057), (11, -0.057), (12, 0.003), (13, -0.078), (14, -0.085), (15, -0.061), (16, -0.088), (17, -0.138), (18, 0.077), (19, -0.101), (20, 0.145), (21, 0.042), (22, -0.09), (23, -0.054), (24, -0.115), (25, -0.038), (26, -0.108), (27, -0.009), (28, -0.072), (29, -0.139), (30, 0.028), (31, -0.027), (32, 0.018), (33, 0.057), (34, -0.073), (35, 0.102), (36, -0.006), (37, 0.102), (38, 0.05), (39, 0.042), (40, 0.044), (41, 0.035), (42, -0.012), (43, -0.008), (44, -0.024), (45, 0.001), (46, 0.04), (47, 0.058), (48, 0.104), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.94639605 205 nips-2012-MCMC for continuous-time discrete-state systems
Author: Vinayak Rao, Yee W. Teh
Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1
2 0.76724362 118 nips-2012-Entangled Monte Carlo
Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté
Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1
3 0.71012425 41 nips-2012-Ancestor Sampling for Particle Gibbs
Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan
Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1
4 0.69252789 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
Author: Simon Lyons, Amos J. Storkey, Simo Särkkä
Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxilliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1
5 0.49918899 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1
6 0.49728477 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
7 0.47819072 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
8 0.47793758 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
9 0.469699 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
10 0.46935606 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes
11 0.44556108 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
12 0.44173741 182 nips-2012-Learning Networks of Heterogeneous Influence
13 0.42510667 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
14 0.4067719 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
15 0.38864404 126 nips-2012-FastEx: Hash Clustering with Exponential Families
16 0.38659841 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
17 0.38235125 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process
18 0.38032514 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
19 0.3669481 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease
20 0.35886675 11 nips-2012-A Marginalized Particle Gaussian Process Regression
topicId topicWeight
[(0, 0.014), (21, 0.021), (38, 0.06), (42, 0.021), (54, 0.021), (55, 0.012), (74, 0.04), (76, 0.604), (80, 0.069), (92, 0.041)]
simIndex simValue paperId paperTitle
1 0.99244452 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1
2 0.98588336 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature
Author: Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, Carl E. Rasmussen
Abstract: Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a modelbased method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model’s hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy. 1
3 0.98034281 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video
Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller
Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1
4 0.97937852 286 nips-2012-Random Utility Theory for Social Choice
Author: Hossein Azari, David Parks, Lirong Xia
Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1
5 0.97912896 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data
Author: Francisco Pereira, Matthew Botvinick
Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1
same-paper 6 0.9758746 205 nips-2012-MCMC for continuous-time discrete-state systems
7 0.97194487 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
8 0.93871564 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
9 0.93554622 247 nips-2012-Nonparametric Reduced Rank Regression
10 0.93397361 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
11 0.88019198 338 nips-2012-The Perturbed Variation
12 0.871126 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
13 0.86662221 41 nips-2012-Ancestor Sampling for Particle Gibbs
14 0.86454642 142 nips-2012-Generalization Bounds for Domain Adaptation
15 0.85474616 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters
16 0.85450071 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
17 0.85267973 163 nips-2012-Isotropic Hashing
18 0.85163933 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
19 0.8506484 327 nips-2012-Structured Learning of Gaussian Graphical Models
20 0.84977007 264 nips-2012-Optimal kernel choice for large-scale two-sample tests