jmlr jmlr2010 jmlr2010-51 knowledge-graph by maker-knowledge-mining

51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks


Source: pdf

Author: Yu Fan, Jing Xu, Christian R. Shelton

Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Consider that if T is the transition matrix for a process with time interval ∆t, then $T^{1/2}$ is the transition matrix for the same process with time interval ∆t/2. [sent-46, score-0.692]

2 Inference for a CTBN is the task of estimating the distribution over trajectories given a partial trajectory (in which some values or transitions are missing for some variables during some time intervals). [sent-69, score-0.625]

3 Then, in each iteration, it randomly picks one variable X, and samples an entire trajectory for that variable by fixing the trajectory of all the other variables. [sent-89, score-0.62]

4 Since only X is not fixed, the conditional cumulative distribution that X stays in one state for less than t and the state transition probabilities can be calculated exactly using standard forward and backward propagation within the Markov blanket of X. [sent-90, score-0.502]

5 The formulation of this sampling procedure is not trivial due to the infinite extent of the trajectory space, both in the transition time continuum and the number of transitions. [sent-103, score-0.744]

6 In Section 3, we describe our importance sampling algorithm for CTBNs and extend the algorithm to particle filtering and particle smoothing algorithms. [sent-109, score-0.842]

7 The behavior of X is described by the initial distribution $P_X^0$ and the transition model, which is often represented by the intensity matrix
$$Q_X = \begin{pmatrix} -q_{x_1} & q_{x_1 x_2} & \cdots & q_{x_1 x_n} \\ q_{x_2 x_1} & -q_{x_2} & \cdots & q_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{x_n x_1} & q_{x_n x_2} & \cdots & -q_{x_n} \end{pmatrix}$$
[sent-121, score-0.67]

8 Here $q_{x_i x_j}$ is the intensity with which X transitions from $x_i$ to $x_j$, and $q_{x_i} = \sum_{j \neq i} q_{x_i x_j}$. [sent-133, score-0.616]
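
To make these semantics concrete, here is a minimal sketch (numpy only, with hypothetical rates that are not from the paper) of forward-sampling one homogeneous Markov process from an intensity matrix: the dwell time in state x is exponential with rate $q_x$, and the next state is drawn in proportion to the off-diagonal entries of row x.

```python
# A minimal sketch (hypothetical rates) of forward-sampling one homogeneous
# Markov process from an intensity matrix Q, per the semantics above.
import numpy as np

rng = np.random.default_rng(0)

# Q[i, j] = q_{x_i x_j} for i != j; Q[i, i] = -q_{x_i}. Rows sum to zero.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.2, 0.8, -1.0]])
assert np.allclose(Q.sum(axis=1), 0.0)

def sample_trajectory(Q, x0, T):
    """Return a list of (time, state) pairs for one trajectory on [0, T)."""
    t, x, traj = 0.0, x0, [(0.0, x0)]
    while True:
        q = -Q[x, x]                      # rate of leaving the current state
        t += rng.exponential(1.0 / q)     # dwell time ~ Exponential(q_x)
        if t >= T:
            return traj
        p = np.clip(Q[x], 0.0, None) / q  # next state ~ q_{x x'} / q_x
        x = int(rng.choice(len(p), p=p))
        traj.append((t, x))

print(sample_trajectory(Q, x0=0, T=5.0))
```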

9 To model a dynamic system containing several variables, we can consider the whole system as one variable, enumerate the entire state space, calculate the transition intensity of each pair of these states and put them into a single intensity matrix. [sent-142, score-0.665]
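
The simplest special case of this flattening is two independent variables, where the joint intensity matrix is the Kronecker sum of the individual matrices. The sketch below shows only that special case; full CTBN amalgamation additionally conditions each variable's rates on its parents' states, which this toy construction deliberately omits.

```python
# Joint intensity matrix for two *independent* processes via the Kronecker sum.
# Full CTBN amalgamation is more general (parent-dependent rates).
import numpy as np

QA = np.array([[-1.0, 1.0],
               [ 2.0, -2.0]])
QB = np.array([[-0.5, 0.5],
               [ 3.0, -3.0]])

I2 = np.eye(2)
Q_joint = np.kron(QA, I2) + np.kron(I2, QB)   # 4x4 over joint states (a, b)

assert np.allclose(Q_joint.sum(axis=1), 0.0)  # still a valid intensity matrix
print(Q_joint)
```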

10 The intensity matrix of X is called a conditional intensity matrix (CIM) QX|U , which is defined as a set of intensity matrices QX|u , one for each instantiation u of the variable set U. [sent-147, score-0.578]

11 A continuous time Bayesian network N over X consists of two components: an initial distribution $P_X^0$, specified as a Bayesian network B over X, and a continuous transition model, specified using a directed (possibly cyclic) graph G whose nodes are X ∈ X. [sent-150, score-0.465]

12 Let B(t) be the person’s body weight (Val(B(t)) = {b0 = normal, b1 = overweight}), E(t) be the exercise intensity (Val(E(t)) = {e0 = light, e1 = heavy}), C(t) be his daily calorie intake (Val(C(t)) = {c0 = low, c1 = high}) and W(t) be the weather (Val(W(t)) = {w0 = rainy, w1 = sunny}). [sent-164, score-0.398]

13 If the person is doing light exercise and his calorie intake is low, the dynamics of his body weight are determined by the intensity matrix QB|e0 ,c0 . [sent-198, score-0.449]

14 For any transition involving only one of the variables, simply use the entry from the appropriate intensity matrix above. [sent-206, score-0.388]

15 That is, the likelihood of a trajectory on [0, T ) is equal to the likelihood based only on sufficient statistics from time 0 to time t multiplied by the likelihood based only on sufficient statistics from time t to time T . [sent-275, score-0.609]

16 Exact inference in a CTBN can be performed by generating a single joint intensity matrix over the entire state space of the CTBN and running the forward-backward algorithm on the joint intensity matrix of the homogeneous Markov process. [sent-290, score-0.527]

17 $\exp(Q_i(t_{i+1} - t_i))$ represents the transition matrix for the interval $[t_i, t_{i+1})$ and $Q_{i,i+1}$ corresponds to the transition probability density between two consecutive intervals at time $t_{i+1}$. [sent-308, score-0.679]
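
A small sketch of the matrix-exponential relation used here (scipy assumed; Q reuses the illustrative rates from the earlier sketch, not values from the paper):

```python
# Sketch of the exp(Q * dt) relation used by exact inference.
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.2, 0.8, -1.0]])

dt = 0.5
T_half = expm(Q * dt)       # transition matrix for an interval of length dt
T_full = expm(Q * 2 * dt)   # transition matrix for an interval of length 2*dt

# Chapman-Kolmogorov: two dt-steps compose to one 2*dt-step. This is the same
# identity behind "T^{1/2} is the matrix for half the interval" quoted above.
assert np.allclose(T_half @ T_half, T_full)
print(T_half.sum(axis=1))   # each row is a distribution over next states
```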

18 When the data set D is complete, where each trajectory σi is a complete set of state transitions and the times at which they occurred, the parameters can be learned by maximizing the log-likelihood of the data set (Nodelman et al. [sent-320, score-0.452]
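
For complete data, the sufficient statistics are T[x], the total time spent in state x, and M[x, x'], the number of x → x' transitions, and the maximizing intensities are $\hat q_{x x'} = M[x,x'] / T[x]$. The hedged sketch below implements this for a single homogeneous process; the CTBN case repeats it once per parent instantiation, and the trajectory format and function names here are assumptions.

```python
# Hedged sketch of maximum-likelihood estimation from complete data for a
# single homogeneous process, using the standard CTMC sufficient statistics.
import numpy as np

def mle_intensity(trajectories, n_states, horizon):
    T = np.zeros(n_states)                  # time spent in each state
    M = np.zeros((n_states, n_states))      # transition counts
    for traj in trajectories:               # traj: list of (time, state)
        for (t0, x0), (t1, x1) in zip(traj, traj[1:]):
            T[x0] += t1 - t0
            M[x0, x1] += 1
        t_last, x_last = traj[-1]
        T[x_last] += horizon - t_last       # final dwell up to the horizon
    Q = M / np.maximum(T, 1e-12)[:, None]   # q_hat = M / T; guard T == 0
    np.fill_diagonal(Q, -Q.sum(axis=1))     # diagonal makes rows sum to zero
    return Q

# e.g., combined with the earlier sampler:
# Q_hat = mle_intensity([sample_trajectory(Q, 0, 5.0) for _ in range(500)], 3, 5.0)
```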

19 1 Forward Sampling Queries that are not conditioned on evidence can be answered by randomly sampling many trajectories and looking at the fraction that match the query. [sent-347, score-0.543]

20 The algorithm for sampling a trajectory is shown in Figure 2. [sent-355, score-0.481]

21 For each variable X ∈ X, it maintains x(t)—the state of X at time t—and Time(X)—the next potential transition time for X. [sent-356, score-0.418]
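
A minimal sketch of this event-clock loop for a hypothetical two-node CTBN X → Y (binary variables, invented rates): each variable keeps its next potential transition time, the earliest clock fires, and any clock whose rate depended on the changed variable is redrawn (memorylessness makes the redraw valid).

```python
# Minimal sketch of the per-variable Time(V) loop described above.
import numpy as np

rng = np.random.default_rng(1)

QX = np.array([[-1.0, 1.0], [2.0, -2.0]])        # X has no parents
QY = {0: np.array([[-0.5, 0.5], [0.5, -0.5]]),   # Q_{Y|X=0}
      1: np.array([[-4.0, 4.0], [4.0, -4.0]])}   # Q_{Y|X=1}

def forward_sample(T):
    t, x, y = 0.0, 0, 0
    clock = {'X': rng.exponential(1.0 / -QX[x, x]),
             'Y': rng.exponential(1.0 / -QY[x][y, y])}
    traj = [(0.0, x, y)]
    while min(clock.values()) < T:
        v = min(clock, key=clock.get)     # the variable transitioning next
        t = clock[v]
        if v == 'X':
            x = 1 - x                     # binary state: flip
            clock['X'] = t + rng.exponential(1.0 / -QX[x, x])
            # Y's rate depends on its parent X, so redraw Y's clock too
            clock['Y'] = t + rng.exponential(1.0 / -QY[x][y, y])
        else:
            y = 1 - y
            clock['Y'] = t + rng.exponential(1.0 / -QY[x][y, y])
        traj.append((t, x, y))
    return traj

print(forward_sample(3.0))
```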

22 The probability of sampling a trajectory in which that transition occurs at precisely that time is zero. [sent-365, score-0.744]

23 Thus, if we have evidence about transitions, with probability 1, none of our sampled trajectories will be relevant. [sent-366, score-0.409]

24 In each iteration, the sampler randomly picks one variable Xi and samples the entire trajectory of Xi by fixing the trajectories of the other variables Y = {X1, . . . , Xi−1, Xi+1, . . . , Xn}. [sent-375, score-0.522]

25 To generate the entire trajectory of Xi according to the evidence e, the states and transitions of Xi need to be sampled in those intervals in which Xi is not observed. [sent-382, score-0.664]

26 The trajectory in each unobserved interval of Xi can be generated by alternately sampling the transition time ∆t and the new state x from the posterior distribution given e and the trajectories of the other variables Y. [sent-383, score-1.069]

27 Assume we are sampling the trajectory of Xi for the interval [0, T], with Xi(0) = x0 and Xi(T) = xT. [sent-384, score-0.564]

28 The transition probability that Xi transitions from $x^{(0)}$ to a new state x can be calculated similarly: $\Pr(X_i(t^+) = x \mid X_i(0:t] = x^{(0)}, Y(0:T]) \propto q^{X_i|Y}_{x^{(0)} x}\, \tilde\beta_x(t)$. [sent-393, score-0.388]

29 However, sampling the transition time ∆t requires using a binary search algorithm and repeatedly computing the conditional cumulative distribution function F(t), which may require a long running time. [sent-396, score-0.528]

30 In importance sampling, we generate samples from a proposal distribution P′ which guarantees that our sampled trajectories will conform to our evidence e. [sent-400, score-0.527]

31 As we are sampling a trajectory, we occasionally depart from the regular forward sampling algorithm and “force” the behavior of one or more variables to ensure consistency with the evidence. [sent-415, score-0.474]

32 We cannot simply force a transition at time te to make the variables consistent with the evidence e: if the set contains more than one variable, the sample would have multiple simultaneous transitions, an event whose likelihood is zero. [sent-440, score-0.646]

33 If the current state of the variable does not agree with the upcoming evidence, we force the next sampled transition time to fall before the time of the conflicting evidence. [sent-442, score-0.564]

34 In particular, if we are currently at time t and there is conflicting evidence for X at time te > t, we sample from an exponential distribution with the same q value as the normal sampling procedure, but where the sample for ∆t (the time to the next transition) is required to be less than te − t. [sent-444, score-0.844]

35 The probability density of sampling ∆t from this truncated exponential is $\frac{q \exp(-q \Delta t)}{1 - \exp(-q(t_e - t))}$, where q is the relevant intensity for the current state of X (the diagonal element of $Q_{X|U_X}$ corresponding to the current state of X). [sent-445, score-0.513]
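
A hedged sketch of this forcing step: inverse-CDF sampling of ∆t from an Exponential(q) truncated to (0, τ) with τ = te − t, returning both ∆t and the target/proposal density ratio, which is exactly the 1 − exp(−q(te − t)) factor that reappears in the weight updates below. Names and the rng seed are illustrative.

```python
# Sample dt from Exp(q) truncated to (0, tau), plus the weight contribution.
import numpy as np

rng = np.random.default_rng(2)

def sample_forced_dt(q, tau):
    u = rng.random()
    dt = -np.log(1.0 - u * (1.0 - np.exp(-q * tau))) / q
    # proposal density: q exp(-q dt) / (1 - exp(-q tau)); target: q exp(-q dt)
    weight_factor = 1.0 - np.exp(-q * tau)   # target / proposal
    return dt, weight_factor

dt, w = sample_forced_dt(q=0.8, tau=2.0)
assert 0.0 < dt < 2.0   # the draw always lands before the evidence time
print(dt, w)
```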

36 Note that we cannot, in general, transition directly to the evidence state, as such a transition may not be possible (have 0 rate). [sent-449, score-0.57]

37 Furthermore, if we are still “far away” from the upcoming evidence, such a transition may lead to a highly unlikely trajectory resulting in an inefficient algorithm. [sent-450, score-0.519]

38 σst be the collection for all variables X ∈ X of intervals x[t1 : t2 ) where the transition time is sampled from a truncated exponential distribution and the variable is involved in the next transition. [sent-456, score-0.421]

39 σs f be the collection for all variables X ∈ X of intervals x[t1 : t2 ) where the transition time is sampled from a truncated exponential distribution and the variable is not involved in the next transition. [sent-457, score-0.421]

40 Partitioning of the trajectory according to the evidence and the transitions. [sent-461, score-0.426]

41 σe equals x[τ3 : τ4) and x[τ7 : τ8). (Figure caption, panel (d): partitioning of the trajectory based on the different sampling situations.) [sent-462, score-0.481]

42 For each variable x we must update the weight of the trajectory to reflect the likelihood ratio for x[t : t + ∆t] based on the distribution we use to sample the “next time” and the transition variable we select. [sent-472, score-0.631]

43 For any variable x whose value is given in the evidence during the interval [t, t + ∆t), as we discussed above, the contribution to the trajectory weight is just $\tilde L_N(x[t : t + \Delta t))$. [sent-474, score-0.583]

44 The weight must be multiplied by the probability density of sampling the transition in $P_N$ divided by the probability density in the sampling algorithm. [sent-479, score-0.662]

45 Note that $e_{end}^X(t) = t$ when there is point evidence at t, when t is the end of an interval of evidence, and when there is a transition in the evidence at time t. [sent-490, score-0.694]

46 Time(X) might be set to the end of an interval of evidence which is not a transition time but simply a time when we need to resample a next potential transition. [sent-492, score-0.555]

47 Step 2 now accounts for evidence at the beginning of the trajectory (using standard likelihood weighting for Bayesian networks). [sent-495, score-0.465]

48 This may cause a variable to transition several times in a short interval before the evidence as the variable “searches” for a way to transition into the evidence. [sent-502, score-0.729]

49 When sampling the next state for variable X at time t, instead of sampling from the multinomial according to θx(t)|uX (t) , we would like to sample from the distribution of the next state conditioned on the upcoming evidence. [sent-505, score-0.674]

50 Suppose X is in state xi at time t, and the next evidence for X is state xk at te . [sent-506, score-0.491]

51 We can therefore select our new state according to the distribution $\tilde P(X_{t+\Delta t} \mid X[t : t + \Delta t) = x_i, X_{t_e} = x_k)$ and, assuming state $x_j$ is selected, multiply the weight by $\theta_{x_i x_j | u_X(t)} / p_{i,j}$ to account for the difference between the target and sampling distributions. [sent-508, score-0.369]
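
A hedged sketch of this lookahead reweighting: sample the next state $x_j$ with probability $p_{i,j}$ proportional to $\theta_{x_i x_j} \cdot \Pr(X(t_e) = x_k \mid X = x_j)$, then return the correction $\theta_{x_i x_j} / p_{i,j}$. Approximating the lookahead term with a matrix exponential of the current intensity matrix (parents held fixed until $t_e$) is an assumption of this sketch, not necessarily the paper's exact construction.

```python
# Predictive-lookahead next-state sampling with its weight correction.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)

def lookahead_step(Q, i, k, t, t_e):
    q_i = -Q[i, i]
    theta = np.clip(Q[i], 0.0, None) / q_i   # multinomial theta_{x_i .}
    reach = expm(Q * (t_e - t))[:, k]        # Pr(X(t_e) = x_k | X = x_j)
    p = theta * reach
    p /= p.sum()
    j = int(rng.choice(len(p), p=p))
    return j, theta[j] / p[j]                # next state, weight correction

Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.2, 0.8, -1.0]])
print(lookahead_step(Q, i=0, k=2, t=0.0, t_e=1.0))
```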

52 To apply this algorithm to the task of online inference in a dynamic system, we can generate multiple trajectories in parallel, advancing time forward as evidence is obtained. [sent-511, score-0.529]

53 For each X ∈ X such that Time(X) is undefined: if $e_{val}^X(t)$ is defined, set $\Delta t \leftarrow e_{end}^X(t) - t$; else if $e_{val}^X(t_e)$ is defined, where $t_e = e_{time}^X(t)$, and $x(t) \neq e_{val}^X(t_e)$, choose ∆t from an exponential distribution with parameter $q_{x(t)|u_X(t)}$ given $\Delta t < t_e - t$. [sent-515, score-0.591]

54 If $e_{end}^X(t) = t$ or $e_{val}^X(t)$ is defined: if $e_{val}^X(t)$ is defined, set $x(t) \leftarrow e_{val}^X(t)$; else choose x(t), the next value of X, from a multinomial with parameter $\theta_{x(t)|u_X(t)}$; add $\langle X \leftarrow x(t), t \rangle$ to σ. [sent-521, score-0.403]

55 For each X ∈ X such that $e_{val}^X(t_e)$ is defined, where $t_e = e_{time}^X(t_1)$ and $x(t_1) \neq e_{val}^X(t_e)$: if X = Y, $w \leftarrow w \cdot (1 - \exp(-q_{x(t_1)|u_X(t_1)}(t_e - t_1)))$; else $w \leftarrow w \cdot \frac{1 - \exp(-q_{x(t_1)|u_X(t_1)}(t_e - t_1))}{1 - \exp(-q_{x(t_1)|u_X(t_1)}(t_e - t_2))}$. [sent-525, score-0.43]

56 If we let ti be the ith transition time and Xi be the value of X from ti−1 to ti , the following recursion holds. [sent-554, score-0.423]

57 The weighted approximation of this probability is given by $P(X[0:t_n)) \approx \sum_{i=1}^{N} w(X^i[0:t_n))\, \delta(X[0:t_n), X^i[0:t_n))$, where $X^i[0:t_n)$ is the ith sample and $w(X^i[0:t_n))$ is the normalized weight of the ith sample. [sent-556, score-0.678]

58 The weights satisfy the recursion $w(X^i[0:t_n)) \propto w(X^i[0:t_{n-1}))\, \tilde L'_X(X^i[t_{n-1}:t_n))$. Thus, to sample multiple trajectories in parallel, we apply the CTBN importance sampling algorithm to each trajectory until a transition occurs. [sent-559, score-1.201]
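
A generic sequential-importance-resampling skeleton consistent with this weight recursion; the particle objects, log-weight bookkeeping, and the effective-sample-size threshold below are assumptions of this sketch, not the paper's exact procedure.

```python
# Normalize weights, monitor effective sample size, resample on collapse.
import numpy as np

rng = np.random.default_rng(4)

def maybe_resample(particles, log_w, threshold=0.5):
    w = np.exp(log_w - log_w.max())      # normalize in log space for stability
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)           # effective sample size
    if ess < threshold * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=w)
        particles = [particles[i] for i in idx]
        log_w = np.zeros(len(particles)) # uniform weights after resampling
    return particles, log_w

parts, lw = maybe_resample(['traj%d' % i for i in range(4)],
                           np.log(np.array([0.7, 0.1, 0.1, 0.1])))
print(parts, lw)
```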

59 This procedure is similar to the regular particle filtering algorithm except that the particles are synchronized not by time but by the number of transitions (Procedure CTBN-Particle-Smoothing takes the filtered particles $\{X_{m_i}^i, t_{m_i}^i, w_{m_i}^i\}$, an end time $t_{end}$, and the evidence e). [sent-561, score-0.438]

60 To answer queries in the time interval [0, T), we propagate the particles until all of their last transition times are greater than T. [sent-580, score-0.469]

61 The procedure Sample-Segment loops from line 3 to 7 in Figure 4 until a transition occurs, returns the transition time and the variable’s value, and updates the corresponding weight for that segment. [sent-583, score-0.507]

62 Particle Smoothing. Although the resampling step in the particle filtering algorithm reduces the skew of the weights, it leads to another problem: the diversity of the trajectories is also reduced, since particles with higher weights are likely to be duplicated multiple times in the resampling step. [sent-588, score-0.523]

63 The smoothing algorithm for discrete-time systems generates trajectories using N weighted particles $\{x_t^i, w_t^i\}$ from the particle filtering algorithm. [sent-592, score-0.646]

64 It starts with the particles at time T, moves backward one step each iteration, and samples a particle according to the product of its weight and the probability of it transitioning to the previously sampled particle. [sent-593, score-0.546]

65 It samples a particle at time T with probability $w_T^i$; in the backward smoothing steps it samples $x_t^i$ according to $w_{t|t+1}^i = w_t^i f(x_{t+1}|x_t^i)$, where $f(x_{t+1}|x_t^i)$ is the probability that the particle transitions from state $x_t^i$ to $x_{t+1}$. [sent-595, score-0.631]

66 Given the filtered particles $\{X_{m_i}^i, t_{m_i}^i, w_{m_i}^i\}$, we need to sample both variable values and transition times at each step as we move backward. [sent-598, score-0.433]

67 A particle $\{X_n^i, t_n^i, w_n^i\}$ is a valid candidate as the predecessor for $\{X_{n+1}, t_{n+1}\}$ only if (1) $t_n^i < t_{n+1}$, (2) the values of $X_n^i$ and $X_{n+1}$ differ in only one variable (thus a single transition is possible), and (3) $e(t_n^i, t_{n+1})$ contains no transitions. [sent-601, score-0.589]
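
A hedged sketch of one backward-smoothing step built on these candidacy conditions: restrict to valid predecessors of the already-sampled successor (conditions (1) and (2); the evidence check (3) is omitted for brevity) and draw one with probability proportional to weight times transition probability. Here `trans_prob` is a placeholder for the CTBN transition density, not a function defined in the paper.

```python
# One backward-smoothing selection step over filtered particles.
import numpy as np

rng = np.random.default_rng(5)

def backward_step(filtered, successor, trans_prob):
    """filtered: list of (state_tuple, time, weight); successor: (state_tuple, time)."""
    s_next, t_next = successor
    cands, scores = [], []
    for state, t, w in filtered:
        n_diff = sum(a != b for a, b in zip(state, s_next))
        if t < t_next and n_diff == 1:          # exactly one variable changes
            cands.append((state, t))
            scores.append(w * trans_prob(state, t, s_next, t_next))
    scores = np.asarray(scores, dtype=float)
    pick = int(rng.choice(len(cands), p=scores / scores.sum()))
    return cands[pick]
```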

68 Figure 6 shows the smoothing algorithm which generates a trajectory from the filtering particles. [sent-602, score-0.395]

69 Generating one trajectory with this smoothing process requires considering all the particles at each step. [sent-605, score-0.542]

70 The running time of sampling N trajectories using particle smoothing is N times that of particle filtering. [sent-606, score-1.015]

71 In the inference task, each piece of evidence is a partially observed trajectory of the CTBN. [sent-677, score-0.475]

72 The second is to generate a trajectory using the forward sampling algorithm and randomly remove some parts of the sampled trajectory. [sent-680, score-0.612]

73 In our experiments, we set our query to be one of three types: the expected total amount of time a variable X stays in some state xi, the expected total number of times that a variable transitions from state xi to state xj, or the distribution of a variable at time t. [sent-684, score-0.636]
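
The first two query types reduce to weighted averages over the sampled trajectories; a sketch follows (the trajectory format, a list of (time, state-dict) pairs per sample, is an assumption).

```python
# Estimate expected time-in-state and expected transition counts from
# weighted trajectory samples.
import numpy as np

def expected_stats(samples, weights, var, state_i, state_j, horizon):
    """Assumes state_i != state_j; each traj is a list of (time, state-dict) pairs."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                              # normalized importance weights
    time_in_i, n_ij = 0.0, 0.0
    for traj, wk in zip(samples, w):
        segs = traj + [(horizon, None)]       # sentinel closes the final segment
        for (t0, s0), (t1, s1) in zip(segs, segs[1:]):
            if s0[var] == state_i:
                time_in_i += wk * (t1 - t0)   # time spent in state x_i
                if s1 is not None and s1[var] == state_j:
                    n_ij += wk                # one x_i -> x_j transition of var
    return time_in_i, n_ij
```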

74 The training data were generated by sampling trajectories from the true model and randomly removing some portion of the information as described above. [sent-692, score-0.389]

75 Inference Experimental Results. In this section, we evaluate the performance of our importance-sampling-based algorithms in answering queries and compare them with the EP algorithm of Saria et al. [sent-696, score-0.43]

76 Comparison of Importance Sampling and Predictive Lookahead. We first tested the importance sampling algorithm and the predictive lookahead modification using the drug effect network. [sent-701, score-0.412]

77 Importance Sampling, Particle Filtering and Smoothing. We then used the chain network to evaluate the efficiency of the importance sampling, particle filtering, and smoothing algorithms. [sent-727, score-0.494]

78 This is done by forward sampling a trajectory from 0 to T and keeping only the information about X4 . [sent-734, score-0.537]

79 Note that this is the most difficult case for the importance sampling algorithm since the chain network is nearly deterministic. [sent-736, score-0.384]

80 In all four cases, the particle filtering and smoothing algorithms both outperform the importance sampling algorithm when the sample size is small (small running time). [sent-741, score-0.702]

81 For simple evidence (Figure 10(a)), the importance sampling algorithm achieves comparable performance when the sample size is large. [sent-742, score-0.481]

82 When the trajectory is short, the particle filtering algorithm is slightly better than the particle smoothing algorithm. [sent-744, score-0.787]

83 However, as the trajectory length increases, the particle smoothing algorithm outperforms the filtering algorithm, which suffers from particle diversity problems. [sent-746, score-0.787]

84 The importance sampling algorithm and the particle filtering algorithm outperform the EP algorithm in answering both queries. [sent-761, score-0.56]

85 Among the sampling-based algorithms, the importance sampling algorithm performs the best and the smoothing algorithm is the worst. [sent-762, score-0.45]

86 At each transition time, the sampled trajectory has no choice as to the next state. [sent-764, score-0.555]

87 For the Gibbs sampling algorithm, we first ran 100 “burn-in” iterations for each sample size before sampling trajectories from the sampler. [sent-790, score-0.424]

88 For the drug effect network, the evidence trajectory begins at time t = 0 and ends at time t = 5. [sent-792, score-0.621]

89 From the figure, we can see that importance sampling outperforms Gibbs sampling in answering both queries. [sent-799, score-0.573]

90 The importance sampling algorithm outperformed the Gibbs sampling algorithm in answering the expected-time query. [sent-807, score-0.65]

91 In both networks, importance sampling outperformed Gibbs sampling in three of the four cases, even when the running time on “burn-in” iterations was not considered. [sent-809, score-0.647]

92 The evidence trajectory begins at time t = 0 and ends at time t = 5. [sent-815, score-0.536]

93 As we have mentioned before, the chain structured network is nearly deterministic, and it is the hardest case for the importance sampling algorithm. [sent-828, score-0.416]

94 The only observed state on X0 is s0 , which makes this experiment even harder for the importance sampling algorithm. [sent-830, score-0.389]

95 Although importance sampling can generate many more samples in the same period of time, most of these samples are trajectories with very small weights. [sent-833, score-0.507]

96 Parameter Estimation Experimental Results. In this section, we evaluate the performance of the importance sampling algorithm on parameter estimation and compare it to the Gibbs sampling algorithm and the EP algorithm. [sent-835, score-0.536]

97 We sampled increasing numbers of trajectories of 5 time lengths. [sent-839, score-0.385]

98 We chose the initial parameters for the EM algorithm by sampling the diagonal elements of the conditional intensity matrices from the Gamma distribution with parameters (0.5, 1) and sampling the transition probabilities from a Dirichlet distribution. [sent-852, score-0.389] [sent-853, score-0.417]

100 For the Gibbs sampling algorithm, we dropped the first 50 trajectories as “burn-in” iterations. [sent-858, score-0.389]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ctbn', 0.327), ('trajectory', 0.272), ('lx', 0.215), ('sampling', 0.209), ('transition', 0.208), ('particle', 0.196), ('nodelman', 0.194), ('trajectories', 0.18), ('intensity', 0.18), ('gibbs', 0.169), ('mportance', 0.162), ('te', 0.158), ('evidence', 0.154), ('particles', 0.147), ('qx', 0.141), ('ampling', 0.132), ('smoothing', 0.123), ('eval', 0.121), ('importance', 0.118), ('transitions', 0.118), ('helton', 0.112), ('ux', 0.109), ('tn', 0.107), ('ontinuous', 0.093), ('qxi', 0.086), ('drug', 0.085), ('interval', 0.083), ('ti', 0.08), ('fan', 0.078), ('ctbns', 0.078), ('shelton', 0.078), ('query', 0.077), ('etworks', 0.076), ('sampled', 0.075), ('ime', 0.072), ('bhps', 0.071), ('calorie', 0.071), ('ltering', 0.069), ('filtering', 0.069), ('saria', 0.069), ('ep', 0.068), ('queries', 0.066), ('unde', 0.062), ('state', 0.062), ('exercise', 0.06), ('christian', 0.06), ('bayesian', 0.058), ('network', 0.057), ('forward', 0.056), ('running', 0.056), ('time', 0.055), ('pai', 0.054), ('person', 0.051), ('intake', 0.051), ('qe', 0.051), ('inference', 0.049), ('sn', 0.049), ('seconds', 0.049), ('propagation', 0.046), ('markov', 0.046), ('intervals', 0.045), ('continuous', 0.044), ('stomach', 0.043), ('qb', 0.043), ('eend', 0.04), ('household', 0.04), ('married', 0.04), ('uri', 0.04), ('wi', 0.04), ('xt', 0.04), ('likelihood', 0.039), ('upcoming', 0.039), ('em', 0.039), ('variable', 0.038), ('backward', 0.037), ('answering', 0.037), ('weight', 0.036), ('dynamic', 0.035), ('ran', 0.035), ('queueing', 0.035), ('px', 0.034), ('qi', 0.034), ('pn', 0.033), ('carlo', 0.033), ('structured', 0.032), ('force', 0.032), ('val', 0.032), ('sampler', 0.032), ('stays', 0.031), ('deviation', 0.031), ('pain', 0.031), ('ternary', 0.031), ('monte', 0.031), ('doucet', 0.03), ('etime', 0.03), ('godsill', 0.03), ('overweight', 0.03), ('qxn', 0.03), ('daphne', 0.03), ('british', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

Author: Yu Fan, Jing Xu, Christian R. Shelton

Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing

2 0.440292 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine

Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu

Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuoustime Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software

3 0.32695696 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman

Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation

4 0.14844665 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods

5 0.10406546 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

Author: Evangelos Theodorou, Jonas Buchli, Stefan Schaal

Abstract: With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-JacobiBellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2 ) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs. Keywords: stochastic optimal control, reinforcement learning, parameterized policies

6 0.055154536 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library

7 0.054311432 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox

8 0.053752195 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

9 0.051439665 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

10 0.051310465 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

11 0.049391694 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

12 0.044318546 102 jmlr-2010-Semi-Supervised Novelty Detection

13 0.043185171 27 jmlr-2010-Consistent Nonparametric Tests of Independence

14 0.042794883 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?

15 0.042363171 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm

16 0.040124126 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

17 0.039628148 63 jmlr-2010-Learning Instance-Specific Predictive Models

18 0.038759064 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

19 0.038215958 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

20 0.037451588 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.237), (1, 0.106), (2, -0.248), (3, -0.604), (4, 0.056), (5, 0.19), (6, -0.121), (7, 0.086), (8, 0.009), (9, -0.275), (10, 0.03), (11, -0.101), (12, -0.014), (13, 0.028), (14, -0.075), (15, -0.057), (16, -0.07), (17, -0.013), (18, 0.029), (19, 0.062), (20, 0.007), (21, 0.023), (22, 0.069), (23, 0.017), (24, 0.051), (25, 0.033), (26, 0.018), (27, 0.012), (28, -0.03), (29, 0.011), (30, -0.018), (31, -0.016), (32, -0.004), (33, -0.005), (34, 0.029), (35, 0.06), (36, -0.017), (37, -0.018), (38, -0.016), (39, -0.018), (40, 0.01), (41, -0.011), (42, 0.005), (43, -0.006), (44, 0.031), (45, 0.031), (46, 0.003), (47, 0.008), (48, -0.008), (49, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96574581 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

Author: Yu Fan, Jing Xu, Christian R. Shelton

Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing

2 0.95039392 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine

Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu

Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuoustime Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software

3 0.75553358 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman

Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation

4 0.51689017 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods

5 0.27035818 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning

Author: Evangelos Theodorou, Jonas Buchli, Stefan Schaal

Abstract: With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-JacobiBellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2 ) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs. Keywords: stochastic optimal control, reinforcement learning, parameterized policies

6 0.21460561 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

7 0.20521614 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library

8 0.20290761 63 jmlr-2010-Learning Instance-Specific Predictive Models

9 0.17450434 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

10 0.17230625 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

11 0.17092958 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

12 0.17058301 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning

13 0.16550934 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox

14 0.16324241 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm

15 0.16223943 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

16 0.15692189 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

17 0.15175757 56 jmlr-2010-Introduction to Causal Inference

18 0.14933585 102 jmlr-2010-Semi-Supervised Novelty Detection

19 0.14452901 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

20 0.14279434 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.012), (8, 0.015), (21, 0.021), (32, 0.038), (33, 0.013), (36, 0.149), (37, 0.054), (72, 0.249), (75, 0.279), (85, 0.055), (96, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8704561 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

Author: Yu Fan, Jing Xu, Christian R. Shelton

Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing

2 0.83285719 102 jmlr-2010-Semi-Supervised Novelty Detection

Author: Gilles Blanchard, Gyemin Lee, Clayton Scott

Abstract: A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well known problems in hypothesis testing. First, our results provide a general solution to the general two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions. Keywords: semi-supervised learning, novelty detection, Neyman-Pearson classification, learning reduction, two-sample problem, multiple testing

3 0.75790113 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman

Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation

4 0.73611033 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing

Author: Ryo Yoshida, Mike West

Abstract: We describe a class of sparse latent factor models, called graphical factor models (GFMs), and relevant sparse learning algorithms for posterior mode estimation. Linear, Gaussian GFMs have sparse, orthogonal factor loadings matrices, that, in addition to sparsity of the implied covariance matrices, also induce conditional independence structures via zeros in the implied precision matrices. We describe the models and their use for robust estimation of sparse latent factor structure and data/signal reconstruction. We develop computational algorithms for model exploration and posterior mode search, addressing the hard combinatorial optimization involved in the search over a huge space of potential sparse configurations. A mean-field variational technique coupled with annealing is developed to successively generate “artificial” posterior distributions that, at the limiting temperature in the annealing schedule, define required posterior modes in the GFM parameter space. Several detailed empirical studies and comparisons to related approaches are discussed, including analyses of handwritten digit image and cancer gene expression data. Keywords: annealing, graphical factor models, variational mean-field method, MAP estimation, sparse factor analysis, gene expression profiling

5 0.7345767 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

Author: Joshua W. Robinson, Alexander J. Hartemink

Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods

6 0.73455232 63 jmlr-2010-Learning Instance-Specific Predictive Models

7 0.72553879 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding

8 0.72360313 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine

9 0.72323865 6 jmlr-2010-A Rotation Test to Verify Latent Structure

10 0.72024632 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

11 0.71915805 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming

12 0.71911943 56 jmlr-2010-Introduction to Causal Inference

13 0.71889514 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines

14 0.71835077 69 jmlr-2010-Lp-Nested Symmetric Distributions

15 0.7163983 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression

16 0.71630871 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

17 0.71495038 65 jmlr-2010-Learning Translation Invariant Kernels for Classification

18 0.70828575 77 jmlr-2010-Model-based Boosting 2.0

19 0.70764107 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

20 0.70736712 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization