jmlr jmlr2005 jmlr2005-17 knowledge-graph by maker-knowledge-mining

17 jmlr-2005-Change Point Problems in Linear Dynamical Systems


Source: pdf

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a training set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum a posteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure are called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Computer Science, Radboud University Nijmegen, Toernooiveld 1, NL 6525 ED Nijmegen, The Netherlands. Editor: Donald Geman. Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a training set of non-Markovian observation sequences. [sent-7, score-0.568]

2 Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. [sent-8, score-0.496]

3 The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. [sent-12, score-0.391]

4 We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. [sent-16, score-0.242]

5 Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies 1. [sent-17, score-0.304]

6 The stop and fault regimes are special in the sense that they are absorbing. [sent-21, score-0.227]

7 A key assumption in the problem is that once the system reaches a prefault state, it can never recover. [sent-23, score-0.18]

8 …fied as a switching linear dynamical system (SLDS), with restricted dynamics in the regime indicators. [sent-33, score-0.39]

9 The regime in every time-step is determined by (typically unobserved) discrete switches s1:T. [sent-35, score-0.251]

10 For 1 < t ≤ T, st is either normal or prefault. [sent-36, score-0.239]

11 Within every regime the state transition and the observation model are linear Gaussian, and may differ per regime: p(xt | xt−1, st, θ) = N(xt; Ast xt−1, Qst), p(yt | xt, st, θ) = N(yt; Cst xt + µst, Rst). [sent-38, score-0.785]
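
To make the generative model above concrete, here is a minimal sampling sketch of the two-regime change point SLDS. It is an illustration only: the parameter names (A, Q, C, mu, R per regime) and the fixed switching probability p_switch are assumptions for the example, not the paper's notation or learned quantities.

```python
import numpy as np

def sample_changepoint_slds(T, params, p_switch=0.05, rng=None):
    """Sample one sequence from a two-regime (normal -> prefault) SLDS.

    Once the regime switches to 'prefault' it never returns to 'normal',
    mirroring the paper's absorbing-regime assumption.  All parameter
    names here are illustrative, not the paper's notation.
    """
    rng = np.random.default_rng(rng)
    d = params["normal"]["A"].shape[0]
    x = rng.multivariate_normal(np.zeros(d), np.eye(d))
    regime, regimes, xs, ys = "normal", [], [], []
    for t in range(T):
        if regime == "normal" and t > 0 and rng.random() < p_switch:
            regime = "prefault"          # absorbing: no way back to normal
        p = params[regime]
        if t > 0:                        # linear Gaussian state transition
            x = p["A"] @ x + rng.multivariate_normal(np.zeros(d), p["Q"])
        y = p["C"] @ x + p["mu"] + rng.multivariate_normal(
            np.zeros(p["R"].shape[0]), p["R"])
        regimes.append(regime); xs.append(x); ys.append(y)
    return regimes, np.array(xs), np.array(ys)

# toy 1-D example with hand-picked (assumed) parameters
params = {
    "normal":   dict(A=np.array([[0.95]]), Q=np.array([[0.1]]),
                     C=np.array([[1.0]]), mu=np.array([0.0]), R=np.array([[0.2]])),
    "prefault": dict(A=np.array([[0.80]]), Q=np.array([[0.5]]),
                     C=np.array([[1.0]]), mu=np.array([1.0]), R=np.array([[0.2]])),
}
regimes, xs, ys = sample_changepoint_slds(50, params, rng=0)
```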

12 As mentioned, the current regime is encoded by discrete random variables s1:T, which are assumed to follow a first-order Markov chain. [sent-45, score-0.251]

13 The assumption that a system cannot restore from a prefault state to a normal state results in a considerable simplification. [sent-64, score-0.245]

14 Hence for relatively short sequences the constrained transition model makes exact inference feasible. [sent-69, score-0.184]

15 If the absorbing state sT+1 is not observed, there are T possible regime histories in the current model. [sent-82, score-0.321]

16 One normal sequence s1:T = n, and T − 1 fault sequences: s1:τ = n, sτ+1:T = p, with 1 ≤ τ ≤ T − 1. [sent-83, score-0.184]

17 In the remainder of this paper we let τ denote the time-slice up to and including which the regime has been normal, i.e. [sent-84, score-0.251]

18 Under our assumptions, a fault has to be preceded by at least one prefault state, so if sT+1 is observed to be a fault the entirely normal sequence gets zero weight. [sent-87, score-0.483]

19 If the parameters in the model, θ, are known, the exact posterior over a continuous state p(xt | y1:T, sT+1 = f, θ) is a mixture of Gaussians with T − 1 components, one for every regime history, and can be obtained by running the traditional Kalman filter. [sent-90, score-0.313]
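
The quadratic-time exact inference described here can be illustrated by enumerating the T − 1 regime histories and running one Kalman filter per history; the sketch below computes the posterior over the change point τ under an assumed uniform prior over histories. The function and parameter names are illustrative, and `params`, `m0`, `V0` follow the sampling sketch above.

```python
import numpy as np

def kalman_loglik(ys, regimes, params, m0, V0):
    """Log-likelihood of ys under one *fixed* regime history (one Kalman run)."""
    m, V, ll = m0.copy(), V0.copy(), 0.0
    for t, (y, r) in enumerate(zip(ys, regimes)):
        p = params[r]
        if t > 0:                                    # predict step
            m, V = p["A"] @ m, p["A"] @ V @ p["A"].T + p["Q"]
        S = p["C"] @ V @ p["C"].T + p["R"]           # innovation covariance
        e = y - (p["C"] @ m + p["mu"])               # innovation
        ll += -0.5 * (e @ np.linalg.solve(S, e)
                      + np.linalg.slogdet(2 * np.pi * S)[1])
        K = V @ p["C"].T @ np.linalg.inv(S)          # update step
        m, V = m + K @ e, V - K @ p["C"] @ V
    return ll

def changepoint_posterior(ys, params, m0, V0):
    """Posterior over the change point tau (last normal slice), assuming a
    fault was observed: enumerate the T-1 regime histories, one Kalman run
    each, i.e. quadratic cost in T overall.  A uniform prior over tau is an
    assumption of this sketch."""
    T = len(ys)
    logp = np.array([
        kalman_loglik(ys, ["normal"] * tau + ["prefault"] * (T - tau),
                      params, m0, V0)
        for tau in range(1, T)])
    logp -= logp.max()
    w = np.exp(logp)
    return w / w.sum()        # p(tau | y_{1:T}, fault)
```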

20 The number of regime histories with non-zero probability in such a system is less than or equal to T^(M−1). [sent-98, score-0.296]

21 If M > 1 there are T − (M − 1) + 1 < T possible starting points for the M-th regime (including the starting point T + 1, i.e. [sent-100, score-0.251]

22 So the number of distinct regime histories is bounded by T × T^(M−2). [sent-106, score-0.296]

23 Defining ut ≡ {st, xt} we obtain a model that has the same conditional independence structure as the linear dynamical system and the HMM. [sent-114, score-0.528]

24 From time to time we will use a sum notation to denote both the summation over the domain of the discrete variables, and the integration over the domain of the continuous variables in ut . [sent-115, score-0.265]

25 The computational complexity in the current case is due to the parametric form of the (conditional) distributions over ut as discussed in Section 1. [sent-116, score-0.265]

26 In the forward pass the message αt(ut) ≡ p(xt, st | y1:t, θ) is not conditional Gaussian, but a mixture of Gaussians conditioned on the regime indicator st. [sent-148, score-0.734]

27 Note that the linear complexity in T is special for the change point model with the restricted regime transitions. [sent-152, score-0.279]

28 If sT+1 = f or not observed, the change point from the normal to the prefault regime is inferred in the E-step. [sent-164, score-0.561]

29 If there are sequences that are known to be entirely normal (when sT+1 = s) these sequences are only used to determine the characteristics of the normal regime. [sent-171, score-0.27]

30 Also, due to the change point restriction, some ambiguity is resolved since it is known that the normal precedes the prefault regime. [sent-172, score-0.273]

31 The strong junction tree framework of Lauritzen (1992) operates on trees with larger cliques and approximates messages on a global level. [sent-202, score-0.187]

32 In terms of ut the exact posterior factors as p(u1:T | y1:T, θ) = ∏_{t=2}^{T} …

33 The minimization is now with respect to one-slice beliefs q̃t(ut) and two-slice beliefs p̃t(ut−1,t), under the constraints that these beliefs are properly normalized and consistent: p̃t(ut) = q̃t(ut) = p̃t+1(ut). (7)

34 To emphasize that the above constraints are exact, and to distinguish them from the weak consistency constraints that will be introduced below, we will refer to (7) as strong consistency constraints. [sent-224, score-0.188]

35 The conditional Gaussian form of Ψt(ut−1,t) and the conditional Gaussian choice for q̃t(ut) imply that at the minimum in (6) p̃t(ut−1,t) is conditionally Gaussian as well (see Appendix D). [sent-237, score-0.278]

36 If we restrict the form of q̃t(ut), but leave the consistency constraints exact as in (7), a minimum of the free energy has a very restricted form. [sent-238, score-0.249]

37 To obtain non-trivial approximations, the single-slice beliefs q̃t(ut) are restricted to be conditional Gaussian as outlined above, and in addition the consistency constraints are weakened. [sent-241, score-0.207]

38 …sufficient statistics of the conditional Gaussian family over ut as defined… [sent-243, score-0.31]

39 With these restrictions q̃t(ut) is in general not the marginal of p̃t(ut−1,t), so one-slice and two-slice beliefs satisfying (8) do not lead to a proper distribution of the form (5). [sent-245, score-0.254]

40 beliefs over outer clusters, p̃t(ut−1,t), and their overlaps, q̃t(ut−1,t). [sent-268, score-0.358]

41 no parametric choice for the beliefs, and strong consistency constraints), the local beliefs are exact marginals, and as in (5), the counting numbers can be interpreted as powers that dictate how to construct a global distribution from the marginals. [sent-273, score-0.212]

42 In Kikuchi’s extension the outer clusters are taken larger. [sent-274, score-0.196]

43 beliefs over outer clusters, their direct overlaps, the overlaps of the overlaps, etc. [sent-278, score-0.27]

44 Choose outer clusters uouter(i) and associate with them the counting number couter(i) = 1. [sent-282, score-0.248]

45 The outer clusters should be such that all domains ut−1,t of the model potentials Ψt(ut−1,t) are fully contained …

46 Define the overlaps of the outer clusters uover(i), the overlaps of the overlaps, etc. [sent-296, score-0.396]

47 A crucial observation for the SLDS is that it makes sense to take outer clusters larger than the cliques of a (weak) junction tree. [sent-298, score-0.277]

48 However, the restriction that q̃t(ut) must be conditional Gaussian, and the weak consistency constraints, imply an approximation: only part of the information from the past can be passed on to the future and vice versa. [sent-300, score-0.224]

49 …beneficial to take larger outer clusters and larger overlaps, since the weak consistency constraints are then over a larger set of sufficient statistics. [sent-302, score-0.297]

50 We define symmetric extensions of the outer clusters as depicted in Figure 3. [sent-306, score-0.196]

51 In the outer clusters only the discrete space is extended, because the continuous part can be integrated out analytically and the result stays in the conditional Gaussian family. [sent-308, score-0.241]

52 This implies a choice where the number of outer clusters is as small as possible at the cost of a larger continuous part in the … [sent-313, score-0.196]

53 Below the clusters are shown schematically, with outer clusters on the top row, and recursively the overlaps of overlaps, etc. [sent-317, score-0.388]

54 (Schematic of the chosen clusters: outer clusters on the top row and, recursively, their overlaps and the overlaps of overlaps below.) The outer clusters all have counting number 1. [sent-318, score-0.248]

55 The direct overlaps each have two larger clusters in which they are contained. [sent-319, score-0.192]

56 The overlaps of overlaps have five larger clusters in which they are contained.

57 A second crucial observation for the SLDS is that the choice of outer clusters (10) implies that we only have to consider outer clusters and direct overlaps, i.e.

58 the phenomenon that all clusters beyond the direct overlaps get an associated counting number of 0 in the example above extends to all κ. [sent-329, score-0.244]

59 In (12), N = T − 2κ − 1 denotes the number of outer clusters in the approximation. [sent-336, score-0.196]
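
The counting-number bookkeeping referred to in the last few sentences follows the generic Kikuchi recipe: outer clusters get counting number 1, and every other region gets 1 minus the sum of the counting numbers of the regions strictly containing it. A small sketch (here for the κ = 0 chain of two-slice clusters; the cluster contents are illustrative):

```python
from itertools import combinations

def counting_numbers(outer_clusters):
    """Kikuchi counting numbers: outer clusters get c = 1; every other region
    gets c = 1 - sum of counting numbers of regions strictly containing it.
    Regions are generated by closing the outer clusters under intersection."""
    regions = {frozenset(c) for c in outer_clusters}
    while True:
        new = {a & b for a, b in combinations(regions, 2)
               if (a & b) and (a & b) not in regions}
        if not new:
            break
        regions |= new
    c = {}
    for r in sorted(regions, key=len, reverse=True):   # largest regions first
        c[r] = 1 - sum(c[s] for s in regions if r < s)
    return c

# chain of outer clusters u_{t-1,t} = {s_{t-1}, s_t, x_{t-1}, x_t}
outer = [{f"s{t-1}", f"s{t}", f"x{t-1}", f"x{t}"} for t in range(2, 7)]
for region, num in sorted(counting_numbers(outer).items(),
                          key=lambda kv: -len(kv[0])):
    print(sorted(region), num)
# outer clusters -> 1, direct overlaps {s_t, x_t} -> -1, nothing deeper appears
```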

60 the potentials pi (uouter(i) ), and qi (uover(i) ). [sent-340, score-0.197]

61 A benefit of the (weak) junction tree choice of outer clusters and overlaps is that we can employ the same algorithm for the κ = 0 case as for the κ > 0 case. [sent-352, score-0.377]

62 Find q̃i(uover(i)) that approximates p̃i(uover(i)) best in Kullback-Leibler (KL) sense: q̃i(uover(i)) = Collapse(p̃i(uover(i))). [sent-375, score-0.252]
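
The Collapse operation projects a belief onto the restricted family by moment matching, which minimizes the KL divergence in the stated direction. The sketch below shows the simplest instance, collapsing a Gaussian mixture onto a single Gaussian; the paper's Collapse acts on conditional Gaussian potentials, so this is only the continuous-part analogue and the function name is illustrative.

```python
import numpy as np

def collapse(weights, means, covs):
    """Moment-match a Gaussian mixture to a single Gaussian.

    Minimizing KL(mixture || single Gaussian) is achieved by matching the
    first two moments; this mirrors the 'Collapse' used in EP-style updates,
    shown here only for an unconditional Gaussian, not the full CG family.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    means = np.asarray(means, dtype=float)
    mean = w @ means                                  # mixture mean
    cov = np.zeros_like(np.asarray(covs[0], dtype=float))
    for wk, mk, Sk in zip(w, means, covs):            # mixture covariance
        d = (mk - mean)[:, None]
        cov += wk * (np.asarray(Sk, dtype=float) + d @ d.T)
    return mean, cov

m, S = collapse([0.7, 0.3],
                [np.array([0.0, 0.0]), np.array([2.0, 1.0])],
                [np.eye(2), 0.5 * np.eye(2)])
```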

63 With κ = (T − 2)/2 there is only one cluster and we obtain a strong junction tree, and the found posteriors are exact. [sent-385, score-0.184]

64 Every component is encoded by a row, with white squares denoting normal, and black squares prefault regimes. [sent-392, score-0.18]

65 The messages α1(x2, s2) and β2(x3, s3) are conditional Gaussian by construction and hence each have two components: one corresponding to normal and one to prefault. [sent-393, score-0.192]

66 Synthetic Data: As discussed in Section 4, the constraints in the regime transitions aid in learning. [sent-410, score-0.31]

67 Also, the fact that the normal regime precedes the prefault regime resolves the invariance under relabeling that would be present in a general switching linear dynamical systems setting. [sent-412, score-0.886]

68 In a few replications the model learned normal and prefault classes that differ from the true generating model and hence result in large errors. [sent-428, score-0.273]

69 The control trials were treated as normal sequences and the 8 others as fault sequences. [sent-478, score-0.254]

70 The initial parameter settings for the normal regime were in Fourier form (West and Harrison, 1997). [sent-483, score-0.316]

71 In the second phase, again only the normal regime was considered, but An was also fitted. [sent-490, score-0.316]

72 The prefault model was initialized as an outlier model, i.e. [sent-499, score-0.18]

73 the parameters for the prefault regime were copies of the normal regime, but the noise covariances were larger. [sent-501, score-0.496]

74 After convergence, the mean absolute distance between the MAP change point and the triggers in the 8 fault sequences is 4. [sent-503, score-0.217]

75 Figure 9 shows the posteriors p(s1:t = n, st+1:T = p | y1:T, θ) and the trigger signals for the 8 fault sequences. [sent-506, score-0.189]

76 The degrees of freedom that are available in the prefault submodel are used to also explain outliers at the end of the normal regime. [sent-515, score-0.245]

77 The thick solid line shows the model prediction with the regime indicators clamped to normal. [sent-522, score-0.273]

78 The light thick line shows model predictions with all regime indicators clamped to normal just as in the left plot. [sent-524, score-0.338]

79 The dark thick line shows the model predictions with the regime indicators clamped to normal from 1 to 70 and to prefault from 71 to 104. [sent-525, score-0.518]

80 They were constructed by clamping the discrete states of the model to the MAP change point value and computing the predicted mean EMG signal (light lines represent the normal, dark lines the prefault regime). [sent-550, score-0.208]

81 Discussion: Motivated by fault and change detection problems in dynamical systems we have introduced a switching linear dynamical system with constrained regime transition probabilities. [sent-562, score-0.682]

82 The system is assumed to start in a normal regime and to either result in an absorbing stop state or change to a prefault regime. [sent-563, score-0.585]

83 As discussed in Sections 2 and 3, the assumption that the system cannot recover can be exploited to yield an algorithm that computes exact state and regime posteriors in time polynomial in the number of observations. [sent-566, score-0.329]

84 One way of interpreting the algorithm is that it sends messages along a weak junction tree as if it was a strong junction tree. [sent-576, score-0.306]

85 Therefore we can straightforwardly choose outer clusters in the Kikuchi approximation such that they form a (weak) junction tree. [sent-581, score-0.277]

86 We have shown that the resulting GEP updates then simplify since only outer clusters and direct overlaps need to be considered, i.e. [sent-582, score-0.323]

87 complete graphs, which is notorious for the fact that with unfortunate choices of clusters the quality degrades with larger clusters (Kappen and Wiegerinck, 2002). [sent-593, score-0.184]

88 The first pleasant property of the change point model leads to the observation that in approximations with weak consistency constraints it makes sense to take clusters larger than is necessary to form a (weak) junction tree. [sent-597, score-0.302]

89 In models with cycles and complicated parametric families, an algorithm can send messages as if it is sending messages on a strong junction tree, whereas the underlying cluster choices do not form a tree, neither a weak nor a strong one. [sent-603, score-0.365]

90 In canonical form the CG distribution is given by p(s, x) = exp( gs + x'hs − (1/2) x'Ks x ), (14) with canonical parameters {gs, hs, Ks}. [sent-621, score-0.382]

91 In this section we will use φ(s, x; {gs, hs, Ks}) to denote a CG potential over s and x with canonical parameters {gs, hs, Ks}. [sent-631, score-0.275]

92 Extending s to [s t] we use parameters g[s t] = gs, h[s t] = hs, and K[s t] = Ks. [sent-646, score-0.201]
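
A convenient property of the canonical parametrization (14) is that multiplying CG potentials amounts to adding their canonical parameters per discrete configuration, and extending the discrete domain simply copies parameters, as in the preceding sentence. A minimal sketch under these assumptions (class and field names are illustrative):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CGPotential:
    """phi(s, x) = exp(g[s] + x' h[s] - 0.5 x' K[s] x): one (g, h, K) triple
    per discrete configuration s, i.e. the canonical form (14)."""
    g: dict  # s -> float
    h: dict  # s -> (d,) array
    K: dict  # s -> (d, d) array

    def __mul__(self, other):
        # multiplying CG potentials adds canonical parameters per configuration
        keys = self.g.keys() & other.g.keys()
        return CGPotential(
            g={s: self.g[s] + other.g[s] for s in keys},
            h={s: self.h[s] + other.h[s] for s in keys},
            K={s: self.K[s] + other.K[s] for s in keys},
        )

    def extend(self, new_states):
        # extending the discrete domain from s to [s t] copies the parameters
        return CGPotential(
            g={(s, t): self.g[s] for s in self.g for t in new_states},
            h={(s, t): self.h[s] for s in self.h for t in new_states},
            K={(s, t): self.K[s] for s in self.K for t in new_states},
        )

phi = CGPotential(g={0: 0.0, 1: -1.0},
                  h={0: np.zeros(2), 1: np.ones(2)},
                  K={0: np.eye(2), 1: 2 * np.eye(2)})
product = phi * phi            # doubles every canonical parameter
extended = phi.extend([0, 1])  # parameters indexed by (s, t), copied from s
```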

93 ⟨f(xt−1, xt)⟩_{pt(ij)} = p(st−1 = i, st = j | y1:T, sT+1, θold) ∫ f(xt−1, xt) p(xt−1,t | st−1 = i, st = j, y1:T, sT+1, θold) dxt−1,t . [sent-691, score-0.676]

94 The updates read: A_j^new = ∑_{t=2}^{T} ⟨xt xt−1'⟩_{pt(·,j)} ( ∑_{t=2}^{T} ⟨xt−1 xt−1'⟩_{pt(·,j)} )^{−1}, with corresponding expressions for Q_j^new (normalized by ∑_t pt(·,j)), the initial-state statistics ⟨x1⟩ and V1^new, and the regime transition probabilities Π_{i→j}^new.
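
For a fixed regime posterior these updates reduce to standard linear-Gaussian EM. The sketch below shows the dynamics and noise updates for one regime j given posterior-weighted expected statistics from the E-step; the statistic names are illustrative and the formulas are the standard LDS M-step, offered as a sketch rather than the paper's exact expressions.

```python
import numpy as np

def m_step_regime(stats):
    """Sketch of the M-step for one regime j from expected sufficient statistics.

    `stats` has one entry per time slice t with posterior-weighted moments
    computed in the E-step (names are illustrative, not the paper's notation):
        w      : p(s_t = j | y_{1:T}, s_{T+1})
        xx     : w * E[x_t x_t']
        xxprev : w * E[x_t x_{t-1}']
        pp     : w * E[x_{t-1} x_{t-1}']
    """
    xxprev = sum(s["xxprev"] for s in stats)
    pp = sum(s["pp"] for s in stats)
    xx = sum(s["xx"] for s in stats)
    w = sum(s["w"] for s in stats)
    A_new = xxprev @ np.linalg.inv(pp)               # dynamics matrix update
    Q_new = (xx - A_new @ xxprev.T) / w              # process noise update
    Q_new = 0.5 * (Q_new + Q_new.T)                  # enforce symmetry numerically
    return A_new, Q_new
```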

95 When sT+1 = f, or if it is not observed, posterior distributions such as p(xt−1,t | st−1 = i, st = j, y1:T, sT+1 = f, θold) are mixtures of Gaussians (the sT+1 = s case results in a straightforward LDS variant). [sent-702, score-0.203]

96 For example, ⟨xt xt−1'⟩_{pt(·,n)} is based on a mixture with T − t − 1 components (if sT+1 is observed to be a fault), each corresponding to a possible end of the normal regime. [sent-706, score-0.284]

97 Prior Distributions: In practice, if the underlying models for normal and prefault regimes are relatively “far apart” … [sent-714, score-0.317]

98 For example if the prefault regime has an entirely different offset in the observation model, the prefault subsequences lie in an entirely different region of sensor space, which makes it easy to distinguish between the two. [sent-716, score-0.611]

99 Our main concern is with priors on the regime transition probabilities. [sent-719, score-0.287]

100 Theorem 1 The collection of beliefs pt(zt−1,t) and qt(zt) form fixed points … [sent-768, score-0.254]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uover', 0.584), ('uouter', 0.27), ('ut', 0.265), ('regime', 0.251), ('prefault', 0.18), ('st', 0.174), ('ks', 0.139), ('hs', 0.123), ('emg', 0.12), ('fault', 0.119), ('pt', 0.11), ('dynamical', 0.109), ('xt', 0.109), ('eskes', 0.105), ('oeter', 0.105), ('outer', 0.104), ('overlaps', 0.1), ('hange', 0.097), ('ystems', 0.097), ('clusters', 0.092), ('slds', 0.09), ('cg', 0.084), ('messages', 0.082), ('oint', 0.082), ('roblems', 0.082), ('junction', 0.081), ('qt', 0.078), ('gs', 0.078), ('zoeter', 0.075), ('ep', 0.073), ('inear', 0.072), ('regimes', 0.072), ('potentials', 0.071), ('sequences', 0.07), ('pi', 0.066), ('beliefs', 0.066), ('normal', 0.065), ('kikuchi', 0.063), ('heskes', 0.061), ('qi', 0.06), ('propagation', 0.059), ('backward', 0.057), ('old', 0.054), ('counting', 0.052), ('conditional', 0.045), ('gep', 0.045), ('histories', 0.045), ('sr', 0.045), ('inference', 0.045), ('posteriors', 0.045), ('energy', 0.043), ('yt', 0.041), ('pass', 0.04), ('weak', 0.038), ('collapse', 0.038), ('marginalization', 0.037), ('inferred', 0.037), ('consistency', 0.037), ('transition', 0.036), ('stop', 0.036), ('cluster', 0.034), ('exact', 0.033), ('gaussians', 0.033), ('transitions', 0.033), ('free', 0.032), ('oo', 0.031), ('stumbling', 0.031), ('switching', 0.03), ('ns', 0.03), ('cnew', 0.03), ('fgep', 0.03), ('nijmegen', 0.03), ('schillings', 0.03), ('vnp', 0.03), ('vns', 0.03), ('canonical', 0.029), ('posterior', 0.029), ('gaussian', 0.029), ('belief', 0.028), ('np', 0.028), ('replications', 0.028), ('yedidia', 0.028), ('change', 0.028), ('updates', 0.027), ('forward', 0.026), ('moment', 0.026), ('constraints', 0.026), ('absorbing', 0.025), ('kappen', 0.025), ('trigger', 0.025), ('log', 0.025), ('marginals', 0.025), ('message', 0.024), ('strong', 0.024), ('clamped', 0.022), ('harrison', 0.022), ('lgep', 0.022), ('muscle', 0.022), ('trainset', 0.022), ('bethe', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a training set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum a posteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure are called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

2 0.13188426 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

Author: Alexander T. Ihler, John W. Fisher III, Alan S. Willsky

Abstract: Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether due to quantization of the messages or model parameters, from other simplified message or model representations, or from stochastic approximation methods. The introduction of such errors into the BP message computations has the potential to affect the solution obtained adversely. We analyze the effect resulting from message approximation under two particular measures of error, and show bounds on the accumulation of errors in the system. This analysis leads to convergence conditions for traditional BP message passing, and both strict bounds and estimates of the resulting error in systems of approximate BP message passing. Keywords: belief propagation, sum-product, convergence, approximate inference, quantization

3 0.084522575 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

4 0.081682861 32 jmlr-2005-Expectation Consistent Approximate Inference

Author: Manfred Opper, Ole Winther

Abstract: We propose a novel framework for approximations to intractable probabilistic models which is based on a free energy formulation. The approximation can be understood as replacing an average over the original intractable distribution with a tractable one. It requires two tractable probability distributions which are made consistent on a set of moments and encode different features of the original intractable distribution. In this way we are able to use Gaussian approximations for models with discrete or bounded variables which allow us to include non-trivial correlations. These are neglected in many other methods. We test the framework on toy benchmark problems for binary variables on fully connected graphs and 2D grids and compare with other methods, such as loopy belief propagation. Good performance is already achieved by using single nodes as tractable substructures. Significant improvements are obtained when a spanning tree is used instead.

5 0.072189696 71 jmlr-2005-Variational Message Passing

Author: John Winn, Christopher M. Bishop

Abstract: Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference engine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding. Keywords: Bayesian networks, variational inference, message passing

6 0.069575764 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

7 0.062406361 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification

8 0.054652147 36 jmlr-2005-Gaussian Processes for Ordinal Regression

9 0.047130436 5 jmlr-2005-A Generalization Error for Q-Learning

10 0.045362704 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

11 0.041985143 18 jmlr-2005-Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems

12 0.041496001 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

13 0.040420841 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

14 0.036219243 10 jmlr-2005-Adaptive Online Prediction by Following the Perturbed Leader

15 0.036007691 40 jmlr-2005-Inner Product Spaces for Bayesian Networks

16 0.033393241 55 jmlr-2005-Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection

17 0.031149687 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

18 0.030827845 57 jmlr-2005-Multiclass Boosting for Weak Classifiers

19 0.029954629 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization

20 0.028080303 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.201), (1, -0.081), (2, 0.027), (3, -0.241), (4, -0.266), (5, 0.225), (6, 0.065), (7, 0.18), (8, -0.165), (9, -0.032), (10, -0.066), (11, -0.014), (12, 0.051), (13, -0.041), (14, -0.178), (15, -0.088), (16, -0.064), (17, -0.045), (18, -0.046), (19, -0.082), (20, -0.019), (21, 0.191), (22, 0.052), (23, -0.013), (24, 0.039), (25, 0.001), (26, -0.066), (27, 0.068), (28, -0.043), (29, -0.015), (30, 0.044), (31, 0.039), (32, 0.084), (33, -0.06), (34, -0.087), (35, 0.074), (36, -0.054), (37, 0.006), (38, -0.077), (39, -0.039), (40, -0.179), (41, -0.134), (42, -0.046), (43, 0.247), (44, 0.057), (45, 0.008), (46, -0.131), (47, -0.095), (48, 0.08), (49, 0.137)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9661544 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a training set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum a posteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure are called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

2 0.57150781 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

Author: Alexander T. Ihler, John W. Fisher III, Alan S. Willsky

Abstract: Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether due to quantization of the messages or model parameters, from other simplified message or model representations, or from stochastic approximation methods. The introduction of such errors into the BP message computations has the potential to affect the solution obtained adversely. We analyze the effect resulting from message approximation under two particular measures of error, and show bounds on the accumulation of errors in the system. This analysis leads to convergence conditions for traditional BP message passing, and both strict bounds and estimates of the resulting error in systems of approximate BP message passing. Keywords: belief propagation, sum-product, convergence, approximate inference, quantization

3 0.34072679 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

4 0.32769495 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

5 0.24181275 32 jmlr-2005-Expectation Consistent Approximate Inference

Author: Manfred Opper, Ole Winther

Abstract: We propose a novel framework for approximations to intractable probabilistic models which is based on a free energy formulation. The approximation can be understood as replacing an average over the original intractable distribution with a tractable one. It requires two tractable probability distributions which are made consistent on a set of moments and encode different features of the original intractable distribution. In this way we are able to use Gaussian approximations for models with discrete or bounded variables which allow us to include non-trivial correlations. These are neglected in many other methods. We test the framework on toy benchmark problems for binary variables on fully connected graphs and 2D grids and compare with other methods, such as loopy belief propagation. Good performance is already achieved by using single nodes as tractable substructures. Significant improvements are obtained when a spanning tree is used instead.

6 0.2322485 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification

7 0.21010946 36 jmlr-2005-Gaussian Processes for Ordinal Regression

8 0.20249036 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

9 0.17278185 40 jmlr-2005-Inner Product Spaces for Bayesian Networks

10 0.17092121 18 jmlr-2005-Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems

11 0.16793098 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

12 0.16340216 71 jmlr-2005-Variational Message Passing

13 0.1566274 5 jmlr-2005-A Generalization Error for Q-Learning

14 0.15175363 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization

15 0.1377555 57 jmlr-2005-Multiclass Boosting for Weak Classifiers

16 0.1309211 10 jmlr-2005-Adaptive Online Prediction by Following the Perturbed Leader

17 0.122305 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach

18 0.12215886 24 jmlr-2005-Core Vector Machines: Fast SVM Training on Very Large Data Sets

19 0.11870505 7 jmlr-2005-A Unifying View of Sparse Approximate Gaussian Process Regression

20 0.10978399 25 jmlr-2005-Denoising Source Separation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.511), (17, 0.015), (19, 0.034), (36, 0.036), (37, 0.031), (43, 0.02), (47, 0.012), (52, 0.059), (59, 0.017), (64, 0.029), (70, 0.033), (80, 0.021), (88, 0.063), (90, 0.019), (94, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8624233 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a training set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum a posteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure are called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

2 0.82470173 26 jmlr-2005-Diffusion Kernels on Statistical Manifolds

Author: John Lafferty, Guy Lebanon

Abstract: A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over the standard use of Gaussian or linear kernels, which have been the standard for text classification. Keywords: kernels, heat equation, diffusion, information geometry, text classification

3 0.43845889 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

Author: Tong Luo, Kurt Kramer, Dmitry B. Goldgof, Lawrence O. Hall, Scott Samson, Andrew Remsen, Thomas Hopkins

Abstract: This paper presents an active learning method which reduces the labeling effort of domain experts in multi-class classification problems. Active learning is applied in conjunction with support vector machines to recognize underwater zooplankton from higher-resolution, new generation SIPPER II images. Most previous work on active learning with support vector machines only deals with two class problems. In this paper, we propose an active learning approach “breaking ties” for multiclass support vector machines using the one-vs-one approach with a probability approximation. Experimental results indicate that our approach often requires significantly less labeled images to reach a given accuracy than the approach of labeling the least certain test example and random sampling. It can also be applied in batch mode resulting in an accuracy comparable to labeling one image at a time and retraining. Keywords: active learning, support vector machine, plankton recognition, probabilistic output, multi-class support vector machine

4 0.33897534 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

5 0.33628401 64 jmlr-2005-Semigroup Kernels on Measures

Author: Marco Cuturi, Kenji Fukumizu, Jean-Philippe Vert

Abstract: We present a family of positive definite kernels on measures, characterized by the fact that the value of the kernel between two measures is a function of their sum. These kernels can be used to derive kernels on structured objects, such as images and texts, by representing these objects as sets of components, such as pixels or words, or more generally as measures on the space of components. Several kernels studied in this work make use of common quantities defined on measures such as entropy or generalized variance to detect similarities. Given an a priori kernel on the space of components itself, the approach is further extended by restating the previous results in a more efficient and flexible framework using the “kernel trick”. Finally, a constructive approach to such positive definite kernels through an integral representation theorem is proved, before presenting experimental results on a benchmark experiment of handwritten digits classification to illustrate the validity of the approach. Keywords: kernels on measures, semigroup theory, Jensen divergence, generalized variance, reproducing kernel Hilbert space

6 0.30588099 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

7 0.29942155 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

8 0.29295754 32 jmlr-2005-Expectation Consistent Approximate Inference

9 0.28495696 48 jmlr-2005-Learning the Kernel Function via Regularization

10 0.28360507 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial

11 0.28229782 71 jmlr-2005-Variational Message Passing

12 0.27202746 20 jmlr-2005-Clustering with Bregman Divergences

13 0.27102542 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

14 0.26887852 39 jmlr-2005-Information Bottleneck for Gaussian Variables

15 0.26609111 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints

16 0.26163846 18 jmlr-2005-Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems

17 0.26107904 23 jmlr-2005-Convergence Theorems for Generalized Alternating Minimization Procedures

18 0.2610555 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching

19 0.26088059 28 jmlr-2005-Efficient Computation of Gapped Substring Kernels on Large Alphabets

20 0.26062408 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels