jmlr jmlr2005 jmlr2005-51 knowledge-graph by maker-knowledge-mining

51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks


Source: pdf

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. [sent-7, score-1.037]

2 Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree [sent-9, score-0.731]

3 This paper presents an alternative scheme in which the local computation is performed on an elimination tree, rather than using a junction tree. [sent-19, score-0.605]

4 Then the general procedure is presented: deriving an elimination tree from a network; initializing the elimination tree (this is perhaps the most complicated part of the scheme); and entering evidence and evaluating posterior marginal densities. [sent-27, score-1.411]

5 These operations are required for their message passing algorithm on the junction tree structure, and in the main correspond to operations on probability distributions. [sent-40, score-0.573]

6 In comparison to the propagation scheme presented in this paper, much of the complexity of their algorithm arises because their local computational structure is a strong junction tree of cliques. [sent-43, score-0.612]

7 In contrast, we work with univariate regressions on an elimination tree, avoiding matrix operations. [sent-46, score-0.685]

8 In this section we review the notions of a junction tree of cliques and of an elimination tree, and their relative advantages and disadvantages. [sent-49, score-0.864]

9 This is a tree structure, with the node set being the set of cliques C of a chordal graph, such that the intersection C1 ∩C2 of any two cliques C1 ∈ C and C2 ∈ C is contained in every clique on the path in the junction tree between C1 and C2 . [sent-54, score-0.963]

10 An elimination tree is similar to a junction tree, in that it is a tree structure, but with the node set being a subset of the complete subgraphs of a chordal graph (rather than the set of cliques) such that the intersection C1 ∩ C2 of any two nodes [sent-71, score-1.147]

11 in the elimination tree is contained in every node on the path in the tree between C1 and C2. [sent-75, score-0.871]

12 The basic steps to make an elimination tree starting with the directed acyclic graph G are as follows: 1. [sent-77, score-0.623]

13 For each node vi, associate a so-called cluster set of nodes consisting of vi and its neighbours later in the elimination sequence (and hence of lower number); call this set Ci. [sent-85, score-0.677]
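
To make the construction of cluster sets concrete, here is a minimal Python sketch that, given an adjacency structure for the moralized graph and an elimination ordering, records each Ci as the eliminated node together with its remaining neighbours and adds the corresponding fill-in edges. The function and variable names are illustrative and do not come from the paper.

```python
def cluster_sets(adj, elimination_order):
    """Compute the cluster set C_i for each eliminated node.

    adj: dict mapping each node to the set of its neighbours in the
         moralized (undirected) graph; a private copy is modified below.
    elimination_order: nodes listed in the order they are eliminated.
    Illustrative sketch; the names here are not from the paper.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    clusters = {}
    for v in elimination_order:
        nbrs = adj[v]
        clusters[v] = {v} | nbrs          # C_i = v plus its remaining neighbours
        for a in nbrs:                    # add fill-in edges among those neighbours
            adj[a] |= (nbrs - {a})
            adj[a].discard(v)
        del adj[v]
    return clusters

# Example: chain A - B - C eliminated as C, B, A gives clusters
# {C, B}, {B, A}, {A}.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(cluster_sets(adj, ["C", "B", "A"]))
```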

14 The choice of elimination sequence will govern the number of fill-ins in the triangulation, and hence the size of the state space of the elimination tree. [sent-96, score-0.732]

15 In an elimination tree, the set of cluster sets contains the set of cliques of the triangulated graph together with some other sets. [sent-101, score-0.746]

16 Hence in terms of storage requirements for potentials on the sets, elimination trees are less efficient than junction trees. [sent-102, score-0.663]

17 Now consider the elimination tree, made by using an elimination sequence vk, vk−1, . . . [sent-106, score-0.762]

18 In the corresponding elimination tree, as we shall see, only univariate regressions are stored. [sent-116, score-0.685]

19 Thus the use of the elimination tree does not introduce duplication in the values to be stored, in contrast to the discrete case. [sent-122, score-0.724]

20 Hence the elimination tree is almost as efficient as the junction tree in storage requirements (only almost, because there will be some extra bookkeeping needed to keep track of which variables are in each of the cluster sets). [sent-123, score-1.264]

21 Here we use a similar structure based on elimination trees, which we shall call a strongly rooted elimination tree. [sent-128, score-0.732]

22 Then a strongly rooted elimination tree may be formed in the same way as a standard elimination tree provided that, in the elimination sequence used, all of the continuous variables occur before any of the discrete variables. [sent-130, score-1.805]

23 The reason for using a strong elimination tree will become apparent when we discuss the initialization of the tree and propagation on it. [sent-133, score-0.964]

24 For a more efficient computation scheme (from a storage requirement viewpoint) it is convenient to use a tree structure that is intermediate between a junction tree and an elimination tree, a structure which we call a strong semi-elimination tree, introduced in the next section. [sent-134, score-1.115]

25 Given an elimination ordering for a graph G, one can construct a triangulated graph Gc, use maximum cardinality search to find the cliques, and then organize the cliques into a junction tree. [sent-136, score-0.788]

26 Alternatively, one could take the elimination tree and remove the redundant cluster sets that are subsets of cliques, by repeated application of the following result due to Leimer (1989) (see also Lemma 2. [sent-137, score-0.817]
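
A rough sketch of the pruning step just described: any cluster set contained in a neighbouring cluster set is dropped, and its other neighbours are re-attached to that neighbour. This is an illustrative reading of the Leimer-style reduction, not the paper's exact algorithm; all names are invented.

```python
def prune_redundant_clusters(clusters, edges):
    """Remove cluster sets that are subsets of a neighbouring cluster.

    clusters: dict id -> set of variables.
    edges: iterable of frozenset({id1, id2}) tree edges.
    Returns the reduced clusters and edges.
    """
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        for t in list(clusters):
            nbrs = [next(iter(e - {t})) for e in edges if t in e]
            for p in nbrs:
                if clusters[t] <= clusters[p]:
                    # re-link Ct's other neighbours to Cp, then drop Ct
                    for q in nbrs:
                        if q != p:
                            edges.add(frozenset({q, p}))
                    edges = {e for e in edges if t not in e}
                    del clusters[t]
                    changed = True
                    break
            if changed:
                break
    return clusters, edges
```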

27 The preceding Lemma operates on sets, but in the elimination tree we also have links between sets, which will need to be rearranged if a set is deleted. [sent-163, score-0.591]

28 1 makes a single pass through the cluster sets of an elimination tree to produce a junction tree, it assumes that the elimination tree is connected. [sent-165, score-1.594]

29 It is convenient to make the edges of the elimination tree directed edges, with directions pointing away from the root C1 . [sent-166, score-0.622]

30 Although the message passing algorithms presented below will work on a strong elimination tree, to optimize the storage requirements it is better to work on a strong semi-elimination tree. [sent-185, score-0.564]

31 This is a strong elimination tree in which the purely discrete clusters that are subsets of other purely discrete clusters have been removed, with links among the remaining clusters suitably adjusted. [sent-186, score-1.181]

32 3 produces a strong semi-elimination tree from a strong elimination tree. [sent-188, score-0.711]

33 On the left is a Bayesian network, and at top right the elimination tree obtained using the elimination sequence of reverse alphabetical ordering. [sent-191, score-0.957]

34 In the top tree we first remove the redundant cluster Ct = {A} having the unique child cluster Cp = {AB}, yielding the second tree in which {AB} now has no parent. [sent-192, score-0.902]

35 In the initialization phase, CG regressions are passed between continuous clusters, and so we introduce a list structure called a postbag to store such messages that are to be passed. [sent-216, score-0.73]

36 In addition we introduce for each continuous cluster another list structure called an lp-potential that will on completion of the initialization phase store the conditional density of the elimination variable associated with the cluster (conditional on the remaining variables in the cluster). [sent-217, score-1.085]
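
A minimal sketch of the bookkeeping this suggests, assuming a univariate CG regression can be stored as an intercept, a coefficient per continuous tail variable, and a conditional variance. The class and field names are illustrative only, and the per-discrete-configuration indexing used in the full scheme is suppressed for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Regression:
    """A univariate CG regression L(head | tail): N(alpha + sum_z beta[z]*z, var)."""
    head: str                     # the conditioned continuous variable
    alpha: float                  # intercept
    beta: Dict[str, float]        # coefficient of each continuous tail variable
    var: float                    # conditional variance

@dataclass
class ContinuousCluster:
    """One cluster set of the elimination tree for a continuous elimination variable.

    In the full scheme there is one (lp_potential, postbag) pair per configuration
    of the cluster's discrete variables; a single pair is kept here for brevity.
    """
    elim_var: str                              # gamma_i, the cluster's elimination variable
    variables: set                             # all variables in the cluster set
    lp_potential: List[Regression] = field(default_factory=list)
    postbag: List[Regression] = field(default_factory=list)
    activeflag: bool = True                    # FALSE once evidence on elim_var is entered
```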

37 For discrete variables this proceeds as in the purely discrete case, by a marginalization of the tables in the discrete clusters in the elimination tree. [sent-223, score-0.935]

38 This set of linear regressions defines the joint density of the continuous variables conditional on the discrete variables. [sent-234, score-0.683]

39 The next stage is to assign the conditional probabilities and densities of the Bayesian network to the tree so that the tree contains a representation of the joint density of all of the variables. [sent-245, score-0.586]

40 As a consequence of this assignment, the subtree consisting of those clusters containing only discrete variables contains all of the information required to reconstruct the joint marginal of the discrete variables; this part of the tree shall be referred to as the discrete part. [sent-249, score-0.767]

41 Similarly, the clusters having the continuous variables hold representations of all of the densities with which one can construct the joint density of the continuous variables conditional on the discrete variables; this part of the tree will be called the continuous part. [sent-250, score-0.859]

42 The regressions L (Z | X,C), which are independent of Y , are forwarded to the neighbouring cluster set towards the root, here {XZABC}. [sent-262, score-0.589]

43 The regressions L (X | Z, A, B,C) are retained in the cluster {XZABC}, and the regressions L (Z | A, B,C) are forwarded to the cluster {ZABC}, and we are done. [sent-264, score-1.09]

44 (Their PUSH operation can act more generally on a group of variables, but because we are working with an elimination tree we only require it to act on one variable at a time.) [sent-289, score-0.665]

45 Also suppose that the variables have been numbered so that the elimination ordering of the continuous variables to make the strong elimination tree is (γn , γn−1 , . [sent-315, score-1.227]

46 We denote the set of continuous cluster sets by CΓ , with CΓ (i) ∈ CΓ denoting the cluster set associated with eliminating the continuous variable γi . [sent-320, score-0.628]

47 Let SΓ denote the separators adjacent to continuous cluster sets, with SΓ (i) ∈ SΓ denoting the separator between CΓ (i) and its neighbouring cluster set in the direction of the strong root. [sent-321, score-0.72]

48 On the clusters C∆ and separators S∆ of the discrete part of the tree we store the usual tables (called potentials) of discrete junction tree propagation algorithms. [sent-329, score-1.159]

49 After constructing the tree from the Bayesian network, the first task is to allocate the conditional probability tables and CG regressions to the clusters of the tree. [sent-331, score-0.685]

50 Pre-initialize tree potentials: in each continuous cluster set CΓ(i) ∈ CΓ, set each lp-potential to empty and set each postbag to an empty list. [sent-334, score-1.092]

51 Allocation of CG regressions: for each X ∈ Γ, find a cluster, CΓ(i) say, which contains X ∪ pa(X), and: IF the elimination node γ(i) of CΓ(i) is X, then add L(X | pa(X)) to the lp-potential; OTHERWISE append L(X | pa(X)) to the postbag of CΓ(i). [sent-338, score-1.027]
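
A sketch of this allocation rule using the data structures sketched earlier; the function name and argument layout are assumptions, and the per-configuration indexing is again suppressed.

```python
def allocate_cg_regressions(cg_families, clusters):
    """Allocate each CG regression L(X | pa(X)) to a continuous cluster set.

    cg_families: dict mapping each continuous variable X to (pa(X), Regression).
    clusters: list of ContinuousCluster objects (see the earlier sketch).
    """
    for x, (parents, regression) in cg_families.items():
        family = {x} | set(parents)
        # any cluster whose variable set contains X and all of its parents will do
        cluster = next(c for c in clusters if family <= c.variables)
        if cluster.elim_var == x:
            cluster.lp_potential.append(regression)   # X is the elimination variable here
        else:
            cluster.postbag.append(regression)        # otherwise queue it as a message
```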

52 Under this allocation the discrete part of the tree contains all of the conditional (and unconditional) probability tables of the discrete variables, and the product of the potentials in the discrete clusters yields the joint marginal distribution of the discrete variables. [sent-339, score-1.093]

53 Applying this to the example of Section 4, both the postbag and lp-potential of {ZABC} were empty, the postbag of {XZABC} was empty but its lp-potential stored L(X | A, B), whilst for {YXZC} the postbag contained L(Z | Y,C) and the lp-potential stored L(Y | X,C). [sent-341, score-1.019]

54 Note that the continuous leaves of the elimination tree will have non-empty lp-potentials, because for each such leaf, its associated elimination node appears nowhere else in the tree, and so there is no other cluster where the conditional density of that variable can be allocated. [sent-342, score-1.433]

55 In the example of Section 4 no postbag contained more than one regression (for each discrete configuration); in larger and more complicated networks this might not happen, and so within each continuous cluster a sequence of EXCHANGE operations may be necessary. [sent-396, score-0.776]

56 In the strong elimination tree (right) the strong root is a, and the two sets {CDEF} and {ABCDE} would by themselves form a strong junction tree with {ABCDE} as the strong root. [sent-403, score-1.273]

57 We shall use the same initial allocation of regressions as Lauritzen and Jensen, which means that L (F | −) is put into the lp-potential of cluster {CDEF}, and in its postbag we place L (E |CF) and L (D | F). [sent-404, score-0.867]

58 In the postbag of cluster {ABCDE} we place the regressions L (C | DA) and L (B | ACDE). [sent-405, score-0.832]

59 We now retain L (F |CDE) in the lp-potential of cluster {CDEF} and pass the regressions L (D | −) and L (E |CD) to the cluster {ABCDE}. [sent-413, score-0.771]

60 Actually, because this cluster contains the discrete variable A, there is an lp-potential and a postbag for each configuration of A. [sent-415, score-0.646]

61 From this set, L (E | ABCD) is retained in the lp-potential of {ABCDE}, L (D | A) is put into the lp-potential of {ABCD} (because D is the elimination variable of this cluster), and L (C | DA) and L (B | ACD) are put into the postbag of {ABCD}. [sent-420, score-0.653]

62 Hence in the cluster {ABCD} we have: lp-potential L(D | A); postbag L(B | ACD), L(C | DA). Now D appears in the tails of both regressions in the postbag, hence care must be taken. [sent-421, score-0.832]

63 Recall that γi is the elimination variable associated with the continuous cluster CΓ (i). [sent-430, score-0.68]

64 Given: a mixed Bayesian network B with a strong elimination tree initialized according to Algorithm 5. [sent-433, score-0.68]

65 To see this, first note that each EXCHANGE operation is a valid application of Bayes’ theorem, and does not at any stage alter the joint conditional density that can be reconstructed from the regressions stored in the lp-potentials and postbags. [sent-448, score-0.557]
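
Since the EXCHANGE operation is, in essence, Bayes’ theorem (equivalently, arc reversal) applied to a pair of univariate Gaussian regressions, a small sketch can make the bookkeeping explicit. The formulas below are the standard ones for reversing a Gaussian arc, not a transcription of the paper’s expressions, and the code reuses the illustrative Regression class from the earlier sketch.

```python
def exchange(reg_y, reg_x):
    """Exchange L(Y | W) and L(X | Y, W) into L(X | W) and L(Y | X, W).

    reg_y: Regression for Y with a continuous tail W (not containing X).
    reg_x: Regression for X whose tail contains Y (and possibly W).
    Standard Gaussian arc reversal; a sketch in the spirit of EXCHANGE.
    """
    y, x = reg_y.head, reg_x.head
    c = reg_x.beta.get(y, 0.0)                     # coefficient of Y in L(X | Y, W)

    # Marginal regression L(X | W): substitute the mean of Y, inflate the variance.
    beta_x = {w: b for w, b in reg_x.beta.items() if w != y}
    for w, b in reg_y.beta.items():
        beta_x[w] = beta_x.get(w, 0.0) + c * b
    marg_x = Regression(head=x,
                        alpha=reg_x.alpha + c * reg_y.alpha,
                        beta=beta_x,
                        var=reg_x.var + c * c * reg_y.var)

    # Reversed regression L(Y | X, W) via the usual Gaussian conditioning gain.
    k = c * reg_y.var / marg_x.var if marg_x.var > 0 else 0.0
    beta_y = {x: k}
    for w in set(reg_y.beta) | set(beta_x):
        beta_y[w] = reg_y.beta.get(w, 0.0) - k * beta_x.get(w, 0.0)
    rev_y = Regression(head=y,
                       alpha=reg_y.alpha - k * marg_x.alpha,
                       beta=beta_y,
                       var=reg_y.var - k * c * reg_y.var)
    return marg_x, rev_y
```

Applied repeatedly, the returned pair plays the roles of the forwarded and retained regressions in the message passing described in the surrounding sentences.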

66 In order to keep track of those variables that have already had their evidence entered, in each continuous cluster we retain a boolean variable—called activeflag—which is initially set to TRUE before any continuous evidence has been entered. [sent-456, score-0.644]

67 A value of FALSE indicates that evidence on the elimination variable of the cluster has been entered. [sent-457, score-0.695]

68 A TRUE value indicates that evidence concerning the elimination variable may be entered, or that the marginal density of the variable may be calculated. [sent-458, score-0.573]

69 Recall that the separator SΓ(i) between a continuous cluster set CΓ(i) that is a neighbour to a discrete cluster in C∆ contains only the discrete variables in CΓ(i). [sent-459, score-0.874]

70 In the algorithm below the cluster neighbouring CΓ (i) in the direction of the strong root will be denoted by toroot(i), and is either another continuous cluster, or a purely discrete boundary cluster. [sent-461, score-0.669]

71 We now state the algorithm for entering evidence on a continuous variable γj (indexed according to the elimination ordering), which consists of the finding that γj = γ∗j. [sent-466, score-0.585]

72 (The PUSH operation: Entering evidence γj = γ∗j) Given: a tree with elimination sequence γn, γn−1, . . . [sent-469, score-0.694]

73 while toroot(i) is not a boundary cluster do: for each configuration δ∗ri of the discrete variables of CΓ(ri) (with induced configuration δ∗i of the discrete variables of CΓ(i)) do: IF the activeflag of toroot(i) is FALSE, then: 1. [sent-484, score-0.606]

74 Copy the δ∗i postbag of CΓ(i) into the δ∗ri postbag of CΓ(ri); OTHERWISE: 1. [sent-485, score-0.574]

75 Copy the δ∗i postbag of CΓ(i) into the δ∗ri postbag of CΓ(ri) 2. [sent-486, score-0.574]

76 Perform the EXCHANGE operation on the δ∗ri lp-potential and postbag of CΓ(ri) (such that the resulting postbag regression has γj as head). [sent-487, score-0.648]

77 For each configuration δ∗i of the discrete variables of CΓ(i), substitute γj = γ∗j into the density of the regression stored in the δ∗i postbag of CΓ(i) and store the result in the δ∗i entry of the weight table of SΓ(i). [sent-492, score-0.604]
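
A small sketch of this evidence-substitution step, assuming every continuous variable in the regression’s tail has already had its value entered and that the conditional variance is positive; gaussian_density and evidence_weight are invented helper names, not the paper’s.

```python
import math

def gaussian_density(value, mean, var):
    """N(value; mean, var) — the evidence weight for one discrete configuration."""
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_weight(regression, value, continuous_evidence):
    """Substitute the finding head = value into a regression and return its density.

    continuous_evidence: dict of already-observed continuous tail variables.
    The returned density is what would be stored in the corresponding entry
    of the separator's weight table.
    """
    mean = regression.alpha + sum(b * continuous_evidence[z]
                                  for z, b in regression.beta.items())
    return gaussian_density(value, mean, regression.var)
```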

78 After standard evidence propagation on the discrete part of the tree, the latter stores a representation of P(∆ | γj = γ∗j, δ∗) or P(∆, γj = γ∗j, δ∗), depending on whether or not the discrete potentials have been normalized to unity (δ∗ represents discrete evidence). [sent-504, score-0.701]

79 In {XZABC}, the activeflag is set to FALSE, and for each configuration abc the regression N(αX:abc + βX:Zabc z, σ²X:abc) is moved into the abc postbag of {ZABC}. [sent-514, score-0.791]

80 The idea is to use a sequence of EXCHANGE operations to push Y to a cluster C neighbouring the boundary, so that we have a representation of the distribution L(Y | EΓ, B), where EΓ denotes the evidence on the continuous variables, and B ⊆ ∆ are the discrete variables in the cluster C. [sent-528, score-0.898]
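
Under the assumptions of the previous sketches, the resulting posterior density is a mixture of univariate normals, one per configuration of the discrete variables B, weighted by the posterior probabilities read off the discrete part of the tree. The sketch below reuses gaussian_density from the preceding sketch; all argument names are illustrative.

```python
def posterior_density(y_value, regressions_by_config, config_probs, continuous_evidence):
    """Posterior density of a continuous variable as a mixture of univariate normals.

    regressions_by_config: dict mapping each discrete configuration b of the
        boundary cluster to a regression representing L(Y | E_Gamma, B=b).
    config_probs: dict mapping each configuration b to P(B=b | evidence),
        taken from the discrete part of the tree after propagation.
    continuous_evidence: observed values for any continuous tail variables.
    """
    total = 0.0
    for b, reg in regressions_by_config.items():
        mean = reg.alpha + sum(coef * continuous_evidence[z]
                               for z, coef in reg.beta.items())
        total += config_probs[b] * gaussian_density(y_value, mean, reg.var)
    return total
```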

81 (Find the posterior density of a continuous variable) Given: a strong elimination tree with elimination sequence γn, γn−1, . . . [sent-534, score-1.209]

82 3, and discrete evidence E∆ entered and propagated on the discrete part, so that the discrete clusters contain posterior distributions. [sent-540, score-0.708]

83 In the current scheme, evidence propagation and evaluation of marginal distributions in a mixed Bayesian network B take place on a strong elimination tree or strong semi-elimination tree. [sent-563, score-0.98]

84 The tree has two distinct parts, the continuous part and the discrete part, with the strong root located in the discrete part. [sent-564, score-0.67]

85 The continuous part is initialized using the CG regressions of B , and represents, using a collection of univariate regressions, the density of the continuous variables conditional on the discrete variables. [sent-565, score-0.771]

86 Discrete evidence is entered on the discrete part of the tree and propagated on the discrete part of the tree in the usual way. [sent-568, score-0.918]

87 As discussed in Section 2, the Lauritzen and Jensen propagation scheme uses a strong junction tree architecture. [sent-571, score-0.612]

88 Associated with each clique of the junction tree is a CG potential, which is a tuple φ = [p, A, B,C](H | T ) of scalars, vectors and matrices and a partition of the variables into conditioned variables (the head) and conditioning variables (the tail). [sent-572, score-0.589]

89 In Figure 4 the clusters {CDEF} and {ABCDE} form a strong junction tree with the latter as the strong root. [sent-578, score-0.589]

90 If one compares these four potentials with the regressions stored in the cluster {ABCDE} in Section 5. [sent-585, score-0.713]

91 In the current scheme the regressions L (D | −), L (E |CD) were passed to the cluster {ABCDE}, which stored (omitting the dependence on A) the regressions L (C | D), L (B |CDE). [sent-587, score-0.974]

92 It has echoes of lazy-propagation (Madsen and Jensen, 1998), in which potentials are stored in factored form when initializing a junction tree and are only combined when required; the difference is that in our scheme factored forms are always retained. [sent-591, score-0.69]

93 The elimination tree, with its lp-potential and postbag structures, provides an organizing framework for such operations. [sent-623, score-0.653]

94 Aside from finding the posterior marginals of variables, the use of the elimination tree in the current scheme facilitates other operations that may be of interest in applications. [sent-644, score-0.735]

95 Starting from a junction tree of cliques, and after entering and sum-propagating evidence E∆ , the clique and separator tables contain posterior marginals of their variables. [sent-651, score-0.724]

96 (Sampling from the posterior) Given: a tree with elimination sequence γn, γn−1, . . . [sent-664, score-0.591]

97 Note that, because of the elimination tree structure employed, each simulated γi is sampled from a univariate normal distribution. [sent-679, score-0.591]
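
A sketch of that sampling pass, assuming the continuous clusters are visited from the strong root outwards so that every variable in a regression’s tail has already been sampled or observed; as before, a single regression per cluster is kept and the per-configuration indexing is suppressed. The function name is an assumption.

```python
import math
import random

def sample_continuous(clusters_root_to_leaves, observed=None):
    """Draw one joint sample of the continuous variables, one univariate draw each.

    clusters_root_to_leaves: ContinuousCluster objects ordered from the strong
        root outwards; after initialization and evidence entry each is assumed
        to hold one regression for its elimination variable in lp_potential[0].
    observed: dict of continuous variables already fixed by evidence.
    """
    sample = dict(observed or {})
    for cluster in clusters_root_to_leaves:
        reg = cluster.lp_potential[0]
        if reg.head in sample:            # evidence already entered on this variable
            continue
        mean = reg.alpha + sum(b * sample[z] for z, b in reg.beta.items())
        sample[reg.head] = random.gauss(mean, math.sqrt(reg.var))
    return sample
```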

98 (Highest Component Search) Given: a tree with elimination sequence γn, γn−1, . . . [sent-698, score-0.591]

99 (Note that this accumulated product is valid because of the strong nature of the tree: a continuous cluster CΓ(ri) neighbouring another continuous cluster CΓ(i) but closer to the strong root will contain all of the discrete variables that are in CΓ(i).) [sent-730, score-0.992]

100 We have presented a local propagation scheme for conditional Gaussian Bayesian networks, based on elimination trees, that combines the scheme of Lauritzen and Jensen (2001) with that of Shachter and Kenley (1989). [sent-741, score-0.612]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('elimination', 0.366), ('regressions', 0.319), ('postbag', 0.287), ('abc', 0.252), ('xchange', 0.246), ('cluster', 0.226), ('tree', 0.225), ('junction', 0.186), ('zabc', 0.146), ('cg', 0.139), ('discrete', 0.133), ('guration', 0.123), ('lauritzen', 0.12), ('xzabc', 0.111), ('potentials', 0.111), ('xzc', 0.105), ('cp', 0.105), ('evidence', 0.103), ('owell', 0.1), ('shachter', 0.1), ('onditional', 0.094), ('ropagation', 0.094), ('propagation', 0.088), ('continuous', 0.088), ('cliques', 0.087), ('kenley', 0.082), ('abcde', 0.076), ('ush', 0.074), ('operation', 0.074), ('clique', 0.07), ('entered', 0.066), ('jensen', 0.062), ('ag', 0.061), ('strong', 0.06), ('ct', 0.059), ('abcd', 0.059), ('acd', 0.059), ('toroot', 0.059), ('head', 0.059), ('etworks', 0.059), ('clusters', 0.058), ('stored', 0.057), ('density', 0.055), ('node', 0.055), ('ri', 0.054), ('scheme', 0.053), ('bayesian', 0.053), ('postbags', 0.053), ('conditional', 0.052), ('aussian', 0.051), ('ordering', 0.05), ('marginal', 0.049), ('posterior', 0.049), ('cdef', 0.047), ('purely', 0.045), ('empty', 0.044), ('wl', 0.044), ('neighbouring', 0.044), ('ab', 0.044), ('fy', 0.044), ('formulae', 0.044), ('moral', 0.044), ('separators', 0.044), ('cowell', 0.044), ('message', 0.043), ('operations', 0.042), ('boundary', 0.042), ('tail', 0.042), ('da', 0.04), ('variables', 0.036), ('parents', 0.036), ('store', 0.036), ('active', 0.036), ('gm', 0.036), ('passing', 0.035), ('acde', 0.035), ('aiwi', 0.035), ('bcde', 0.035), ('bcdf', 0.035), ('cde', 0.035), ('ciwi', 0.035), ('triangulated', 0.035), ('allocation', 0.035), ('triangulation', 0.034), ('ch', 0.033), ('propagated', 0.033), ('separator', 0.032), ('graph', 0.032), ('root', 0.031), ('pa', 0.031), ('tables', 0.031), ('bc', 0.031), ('vk', 0.03), ('recursive', 0.03), ('nodes', 0.03), ('factored', 0.029), ('network', 0.029), ('intersection', 0.028), ('copy', 0.028), ('entering', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000014 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

2 0.094296552 71 jmlr-2005-Variational Message Passing

Author: John Winn, Christopher M. Bishop

Abstract: Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference engine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding. Keywords: Bayesian networks, variational inference, message passing

3 0.084522575 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a train set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum aposteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure is called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

4 0.057665315 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

Author: Hal Daumé III, Daniel Marcu

Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian

5 0.055015523 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

Author: Alexander T. Ihler, John W. Fisher III, Alan S. Willsky

Abstract: Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether due to quantization of the messages or model parameters, from other simplified message or model representations, or from stochastic approximation methods. The introduction of such errors into the BP message computations has the potential to affect the solution obtained adversely. We analyze the effect resulting from message approximation under two particular measures of error, and show bounds on the accumulation of errors in the system. This analysis leads to convergence conditions for traditional BP message passing, and both strict bounds and estimates of the resulting error in systems of approximate BP message passing. Keywords: belief propagation, sum-product, convergence, approximate inference, quantization

6 0.053539064 32 jmlr-2005-Expectation Consistent Approximate Inference

7 0.053526953 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

8 0.05325963 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

9 0.048597138 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs

10 0.046129413 27 jmlr-2005-Dimension Reduction in Text Classification with Support Vector Machines

11 0.045950707 36 jmlr-2005-Gaussian Processes for Ordinal Regression

12 0.045185342 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach

13 0.044802949 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification

14 0.041404966 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

15 0.040530927 40 jmlr-2005-Inner Product Spaces for Bayesian Networks

16 0.039384216 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach

17 0.038419344 70 jmlr-2005-Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions

18 0.036950644 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks

19 0.032114241 73 jmlr-2005-Working Set Selection Using Second Order Information for Training Support Vector Machines

20 0.028650874 44 jmlr-2005-Learning Module Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.191), (1, -0.091), (2, 0.023), (3, -0.198), (4, -0.079), (5, 0.221), (6, -0.068), (7, 0.082), (8, -0.079), (9, -0.157), (10, 0.095), (11, 0.061), (12, 0.043), (13, 0.019), (14, 0.072), (15, -0.175), (16, -0.079), (17, 0.022), (18, 0.025), (19, -0.014), (20, -0.124), (21, 0.123), (22, -0.231), (23, 0.112), (24, 0.095), (25, -0.017), (26, -0.15), (27, 0.088), (28, 0.1), (29, -0.084), (30, 0.06), (31, -0.002), (32, -0.034), (33, 0.281), (34, -0.146), (35, 0.018), (36, 0.239), (37, 0.081), (38, -0.053), (39, -0.333), (40, -0.095), (41, -0.121), (42, 0.129), (43, -0.048), (44, -0.149), (45, -0.008), (46, 0.093), (47, 0.083), (48, -0.013), (49, -0.085)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98338145 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

2 0.3232421 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a train set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum aposteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure is called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

3 0.28866997 71 jmlr-2005-Variational Message Passing

Author: John Winn, Christopher M. Bishop

Abstract: Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference engine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding. Keywords: Bayesian networks, variational inference, message passing

4 0.2487842 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

Author: Damien Ernst, Pierre Geurts, Louis Wehenkel

Abstract: Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (xt , ut , rt , xt+1 ) where xt denotes the system state at time t, ut the control action taken, rt the instantaneous reward obtained and xt+1 the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performances on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees. Keywords: batch mode reinforcement learning, regression trees, ensemble methods, supervised learning, fitted value iteration, optimal control

5 0.21969135 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs

Author: S. Sathiya Keerthi, Dennis DeCoste

Abstract: This paper develops a fast method for solving linear SVMs with L2 loss function that is suited for large scale data mining tasks such as text classification. This is done by modifying the finite Newton method of Mangasarian in several ways. Experiments indicate that the method is much faster than decomposition methods such as SVMlight , SMO and BSVM (e.g., 4-100 fold), especially when the number of examples is large. The paper also suggests ways of extending the method to other loss functions such as the modified Huber’s loss function and the L1 loss function, and also for solving ordinal regression. Keywords: linear SVMs, classification, conjugate gradient

6 0.17980936 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

7 0.16636339 27 jmlr-2005-Dimension Reduction in Text Classification with Support Vector Machines

8 0.16593656 44 jmlr-2005-Learning Module Networks

9 0.14281063 3 jmlr-2005-A Classification Framework for Anomaly Detection

10 0.14032105 70 jmlr-2005-Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions

11 0.1358899 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

12 0.13229929 41 jmlr-2005-Kernel Methods for Measuring Independence

13 0.13091984 32 jmlr-2005-Expectation Consistent Approximate Inference

14 0.1305508 40 jmlr-2005-Inner Product Spaces for Bayesian Networks

15 0.12982097 39 jmlr-2005-Information Bottleneck for Gaussian Variables

16 0.11704562 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

17 0.11675326 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

18 0.11639879 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

19 0.1127632 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification

20 0.11128198 36 jmlr-2005-Gaussian Processes for Ordinal Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.025), (17, 0.021), (19, 0.033), (36, 0.035), (37, 0.031), (43, 0.032), (47, 0.022), (52, 0.059), (59, 0.011), (64, 0.489), (70, 0.05), (80, 0.011), (88, 0.073), (90, 0.01), (94, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80563724 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

Author: Robert G. Cowell

Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree

2 0.23333962 71 jmlr-2005-Variational Message Passing

Author: John Winn, Christopher M. Bishop

Abstract: Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference engine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding. Keywords: Bayesian networks, variational inference, message passing

3 0.23269469 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

Author: Onno Zoeter, Tom Heskes

Abstract: We study the problem of learning two regimes (we have a normal and a prefault regime in mind) based on a train set of non-Markovian observation sequences. Key to the model is that we assume that once the system switches from the normal to the prefault regime it cannot restore and will eventually result in a fault. We refer to the particular setting as semi-supervised since we assume the only information given to the learner is whether a particular sequence ended with a stop (implying that the sequence was generated by the normal regime) or with a fault (implying that there was a switch from the normal to the fault regime). In the latter case the particular time point at which a switch occurred is not known. The underlying model used is a switching linear dynamical system (SLDS). The constraints in the regime transition probabilities result in an exact inference procedure that scales quadratically with the length of a sequence. Maximum aposteriori (MAP) parameter estimates can be found using an expectation maximization (EM) algorithm with this inference algorithm in the E-step. For long sequences this will not be practically feasible and an approximate inference and an approximate EM procedure is called for. We describe a flexible class of approximations corresponding to different choices of clusters in a Kikuchi free energy with weak consistency constraints. Keywords: change point problems, switching linear dynamical systems, strong junction trees, approximate inference, expectation propagation, Kikuchi free energies

4 0.23241185 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

Author: Cheng Soon Ong, Alexander J. Smola, Robert C. Williamson

Abstract: This paper addresses the problem of choosing a kernel suitable for estimation with a support vector machine, hence further automating machine learning. This goal is achieved by defining a reproducing kernel Hilbert space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach. Keywords: learning the kernel, capacity control, kernel methods, support vector machines, representer theorem, semidefinite programming

5 0.2305724 36 jmlr-2005-Gaussian Processes for Ordinal Regression

Author: Wei Chu, Zoubin Ghahramani

Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach. Keywords: Gaussian processes, ordinal regression, approximate Bayesian inference, collaborative filtering, gene expression analysis, feature selection

6 0.22708923 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

7 0.22635029 64 jmlr-2005-Semigroup Kernels on Measures

8 0.22539029 39 jmlr-2005-Information Bottleneck for Gaussian Variables

9 0.22352219 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching

10 0.22320125 3 jmlr-2005-A Classification Framework for Anomaly Detection

11 0.22211067 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints

12 0.2214849 44 jmlr-2005-Learning Module Networks

13 0.22014564 20 jmlr-2005-Clustering with Bregman Divergences

14 0.2195828 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors

15 0.21940474 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification

16 0.21938966 48 jmlr-2005-Learning the Kernel Function via Regularization

17 0.21928848 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

18 0.2169168 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial

19 0.21686447 11 jmlr-2005-Algorithmic Stability and Meta-Learning

20 0.21647657 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning