nips nips2000 nips2000-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. [sent-6, score-0.057]
2 We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. [sent-7, score-0.581]
3 We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. [sent-8, score-1.021]
4 Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. [sent-9, score-0.232]
5 We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. [sent-10, score-0.348]
6 Moreover it is possible to learn model structures and readily compare between model classes. [sent-14, score-0.084]
7 Unfortunately, for most models of interest a full Bayesian analysis is computationally intractable. [sent-15, score-0.086]
8 The recent introduction of variational methods for Bayesian learning has resulted in a series of papers showing that these methods can be used to rapidly learn the model structure and approximate the evidence in a wide variety of models. [sent-17, score-0.545]
9 In this paper we will not motivate the advantages of the variational Bayesian approach, as this is done in previous papers [1, 5]. [sent-18, score-0.416]
10 Rather we focus on deriving variational Bayesian (VB) learning in a very general form, relating it to EM, motivating parameter-hidden variable factorisations, and the use of conjugate priors (section 3). [sent-19, score-0.57]
11 We then present several theoretical results relating VB learning to the belief propagation and junction tree algorithms for inference in belief networks and Markov networks (section 4). [sent-20, score-0.924]
12 Finally, we show how these results can be applied to learning the dimensionality of the hidden state space of linear dynamical systems (section 5). [sent-21, score-0.298]
13 From (1) we can see that this maximisation is equivalent to minimising the KL divergence between Qx(x)Qθ(θ) and the joint posterior over hidden states and parameters P(x, θ|y, M). [sent-24, score-0.316]
14 This approach was first proposed for one-hidden-layer neural networks [6] under the restriction that Qθ(θ) is Gaussian. [sent-25, score-0.057]
15 It has since been extended to models with hidden variables and the restrictions on Qθ(θ) and Qx(x) have been removed in certain models to allow arbitrary distributions [11, 8, 3, 1, 5]. [sent-26, score-0.44]
16 Free-form optimisation with respect to the distributions Qθ(θ) and Qx(x) is done using calculus of variations, often resulting in algorithms that appear closely related to the corresponding EM algorithm. [sent-27, score-0.097]
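For reference, the coordinate updates that this free-form optimisation yields take the standard variational Bayes form; the sketch below is the generic result written in LaTeX (using θ for the parameters), not a formula quoted from this paper.

```latex
% Generic free-form updates for the factorisation Q(x, theta) = Q_x(x) Q_theta(theta),
% obtained by setting the functional derivatives of F to zero (standard variational Bayes):
\begin{align}
Q_x(x) &\propto \exp\big\{ \langle \ln P(x, y \mid \theta) \rangle_{Q_\theta(\theta)} \big\}, \\
Q_\theta(\theta) &\propto P(\theta)\, \exp\big\{ \langle \ln P(x, y \mid \theta) \rangle_{Q_x(x)} \big\}.
\end{align}
```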
17 3 Conjugate-Exponential Models We consider variational Bayesian learning in models that satisfy two conditions: Condition (1). [sent-29, score-0.553]
18 The complete-data likelihood is in the exponential family: P(x, y|θ) = f(x, y) g(θ) exp{φ(θ)ᵀ u(x, y)}, where φ(θ) is the vector of natural parameters, and u, f and g are the functions that define the exponential family. [sent-30, score-0.375]
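To make the f, g, φ, u decomposition concrete, here is a small numerical check for a fully observed unit-variance Gaussian likelihood (no hidden variables, to keep the check minimal); this example and its function names are an illustration, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

# Unit-variance Gaussian N(y | mu, 1) written in the exponential-family form
#   P(y | theta) = f(y) g(theta) exp{ phi(theta)^T u(y) }
# with theta = mu, phi(mu) = mu, u(y) = y, f(y) = exp(-y^2/2)/sqrt(2*pi), g(mu) = exp(-mu^2/2).

def f(y):
    return np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)

def g(mu):
    return np.exp(-0.5 * mu**2)

def phi(mu):
    return mu

def u(y):
    return y

mu, y = 0.7, 1.3
lhs = f(y) * g(mu) * np.exp(phi(mu) * u(y))
rhs = norm.pdf(y, loc=mu, scale=1.0)
assert np.allclose(lhs, rhs)  # the two factorisations of the density agree
```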
19 The list of latent-variable models of practical interest with complete-data likelihoods in the exponential family is very long. [sent-31, score-0.283]
20 We mention a few: Gaussian mixtures, factor analysis, hidden Markov models and extensions, switching state-space models, Boltzmann machines, and discrete-variable belief networks. [sent-32, score-0.48]
21 1 Of course, there are also many as yet undreamed-of models combining Gaussian, Gamma, Poisson, Dirichlet, Wishart, Multinomial, and other distributions. [sent-33, score-0.086]
22 The parameter prior is conjugate to the complete-data likelihood: P(θ|η, ν) = h(η, ν) g(θ)^η exp{φ(θ)ᵀ ν}, where η and ν are hyperparameters of the prior. [sent-35, score-0.392]
23 Apart from some irregular cases, it has been shown that the exponential families are the only classes of distributions with a fixed number of sufficient statistics, hence allowing them to have natural conjugate priors. [sent-37, score-0.307]
24 From the definition of conjugacy it is easy to see that the hyperparameters of a conjugate prior can be interpreted as the number (η) and values (ν) of pseudo-observations under the corresponding likelihood. [sent-38, score-0.384]
25 We call models that satisfy conditions (1) and (2) conjugate-exponential. [sent-39, score-0.137]
26 ¹Models whose complete-data likelihood is not in the exponential family (such as ICA with the logistic nonlinearity, or sigmoid belief networks) can often be approximated by models in the exponential family with additional hidden variables. [sent-40, score-0.834]
27 In Bayesian inference we want to determine the posterior over parameters and hidden variables, P(x, θ|y, η, ν). [sent-41, score-0.318]
28 In general this posterior is neither conjugate nor in the exponential family. [sent-42, score-0.259]
29 We therefore approximate the true posterior by the following factorised distribution: P(x, θ|y, η, ν) ≈ Q(x, θ) = Qx(x)Qθ(θ), and minimise KL(Q||P) = ∫ dx dθ Q(x, θ) ln[ Q(x, θ) / P(x, θ|y, η, ν) ], which is equivalent to maximising F(Qx(x), Qθ(θ), y). [sent-43, score-0.061]
30 Theorem 1: Given an iid data set y = (y1, ..., yn), if the model satisfies conditions (1) and (2), then at the maxima of F(Q, y) (minima of KL(Q||P)): (a) Qθ(θ) is conjugate and of the form Qθ(θ) = h(η̃, ν̃) g(θ)^η̃ exp{φ(θ)ᵀ ν̃}, where η̃ = η + n, ν̃ = ν + Σ_{i=1}^{n} ū(yi), and ū(yi) = ⟨u(xi, yi)⟩_Q, using ⟨·⟩_Q to denote expectation under Q. [sent-48, score-0.2]
31 VE Step: Compute the expected sufficient statistics Σi ū(yi) under the hidden variable distributions Qx(x). VM Step: Compute the expected natural parameters φ̄(θ) = ⟨φ(θ)⟩ under the parameter distribution given by η̃ and ν̃. [sent-58, score-0.098]
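The resulting iteration can be summarised in a few lines of schematic Python; this is a sketch only, and the two model-specific callbacks (expected_natural_params, expected_sufficient_stats) are hypothetical placeholders rather than functions defined in the paper.

```python
# Schematic variational Bayesian EM loop for a conjugate-exponential model.
# eta, nu are the prior hyperparameters; y is the iid data set.

def vbem(y, eta, nu, expected_natural_params, expected_sufficient_stats, n_iter=100):
    eta_post, nu_post = eta + len(y), nu  # initialise the posterior hyperparameters from the prior
    for _ in range(n_iter):
        # VM step: expected natural parameters phi_bar under Q_theta(theta)
        phi_bar = expected_natural_params(eta_post, nu_post)
        # VE step: expected sufficient statistics u_bar(y_i) under Q_x(x),
        # computed with phi_bar substituted for the natural parameters
        u_bar = [expected_sufficient_stats(phi_bar, y_i) for y_i in y]
        # Conjugate update of Q_theta(theta): eta_tilde = eta + n, nu_tilde = nu + sum_i u_bar(y_i)
        eta_post = eta + len(y)
        nu_post = nu + sum(u_bar)
    return eta_post, nu_post
```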
32 The variational Bayesian EM algorithm reduces to ordinary EM if we restrict the parameter distribution to a point estimate (i.e. a Dirac delta function), Qθ(θ) = δ(θ − θ*), in which case the M step involves re-estimating θ*. [sent-60, score-0.052]
33 Note that unless we make the assumption that the parameters and hidden variables factorise, we will not generally obtain the further hidden variable factorisation over n in (b). [sent-61, score-0.532]
34 In that case, the distributions of Xi and Xj will be coupled for all cases i,j in the data set, greatly increasing the overall computational complexity of inference. [sent-62, score-0.059]
35 4 Belief Networks and Markov Networks The above result can be used to derive variational Bayesian learning algorithms for exponential family distributions that fall into two important special classes. [sent-63, score-0.72]
36 Let M be a conjugate-exponential model with hidden and visible variables z = (x, y) that satisfy a belief network factorisation. [sent-65, score-0.557]
37 Then the approximating joint distribution for M satisfies the same belief network factorisation: Qz(z) = ∏j Q(zj | zpj, θ̃). ²A tutorial on belief networks and Markov networks can be found in [9]. [sent-67, score-0.706]
38 where the conditional distributions have exactly the same form as those in the original model but with natural parameters φ(θ̃) = φ̄(θ), the expected natural parameters under Qθ(θ). [sent-68, score-0.199]
39 This result is somewhat surprising as it shows that it is possible to infer the hidden states tractably while integrating over an ensemble of model parameters. [sent-70, score-0.298]
40 This result generalises the derivation of variational learning for HMMs in [8], which uses the forward-backward algorithm as a subroutine. [sent-71, score-0.461]
41 Let M be a model with hidden and visible variables z = (x, y) that satisfy a Markov network factorisation. [sent-73, score-0.345]
42 That is, the joint density can be written as a product of clique potentials ψj: P(z|θ) = g(θ) ∏j ψj(Cj, θ), where each clique Cj is a subset of the variables in z. [sent-74, score-0.204]
43 Then the approximating joint distribution for M satisfies the same Markov network factorisation: Qz(z) = g̃ ∏j ψ̃j(Cj), where ψ̃j(Cj) = exp{⟨ln ψj(Cj, θ)⟩_Q} are new clique potentials obtained by averaging over Qθ(θ), and g̃ is a normalisation constant. [sent-75, score-0.289]
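As a small numerical sketch of this construction (illustration only, not from the paper): for discrete cliques the new potentials are formed by averaging the log-potentials over Qθ(θ). Monte Carlo averaging is used below for simplicity, even though in the conjugate-exponential case the expectation can be taken analytically.

```python
import numpy as np

def averaged_clique_potential(log_psi, theta_samples, clique_values):
    """New potential psi_tilde(C_j) = exp{ <ln psi_j(C_j, theta)>_Q }, with the
    expectation over Q_theta(theta) approximated by an average over samples."""
    # log_psi(c, theta) returns ln psi_j(c, theta) for a clique configuration c
    return np.array([
        np.exp(np.mean([log_psi(c, th) for th in theta_samples]))
        for c in clique_values
    ])

# Toy example: a pairwise binary clique with potential psi(c, theta) = exp(theta * c1 * c2)
rng = np.random.default_rng(0)
theta_samples = rng.normal(loc=1.0, scale=0.3, size=500)   # stand-in for draws from Q_theta
configs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
psi_tilde = averaged_clique_potential(lambda c, th: th * c[0] * c[1], theta_samples, configs)
print(psi_tilde)  # potentials averaged in the log domain, roughly exp(<theta> * c1 * c2)
```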
44 Furthermore, the expectations under the approximating posterior Qx(x) required for the VE Step can be obtained by applying the junction tree algorithm. [sent-76, score-0.457]
45 Let M be a conjugate-exponential Markov network over the variables in z. [sent-78, score-0.11]
46 Then the approximating joint distribution for M is given by Qz(z) = g̃ ∏j ψj(Cj, θ̃), where the clique potentials have exactly the same form as those in the original model but with natural parameters φ(θ̃) = φ̄(θ). [sent-79, score-0.337]
47 For conjugate-exponential models in which belief propagation and the junction tree algorithm over hidden variables are intractable, further applications of Jensen's inequality can yield tractable factorisations in the usual way [7]. [sent-80, score-1.038]
48 In the following section we derive a variational Bayesian treatment of linear-Gaussian state-space models. [sent-81, score-0.495]
49 Second, linear-Gaussian state-space models are the cornerstone of stochastic filtering, prediction and control. [sent-84, score-0.086]
50 A variational Bayesian treatment of these models provides a novel way to learn their structure, i.e. the dimensionality of the hidden state space. [sent-85, score-0.581]
51 The joint probability of a sequence of states and observations is therefore given by (Figure 1): P(x1:T, y1:T) = P(x1) P(y1|x1) ∏_{t=2}^{T} P(xt|xt−1) P(yt|xt). [sent-89, score-0.065]
52 We focus on the case where both the transition and output functions are linear and time-invariant and the distribution of the state and observation noise variables is Gaussian. [sent-90, score-0.128]
53 This model is the linear-Gaussian state-space model: xt = A xt−1 + wt, yt = C xt + vt. Figure 1: Belief network representation of a state-space model. [sent-91, score-0.085]
54 where A and C are the state transition and emission matrices and wt and vt are state and output noise. [sent-92, score-0.122]
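For concreteness, a short sketch of sampling from this generative model (the dimensions, parameter values and the unit-covariance prior on x1 below are arbitrary illustrative choices, not settings from the paper):

```python
import numpy as np

def sample_lgssm(A, C, R_diag, T, rng):
    """Sample x_{1:T}, y_{1:T} from x_t = A x_{t-1} + w_t, y_t = C x_t + v_t,
    with w_t ~ N(0, I) and v_t ~ N(0, diag(R_diag))."""
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x[0] = rng.standard_normal(k)                       # x_1 drawn from a unit-covariance prior (simplifying assumption)
    y[0] = C @ x[0] + np.sqrt(R_diag) * rng.standard_normal(p)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.standard_normal(k)    # state noise covariance fixed to I
        y[t] = C @ x[t] + np.sqrt(R_diag) * rng.standard_normal(p)
    return x, y

rng = np.random.default_rng(1)
A = 0.9 * np.eye(3)
C = rng.standard_normal((5, 3))
x, y = sample_lgssm(A, C, R_diag=0.1 * np.ones(5), T=400, rng=rng)
```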
55 A Bayesian analysis of state-space models using MCMC methods can be found in [4]. [sent-94, score-0.086]
56 The complete data likelihood for state-space models is Gaussian, which falls within the class of exponential family distributions. [sent-95, score-0.362]
57 In order to derive a variational Bayesian algorithm by applying the results in the previous section we now turn to defining conjugate priors over the parameters. [sent-96, score-0.634]
58 Without loss of generality we can assume that wt has covariance equal to the unit matrix. [sent-98, score-0.065]
59 The remaining parameters of a linear-Gaussian state-space model are the matrices A and C and the covariance matrix of the output noise, vt, which we will call R and assume to be diagonal, R = diag(ρ)⁻¹, where the ρi are the precisions (inverse variances) associated with each output. [sent-99, score-0.155]
60 Each row vector of the A matrix, denoted aiᵀ, is given a zero-mean Gaussian prior with inverse covariance matrix equal to diag(α). [sent-100, score-0.16]
61 Each row vector of C, ciᵀ, is given a zero-mean Gaussian prior with precision matrix equal to ρi diag(β). [sent-101, score-0.15]
62 The dependence of the precision of ciᵀ on the output noise precision ρi is motivated by conjugacy. [sent-102, score-0.11]
63 Intuitively, this prior links the scale of the signal and noise. [sent-103, score-0.112]
64 The prior over the output noise covariance matrix, R, is defined through the precision vector, ρ, which for conjugacy is assumed to be Gamma distributed³ with hyperparameters a and b: P(ρ|a, b) ∝ ∏_{i=1}^{p} ρi^(a−1) exp{−b ρi}. [sent-104, score-0.249]
65 Here α, β, a and b are hyperparameters that we can optimise to do automatic relevance determination (ARD) of hidden states, thus inferring the structure of the SSM. [sent-105, score-0.52]
66 Variational Bayesian learning for SSMs: Since A, C, ρ and x1:T are all unknown, given a sequence of observations y1:T, an exact Bayesian treatment of SSMs would require computing marginals of the posterior P(A, C, ρ, x1:T | y1:T). [sent-106, score-0.14]
67 This posterior contains interaction terms up to fifth order (for example, between elements of C, x and ρ), and is not analytically manageable. [sent-107, score-0.061]
68 However, since the model is conjugate-exponential we can apply Theorem 1 to derive a variational EM algorithm for state-space models analogous to the maximum-likelihood EM algorithm [10]. [sent-108, score-0.634]
69 Moreover, since SSMs are singly connected belief networks, Corollary 1 tells us that we can make use of belief propagation, which in the case of SSMs is known as the Kalman smoother. [sent-109, score-0.543]
70 This observation implies a further factorisation, Q(A, C, ρ) = Q(A)Q(C, ρ), which falls out of the initial factorisation and the conditional independencies of the model. [sent-111, score-0.173]
71 Starting from some arbitrary distribution over the hidden variables, the VM step obtained by applying Theorem 1 computes the expected natural parameters of Qθ(θ), where θ = (A, C, ρ). [sent-112, score-0.356]
72 ³More generally, if we let R be a full covariance matrix, for conjugacy we would give its inverse V = R⁻¹ a Wishart distribution: P(V|ν, S) ∝ |V|^((ν−D−1)/2) exp{−½ tr(V S⁻¹)}, where tr is the matrix trace operator. [sent-113, score-0.194]
73 A has mean Sᵀ(diag(α) + W)⁻¹ and each row of A has covariance (diag(α) + W)⁻¹, where S = Σ_{t=2}^{T} ⟨xt−1 xtᵀ⟩ and W = Σ_{t=2}^{T} ⟨xt−1 xt−1ᵀ⟩. [sent-116, score-0.111]
74 Given ρ, each row of C is Gaussian with covariance cov(ci) = (diag(β) + W′)⁻¹ / ρi and mean c̄i = ρi Ui cov(ci). [sent-124, score-0.111]
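A brief sketch of how the Q(A) statistics above could be assembled once a smoother has supplied the expectations ⟨xt−1 xtᵀ⟩ and ⟨xt−1 xt−1ᵀ⟩ (the function and argument names are hypothetical, and the smoother itself is assumed to exist elsewhere):

```python
import numpy as np

def q_A_statistics(Ex_pair, Ex_prev_sq, alpha):
    """Given smoothed moments Ex_pair[t] = <x_{t-1} x_t^T> and
    Ex_prev_sq[t] = <x_{t-1} x_{t-1}^T> for t = 2..T, return the mean of A
    and the common covariance of each row of A under Q(A)."""
    S = sum(Ex_pair)                       # S = sum_t <x_{t-1} x_t^T>
    W = sum(Ex_prev_sq)                    # W = sum_t <x_{t-1} x_{t-1}^T>
    row_cov = np.linalg.inv(np.diag(alpha) + W)
    A_mean = S.T @ row_cov                 # S^T (diag(alpha) + W)^{-1}
    return A_mean, row_cov
```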
75 Using the parameter distributions the hyperparameters can also be optimised. [sent-126, score-0.205]
76 Since the model is a conjugate-exponential singly-connected belief network, we can use belief propagation (Corollary 1). [sent-128, score-0.61]
77 For SSMs this corresponds to the Kalman smoothing algorithm, where every appearance of the natural parameters of the model is replaced with the following corresponding expectations under the Q distribution: ⟨ρi ci⟩, ⟨ρi ci ciᵀ⟩, ⟨A⟩, ⟨AᵀA⟩. [sent-129, score-0.269]
78 Like for PCA [3], independent components analysis [1], and mixtures of factor analysers [5], the variational Bayesian algorithm for state-space models can be used to learn the structure of the model as well as average over parameters. [sent-131, score-0.692]
79 Specifically, using F it is possible to compare models with different state-space sizes and optimise the dimensionality of the state-space, as we demonstrate in the following section. [sent-132, score-0.173]
80 6 Results Experiment 1: The goal of this experiment was to see if the variational method could infer the structure of a variety of state-space models by optimising over α and β. [sent-133, score-0.665]
81 Data was generated from: (a) an SSM with A = 0 and 3 factors (static state variables); (b) an SSM with 3 dynamical interacting state variables, i.e. [sent-135, score-0.226]
82 A ≠ 0; (c) an SSM with 3 interacting dynamical and 1 static state variables. [sent-136, score-0.215]
83 The variational Bayesian method correctly inferred the structure of each model in 2-3 minutes of CPU time on a 500 MHz Pentium III (Fig. [sent-137, score-0.496]
84 On reducing the length of the time series from 400 to 10 steps the recovered structure became progressively less complex (Fig. [sent-141, score-0.087]
85 The variational algorithm inferred that 16 state variables were required, of which 14 emitted outputs. [sent-146, score-0.589]
86 ⁴The ARD hyperparameters become αk = k/⟨AᵀA⟩kk and βk = p/⟨Cᵀ diag(ρ) C⟩kk, where k and p are the state-space and output dimensions. [sent-148, score-0.146]
87 The hyperparameters a and b solve the fixed point equations ψ(a) = ln b + (1/p) Σ_{i=1}^{p} ⟨ln ρi⟩ and b = pa / Σ_{i=1}^{p} ⟨ρi⟩, where ψ(w) = ∂/∂w ln Γ(w) is the digamma function. [sent-149, score-0.146]
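A minimal sketch of iterating these fixed-point equations with SciPy; the alternating scheme, the bracketing interval, the iteration count and the example values are choices made here, not prescribed by the paper.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_gamma_hyperparams(mean_rho, mean_log_rho, n_iter=20):
    """Solve psi(a) = ln b + mean(<ln rho_i>) and b = a / mean(<rho_i>)
    by alternating the two equations (equivalent to the fixed point in the text)."""
    a, b = 1.0, 1.0
    for _ in range(n_iter):
        b = a / mean_rho                                      # from b = p*a / sum_i <rho_i>
        target = np.log(b) + mean_log_rho
        a = brentq(lambda x: digamma(x) - target, 1e-6, 1e6)  # invert the digamma equation
    return a, b

# Example: expectations taken over p = 3 outputs
rho = np.array([0.3, 1.0, 2.5])           # stand-in values for <rho_i>
log_rho = np.log(rho)                     # stand-in values for <ln rho_i>
a, b = update_gamma_hyperparams(rho.mean(), log_rho.mean())
```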
88 A link is drawn from node k in xt−1 to node l in xt iff 1/αk > ε. [sent-154, score-0.076]
89 Similarly, links are drawn from node k of xt to yt if 1/βk > ε. [sent-164, score-0.101]
90 Therefore the graph shows the links that take part in the dynamics and the output. [sent-165, score-0.063]
91 7 Conclusions We have derived a general variational Bayesian learning algorithm for models in the conjugate-exponential family. [sent-166, score-0.547]
92 There are a large number of interesting models that fall in this family, and the results in this paper should allow an almost automated protocol for implementing a variational Bayesian treatment of these models. [sent-167, score-0.629]
93 We have given one example of such an implementation, state-space models, and shown that the VB algorithm can be used to rapidly infer the hidden state dimensionality. [sent-168, score-0.312]
94 Using the theory laid out in this paper it is straightforward to generalise the algorithm to mixtures of SSMs, switching SSMs, etc. [sent-169, score-0.198]
95 For conjugate-exponential models, integrating both belief propagation and the junction tree algorithm into the variational Bayesian framework simply amounts to computing expectations of the natural parameters. [sent-170, score-1.189]
96 Moreover, the variational Bayesian algorithm contains EM as a special case. [sent-171, score-0.461]
97 We believe this paper provides the foundations for a general algorithm for variational Bayesian learning in graphical models. [sent-172, score-0.518]
98 Bayesian model discrimination and Bayes factors for linear Gaussian state space models. [sent-192, score-0.103]
99 Keeping neural networks simple by minimizing the description length of the weights. [sent-214, score-0.057]
100 An approach to time series smoothing and forecasting using the EM algorithm. [sent-240, score-0.103]
wordName wordTfidf (topN-words)
[('variational', 0.416), ('bayesian', 0.248), ('ssms', 0.247), ('belief', 0.212), ('qx', 0.209), ('hyperparameters', 0.146), ('propagation', 0.144), ('hidden', 0.142), ('qo', 0.138), ('factorisation', 0.133), ('diag', 0.132), ('junction', 0.125), ('qz', 0.123), ('conjugate', 0.109), ('family', 0.108), ('ssm', 0.093), ('markov', 0.09), ('exponential', 0.089), ('models', 0.086), ('em', 0.084), ('ve', 0.084), ('cj', 0.084), ('conjugacy', 0.08), ('treatment', 0.079), ('corollary', 0.075), ('kalman', 0.075), ('expectations', 0.075), ('pi', 0.073), ('tree', 0.072), ('clique', 0.072), ('yt', 0.072), ('variables', 0.067), ('vb', 0.067), ('mixtures', 0.065), ('covariance', 0.065), ('joint', 0.065), ('applying', 0.064), ('infer', 0.064), ('links', 0.063), ('beal', 0.062), ('factorisations', 0.062), ('singly', 0.062), ('wishart', 0.062), ('zjlzp', 0.062), ('posterior', 0.061), ('state', 0.061), ('approximating', 0.06), ('yl', 0.06), ('gamma', 0.059), ('distributions', 0.059), ('theorem', 0.058), ('graphical', 0.057), ('networks', 0.057), ('dynamical', 0.056), ('precision', 0.055), ('ui', 0.055), ('smoothing', 0.054), ('vt', 0.054), ('ard', 0.053), ('kk', 0.053), ('qiip', 0.053), ('step', 0.052), ('satisfy', 0.051), ('static', 0.05), ('integrating', 0.05), ('natural', 0.05), ('series', 0.049), ('prior', 0.049), ('exp', 0.049), ('xt', 0.048), ('fall', 0.048), ('generalise', 0.048), ('tv', 0.048), ('optimise', 0.048), ('interacting', 0.048), ('parameters', 0.048), ('xl', 0.047), ('row', 0.046), ('kl', 0.045), ('algorithm', 0.045), ('jensen', 0.045), ('vm', 0.045), ('mcmc', 0.045), ('relating', 0.045), ('cov', 0.045), ('intractable', 0.044), ('network', 0.043), ('ci', 0.042), ('ti', 0.042), ('ei', 0.042), ('model', 0.042), ('xi', 0.041), ('falls', 0.04), ('switching', 0.04), ('complete', 0.039), ('dimensionality', 0.039), ('inequality', 0.039), ('structure', 0.038), ('node', 0.038), ('optimisation', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
2 0.19717777 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran
Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,
3 0.19097714 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller
Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for
4 0.16807526 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
5 0.16299433 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
6 0.16273521 114 nips-2000-Second Order Approximations for Probability Models
7 0.16087064 94 nips-2000-On Reversing Jensen's Inequality
8 0.14912815 27 nips-2000-Automatic Choice of Dimensionality for PCA
9 0.1451546 35 nips-2000-Computing with Finite and Infinite Networks
10 0.13801908 122 nips-2000-Sparse Representation for Gaussian Process Models
11 0.134799 80 nips-2000-Learning Switching Linear Models of Human Motion
12 0.13370207 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
13 0.13307229 92 nips-2000-Occam's Razor
14 0.12953398 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
15 0.12891227 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts
16 0.12327941 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
17 0.10314618 133 nips-2000-The Kernel Gibbs Sampler
18 0.093430974 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
19 0.091097362 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
20 0.090406992 75 nips-2000-Large Scale Bayes Point Machines
topicId topicWeight
[(0, 0.329), (1, 0.005), (2, 0.222), (3, -0.093), (4, 0.347), (5, -0.023), (6, -0.026), (7, 0.065), (8, 0.001), (9, -0.06), (10, 0.169), (11, -0.021), (12, -0.065), (13, -0.029), (14, -0.063), (15, -0.057), (16, -0.029), (17, 0.096), (18, -0.076), (19, 0.1), (20, -0.018), (21, 0.096), (22, 0.042), (23, -0.066), (24, -0.029), (25, -0.082), (26, -0.0), (27, -0.116), (28, 0.014), (29, 0.086), (30, -0.063), (31, -0.004), (32, 0.121), (33, 0.109), (34, -0.059), (35, -0.057), (36, 0.081), (37, -0.064), (38, 0.111), (39, 0.086), (40, -0.013), (41, 0.081), (42, 0.146), (43, -0.044), (44, -0.055), (45, -0.035), (46, 0.031), (47, -0.026), (48, 0.025), (49, -0.076)]
simIndex simValue paperId paperTitle
same-paper 1 0.96271467 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
2 0.82483828 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran
Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,
3 0.61054307 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller
Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for
4 0.57454771 80 nips-2000-Learning Switching Linear Models of Human Motion
Author: Vladimir Pavlovic, James M. Rehg, John MacCormick
Abstract: The human figure exhibits complex and rich dynamic behavior that is both nonlinear and time-varying. Effective models of human dynamics can be learned from motion capture data using switching linear dynamic system (SLDS) models. We present results for human motion synthesis, classification, and visual tracking using learned SLDS models. Since exact inference in SLDS is intractable, we present three approximate inference algorithms and compare their performance. In particular, a new variational inference algorithm is obtained by casting the SLDS model as a Dynamic Bayesian Network. Classification experiments show the superiority of SLDS over conventional HMM's for our problem domain.
5 0.56309038 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
Author: Brendan J. Frey, Anitha Kannan
Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce
6 0.55463618 27 nips-2000-Automatic Choice of Dimensionality for PCA
7 0.53971881 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
8 0.51405448 94 nips-2000-On Reversing Jensen's Inequality
9 0.4921298 92 nips-2000-Occam's Razor
10 0.49039724 114 nips-2000-Second Order Approximations for Probability Models
11 0.48018339 35 nips-2000-Computing with Finite and Infinite Networks
12 0.47048247 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
13 0.42648795 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
14 0.41236162 122 nips-2000-Sparse Representation for Gaussian Process Models
15 0.41066098 85 nips-2000-Mixtures of Gaussian Processes
16 0.40875977 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
17 0.38455373 13 nips-2000-A Tighter Bound for Graphical Models
18 0.38033602 133 nips-2000-The Kernel Gibbs Sampler
19 0.37616724 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
20 0.37443134 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
topicId topicWeight
[(1, 0.19), (10, 0.028), (17, 0.107), (32, 0.033), (33, 0.05), (38, 0.011), (54, 0.017), (55, 0.053), (62, 0.052), (65, 0.034), (67, 0.062), (76, 0.061), (79, 0.067), (81, 0.04), (90, 0.037), (91, 0.042), (97, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.86542463 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
2 0.65487039 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
Author: Simon Tong, Daphne Koller
Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.
3 0.63509798 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
Author: Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony M. Zador
Abstract: Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their
4 0.62883341 122 nips-2000-Sparse Representation for Gaussian Process Models
Author: Lehel Csató, Manfred Opper
Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
5 0.62706155 92 nips-2000-Occam's Razor
Author: Carl Edward Rasmussen, Zoubin Ghahramani
Abstract: The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work.
6 0.62487632 85 nips-2000-Mixtures of Gaussian Processes
7 0.6120531 146 nips-2000-What Can a Single Neuron Compute?
8 0.60902029 74 nips-2000-Kernel Expansions with Unlabeled Examples
9 0.60753334 27 nips-2000-Automatic Choice of Dimensionality for PCA
10 0.60625738 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
11 0.60511541 94 nips-2000-On Reversing Jensen's Inequality
12 0.6038183 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
13 0.60094815 49 nips-2000-Explaining Away in Weight Space
14 0.60087919 127 nips-2000-Structure Learning in Human Causal Induction
15 0.59978086 60 nips-2000-Gaussianization
16 0.59964639 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
17 0.59877199 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
18 0.59837466 13 nips-2000-A Tighter Bound for Graphical Models
19 0.59745735 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
20 0.59690863 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure