nips nips2000 nips2000-114 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
Reference: text
sentIndex sentText sentNum sentScore
1 Second order approximations for probability models Hilbert J. [sent-1, score-0.255]
2 nl Wim Wiegerinck Department of Biophysics Nijmegen University Nijmegen, the Netherlands wimw@mbfys. [sent-4, score-0.17]
3 nl Abstract In this paper, we derive a second order mean field theory for directed graphical probability models. [sent-6, score-0.93]
4 By using an information theoretic argument it is shown how this can be done in the absence of a partition function. [sent-7, score-0.08]
5 This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. [sent-8, score-0.112]
6 In a numerical example, it is shown that the method greatly improves the first order mean field approximation. [sent-9, score-0.45]
7 For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. [sent-10, score-0.731]
8 For sigmoid belief networks, the method is shown to be particularly fast and effective. [sent-11, score-0.4]
9 1 Introduction Recently, a number of authors have proposed deterministic methods for approximate inference in large graphical models. [sent-12, score-0.178]
10 The simplest approach gives a lower bound on the probability of a subset of variables using Jensen's inequality (Saul et al. [sent-13, score-0.192]
11 The method involves the minimization of the KL divergence between the target probability distribution p and some 'simple' variational distribution q. [sent-15, score-0.143]
12 The method can be applied to a large class of probability models, such as sigmoid belief networks, DAGs and Boltzmann Machines (BM). [sent-16, score-0.445]
13 For Boltzmann-Gibbs distributions, it is possible to derive the lower bound as the first term in a Taylor series expansion of the free energy around a factorized model. [sent-17, score-0.322]
14 This Taylor series can be continued and the second order term is known as the TAP correction (Plefka, 1982; Kappen and Rodriguez, 1998). [sent-19, score-0.337]
15 The second order term significantly improves the quality of the approximation, but is no longer a bound. [sent-20, score-0.372]
16 For probability distributions that are not Boltzmann-Gibbs distributions, it is not obvious how to obtain the second order approximation. [sent-21, score-0.361]
17 However, there is an alternative way to compute the higher order corrections, based on an information theoretic argument. [sent-22, score-0.184]
18 Recently, this argument was applied to stochastic neural networks with asymmetric connectivity (Kappen and Spanjers, 1999). [sent-23, score-0.136]
19 Here, we apply this idea to directed graphical models. [sent-24, score-0.276]
20 2 The method Let x = (x_1, ..., x_n) be an n-dimensional vector, with x_i taking on discrete values. [sent-25, score-0.048]
21 We will assume that p(x) can be written as a product of potentials in the following way: p(x) = ∏_{k=1}^n p_k(x_k|π_k) = exp(∑_{k=1}^n φ_k(x^k)) (1) Here, p_k(x_k|π_k) denotes the conditional probability table of variable x_k given the values of its parents π_k. [sent-27, score-0.271]
22 x^k = (x_k, π_k) denotes the subset of variables that appear in potential k, and φ_k(x^k) = log p_k(x_k|π_k). [sent-28, score-0.291]
23 Potentials can be overlapping, x^k ∩ x^l ≠ ∅, and x = ∪_k x^k. [sent-29, score-0.186]
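In display form, the reconstructed model (a hedged reading of the garbled Eq. (1), using the definitions of π_k and φ_k above) is:

```latex
\[
p(x) \;=\; \prod_{k=1}^{n} p_k(x_k \mid \pi_k)
      \;=\; \exp\!\Big( \sum_{k=1}^{n} \phi_k(x^k) \Big),
\qquad \phi_k(x^k) \;=\; \log p_k(x_k \mid \pi_k).
\]
```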
24 We wish to compute the marginal probability that x_i has some specific value s_i in the presence of some evidence. [sent-30, score-0.128]
25 We therefore denote x = (e, s), where e denotes the subset of variables that constitute the evidence, and s denotes the remainder of the variables. [sent-31, score-0.141]
26 p(s_i|e) = ∑_{s\s_i} p(s, e) / ∑_s p(s, e) (2) Both numerator and denominator contain sums over hidden states. [sent-34, score-0.12]
27 These sums scale exponentially with the size of the problem, and therefore the computation of marginals is intractable. [sent-35, score-0.251]
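To make the exponential scaling concrete, here is a minimal brute-force evaluation of Eq. (2) for binary variables; the toy chain model, its numbers, and the helper name `toy_log_p` are hypothetical illustrations, not anything from the paper. The loop over all 2^h hidden states is exactly the cost the mean field approach avoids.

```python
import itertools
import numpy as np

def exact_marginal(log_p, n_hidden, evidence, i):
    """Brute-force p(s_i = 1 | e) via Eq. (2): both the numerator and the
    denominator are sums over all 2^n_hidden hidden states."""
    num = den = 0.0
    for s in itertools.product([0, 1], repeat=n_hidden):
        p = np.exp(log_p(s, evidence))
        den += p                 # sum over all hidden states, i.e. p(e)
        if s[i] == 1:
            num += p             # sum over states with s_i = 1
    return num / den

def toy_log_p(s, e):
    """log p(s, e) for a hypothetical chain s0 -> s1 -> e; a sum of log-CPTs,
    i.e. the potentials phi_k of Eq. (1)."""
    lp = np.log(0.6 if s[0] == 1 else 0.4)       # p(s0)
    lp += np.log(0.9 if s[1] == s[0] else 0.1)   # p(s1 | s0)
    lp += np.log(0.8 if e == s[1] else 0.2)      # p(e | s1)
    return lp

print(exact_marginal(toy_log_p, n_hidden=2, evidence=1, i=0))
```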
28 We propose to approximate this problem by using a mean field approach. [sent-36, score-0.191]
29 Consider a factorized distribution on the hidden variables s: q(s) = ∏_i q_i(s_i) (3) We wish to find the factorized distribution q that best approximates p(s|e). [sent-37, score-0.563]
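The KL divergence of Eq. (4) is not legible in this extraction; a reconstruction, with the direction of the divergence inferred from the fact that the minimizer must be the exact marginals (see the definition of Δp(s_i|e) below):

```latex
\[
\mathrm{KL}(p \,\|\, q) \;=\; \sum_{s} p(s|e)\,\log\frac{p(s|e)}{q(s)},
\qquad q(s) \;=\; \prod_i q_i(s_i).
\]
```

Minimizing over each normalized factor q_i then gives the exact marginals, Eq. (5).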
30 It is easy to see that the q that minimizes KL satisfies: q_i(s_i) = p(s_i|e) (5) We now think of the manifold of all probability distributions of the form Eq. [sent-40, score-0.18]
31 This manifold contains a submanifold of factorized probability distributions in which the potentials factorize: φ_k(x^k) = ∑_{i∈k} φ_ki(x_i). [sent-46, score-0.55]
32 Assume now that p(s|e) is somehow close to the factorized submanifold. [sent-48, score-0.22]
33 The difference Δp(s_i|e) = p(s_i|e) − q_i(s_i) is then small, and we can expand this small difference in terms of changes in the parameters Δφ_k(x^k) = φ_k(x^k) − log q(x^k), k = 1, ..., n. [sent-49, score-0.093]
34 The second order term of the expansion of Δ log p(s_i|e) involves the second derivatives ∂²/∂φ_k(x^k)∂φ_l(y^l) contracted with Δφ_k(x^k)Δφ_l(y^l), plus higher order terms (6); a display-form reconstruction follows below. The differentials are evaluated in the factorized distribution q. [sent-56, score-0.459]
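A hedged reconstruction of Eq. (6) from the surviving fragments (the ∂φ_k(x^k) and ∂φ_l(y^l) derivatives, the Δφ_k Δφ_l products, and evaluation under q); the structure is a standard second order Taylor expansion, but the notation is inferred:

```latex
\[
\Delta \log p(s_i|e)
  \;=\; \sum_{k}\sum_{x^k}
        \frac{\partial \log p(s_i|e)}{\partial \phi_k(x^k)}\,
        \Delta\phi_k(x^k)
  \;+\; \frac{1}{2}\sum_{k,l}\sum_{x^k,\,y^l}
        \frac{\partial^2 \log p(s_i|e)}
             {\partial \phi_k(x^k)\,\partial \phi_l(y^l)}\,
        \Delta\phi_k(x^k)\,\Delta\phi_l(y^l)
  \;+\; \text{higher order terms} \tag{6}
\]
```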
35 5 and we solve for q(s_i). This factorized distribution gives the desired marginals up to the order of the expansion of Δ log p(s_i|e). [sent-59, score-0.552]
36 (⟨·⟩_{k∩l} means expectation wrt q conditioned on the variables in k ∩ l) we can precompute ⟨φ_k⟩_{k∩l} [sent-64, score-0.187]
37 for all pairs of overlapping cliques k, l, for all states in k ∩ l. [sent-65, score-0.035]
38 Therefore, the worst case complexity of the second order term is less than n_i n_overlap exp(c). [sent-66, score-0.492]
39 Thus, we see that the second order method has the same exponential complexity as the first order method, but with a different polynomial prefactor. [sent-67, score-0.513]
40 Therefore, the first or second order method can be applied to directed graphical models as long as the number of parents is reasonably small. [sent-68, score-0.648]
41 The fact that the second order term has a worse complexity than the first order term is in contrast to Boltzmann machines, in which the TAP approximation has the same complexity as the standard mean field approximation. [sent-69, score-0.974]
42 These are graphs in which the potentials φ_k share at most one node. [sent-71, score-0.26]
43 For single overlap graphs, we can use the first order result Eq. [sent-73, score-0.181]
44 to rewrite the second order equation as Eq. (12), which has a complexity that is of order n_i(c − 1) exp(c). [sent-81, score-0.217]
45 For probability distributions with many small potentials that share nodes with many other potentials, Eq. [sent-82, score-0.455]
46 For instance, for Boltzmann Machines n_i = n_overlap = n − 1 and c = 2. [sent-85, score-0.102]
47 Eq. 12 is identical to the TAP equations (Thouless et al. [sent-87, score-0.086]
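For reference, the TAP mean field equations for Boltzmann machines that this reduces to read, in the ±1 spin convention with weights w_ij and thresholds θ_i (a standard form from the TAP literature, not quoted from this paper):

```latex
\[
m_i \;=\; \tanh\!\Big( \theta_i + \sum_j w_{ij}\, m_j
          \;-\; m_i \sum_j w_{ij}^2\,\big(1 - m_j^2\big) \Big)
\]
```

The last term is the Onsager reaction term; dropping it recovers the first order (naive) mean field equations.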
48 4 Sigmoid belief networks In this section, we consider sigmoid belief networks as an interesting class of directed graphical models. [sent-89, score-0.87]
49 The reason is that one can expand in terms of the couplings instead of the potentials, which is more efficient. [sent-90, score-0.305]
50 The sigmoid belief network is defined as p(x) = ∏_i σ(x_i h_i(x)) (13) Figure 2 (panels a-d): Interpretation of different interaction terms appearing in Eq. [sent-91, score-0.387]
51 The open and shaded nodes are hidden and evidence nodes, respectively (except in (a), where k can be any node). [sent-93, score-0.328]
52 Solid arrows indicate the graphical structure in the network. [sent-94, score-0.24]
53 Dashed arrows indicate interaction terms that appear in Eq. [sent-95, score-0.097]
54 σ(x) = (1 + exp(−2x))^{−1}, x_i = ±1 and h_i is the local field: h_i(x) = ∑_j w_ij x_j + θ_i. We separate the variables into evidence variables e and hidden variables s: x = (s, e). [sent-98, score-0.412]
55 When couplings from hidden nodes to either hidden or evidence nodes are zero, w_ij = 0 for i ∈ e, s and j ∈ s, the probability distributions p(s|e) and p(e) reduce to p(s|e) → q(s) = ∏_{i∈s} σ(s_i θ'_i) (14) and p(e) → ∏_{i∈e} σ(e_i θ'_i) (15), where θ'_i = ∑_{j∈e} w_ij e_j + θ_i depends on the evidence. [sent-99, score-0.741]
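A small sanity-check sketch of the ±1 parameterization defined above (the value of h is arbitrary): it verifies that σ(x) + σ(−x) = 1, so each conditional table in Eq. (13) is normalized per node, and that σ(h) − σ(−h) = tanh(h), the mean of a unit given its parents.

```python
import numpy as np

def sigma(x):
    """sigma(x) = (1 + exp(-2x))^{-1}: probability that a +/-1 unit is +1."""
    return 1.0 / (1.0 + np.exp(-2.0 * x))

h = 0.7  # an arbitrary local field value
assert np.isclose(sigma(h) + sigma(-h), 1.0)         # normalization over x_i = +/-1
assert np.isclose(sigma(h) - sigma(-h), np.tanh(h))  # mean of x_i given its parents
```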
56 The different terms that appear in this equation can be easily interpreted. [sent-102, score-0.035]
57 The first term describes the lowest order forward influence on node i from its parents. [sent-103, score-0.448]
58 Parents can be either evidence or hidden nodes (fig. [sent-104, score-0.328]
59 The third term describes to lowest order the effect of Bayes' rule: it affects m_i such that the observed evidence on its children becomes most probable (fig. [sent-107, score-0.495]
60 Note that this term is absent when the evidence is explained by the evidence nodes themselves: σ(e_k θ'_k) = 1. [sent-109, score-0.424]
61 The fourth and fifth terms are the quadratic contributions to the first and third terms, respectively. [sent-110, score-0.035]
62 It describes the effect of hidden node l on node i, when both have a common observed child k (fig. [sent-112, score-0.374]
63 The last term describes the effect on node i when its grandchild is observed (fig. [sent-114, score-0.266]
64 10 to sigmoid belief networks, one requires additional approximations to compute ⟨log σ(x_i h_i)⟩ (Saul et al. [sent-119, score-0.463]
65 Figure 3: Second order approximation for fully connected sigmoid belief network of n nodes. [sent-143, score-0.4]
66 , n are clamped (grey), n_1 = n/2; b) CPU time for exact inference (dashed) and second order approximation (solid) versus n_1 (J = 0. [sent-150, score-0.652]
67 5); c) RMS of hidden node exact marginals (solid) and RMS error of second order approximation (dashed) versus coupling strength J (n_1 = 10). [sent-151, score-0.687]
68 Since only feed-forward connections are present, one can order the nodes such that w_ij = 0 for i < j. [sent-152, score-0.296]
69 Then the first order mean field equations can be solved in one single sweep starting with node 1. [sent-153, score-0.49]
70 The full second order equations can be solved by iteration, starting with the first order solution. [sent-154, score-0.433]
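A minimal sketch of this two-stage scheme, assuming no evidence and the naive first order fixed point m_i = tanh(∑_{j<i} w_ij m_j + θ_i) (which follows from ⟨x_i⟩ = tanh(h_i(x)) by replacing the field with its mean); the second order update of Eq. (11) itself is not reproduced, only the sweep-then-iterate structure described in the text:

```python
import numpy as np

def first_order_sweep(W, theta):
    """One forward pass for a feed-forward network (W[i, j] = 0 for i < j):
    node i depends only on nodes j < i, so a single sweep in index order
    already solves the first order mean field equations."""
    n = len(theta)
    m = np.zeros(n)
    for i in range(n):
        m[i] = np.tanh(W[i, :i] @ m[:i] + theta[i])
    return m

def fixed_point(update, m0, tol=1e-8, max_iter=1000):
    """Generic fixed-point iteration; per the text, the full second order
    equations are solved this way, starting from the first order solution."""
    m = m0.copy()
    for _ in range(max_iter):
        m_new = update(m)
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m
```

Usage would be `fixed_point(second_order_update, first_order_sweep(W, theta))`, where `second_order_update` is a hypothetical implementation of the second order equations.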
71 The first one is inference in Lauritzen's chest clinic model (ASIA), defined on 8 binary variables x = {A, T, S, L, B, E, X, D} (see figure 1, and (Lauritzen and Spiegelhalter, 1988) for more details about the model). [sent-156, score-0.068]
72 We computed exact marginals and approximate marginals using the approximating methods up to first (Eq. [sent-157, score-0.403]
73 The approximate marginals are determined by sequential iteration of (10) and (11), starting at q(x_i) = 0. [sent-160, score-0.178]
74 The maximal error in the marginals using the first and second order method is 0. [sent-162, score-0.474]
75 3, we assess the accuracy and CPU time of the second order approximation Eq. [sent-168, score-0.312]
76 We generate random fully connected sigmoid belief networks with w_ij from a normal distribution with mean zero and variance J²/n and θ_i = 0. [sent-170, score-0.467]
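The corresponding setup is easy to reproduce; a sketch (the seed and network size are arbitrary, and `first_order_sweep` refers to the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n, J = 20, 0.5                   # arbitrary size and coupling strength
# Strictly lower-triangular weights w_ij ~ N(0, J^2 / n) for i > j give a
# fully connected but feed-forward network; biases theta_i = 0 as in the text.
W = np.tril(rng.normal(0.0, J / np.sqrt(n), size=(n, n)), k=-1)
theta = np.zeros(n)
```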
77 3b that the computation time is very fast: for n_1 = 500, we have obtained convergence in 37 seconds on a Pentium 300 MHz processor. [sent-172, score-0.277]
78 The accuracy of the method depends on the size of the weights and is computed for a network of n_1 = 10 (fig. [sent-173, score-0.218]
79 In (Kappen and Wiegerinck, 2001), we compare this approach to Saul's variational approach (Saul et al. [sent-175, score-0.092]
80 6 Discussion In this paper, we computed a second order mean field approximation for directed graphical models. [sent-177, score-0.779]
81 We show that the second order approximation gives a significant improvement over the first order result. [sent-178, score-0.453]
82 The method does not explicitly use the fact that the graph is directed. [sent-179, score-0.048]
83 The complexity of the first and second order approximation is O(n_i exp(c)) and O(n_i n_overlap exp(c)), respectively, with c the number of variables in the largest potential. [sent-181, score-0.528]
84 For single-overlap graphs, one can rewrite the second order equation such that the computational complexity reduces to O(n_i(c − 1) exp(c)). [sent-182, score-0.324]
85 Boltzmann machines and the Asia network are examples of single-overlap graphs. [sent-183, score-0.05]
86 For large c, additional approximations are required, as was proposed by (Saul et al. [sent-184, score-0.111]
87 It is evident, that such additional approximations are then also required for the second order mean field equations. [sent-186, score-0.508]
88 It has been reported (Barber and Wiegerinck, 1999; Wiegerinck and Kappen, 1999) that similar numerical improvements can be obtained by using a very different approach, which is to use an approximating distribution q that is not factorized, but still tractable. [sent-187, score-0.082]
89 A promising way to proceed is therefore to combine both approaches and to do a second order expansion around a manifold of non-factorized yet tractable distributions. [sent-188, score-0.531]
90 In this approach the sufficient statistics of the tractable structure are expanded, rather than the marginal probabilities. [sent-189, score-0.108]
91 Local computations with probabilities on graphical structures and their application to expert systems. [sent-221, score-0.178]
92 Convergence condition of the TAP equation for the infinite-range Ising spin glass model. [sent-226, score-0.049]
wordName wordTfidf (topN-words)
[('wiegerinck', 0.278), ('kappen', 0.242), ('sle', 0.238), ('sigmoid', 0.2), ('xk', 0.186), ('factorized', 0.186), ('marginals', 0.178), ('graphical', 0.178), ('nl', 0.17), ('nodes', 0.155), ('belief', 0.152), ('potentials', 0.15), ('order', 0.141), ('nijmegen', 0.137), ('si', 0.136), ('saul', 0.124), ('field', 0.121), ('knl', 0.119), ('node', 0.114), ('tap', 0.111), ('second', 0.107), ('boltzmann', 0.103), ('ni', 0.102), ('directed', 0.098), ('kl', 0.097), ('lauritzen', 0.093), ('evidence', 0.09), ('term', 0.089), ('ek', 0.086), ('hidden', 0.083), ('kee', 0.079), ('kes', 0.079), ('ninoverlap', 0.079), ('rms', 0.079), ('sile', 0.079), ('spanjers', 0.079), ('parents', 0.076), ('complexity', 0.076), ('graphs', 0.073), ('wij', 0.073), ('exp', 0.072), ('mi', 0.071), ('mean', 0.07), ('approximations', 0.069), ('netherlands', 0.068), ('asia', 0.068), ('dags', 0.068), ('variables', 0.068), ('distributions', 0.068), ('manifold', 0.067), ('tractable', 0.065), ('approximation', 0.064), ('describes', 0.063), ('thouless', 0.062), ('plefka', 0.062), ('arrows', 0.062), ('couplings', 0.062), ('expand', 0.058), ('rodriguez', 0.057), ('asymmetric', 0.054), ('cpu', 0.054), ('xi', 0.053), ('barber', 0.051), ('ki', 0.051), ('variational', 0.05), ('machines', 0.05), ('biophysics', 0.049), ('spin', 0.049), ('method', 0.048), ('approximating', 0.047), ('expansion', 0.047), ('dashed', 0.045), ('probability', 0.045), ('networks', 0.045), ('equations', 0.044), ('marginal', 0.043), ('theoretic', 0.043), ('taylor', 0.043), ('et', 0.042), ('lowest', 0.041), ('wish', 0.04), ('overlap', 0.04), ('solid', 0.04), ('ie', 0.039), ('share', 0.037), ('argument', 0.037), ('sums', 0.037), ('subset', 0.037), ('therefore', 0.036), ('improves', 0.035), ('hi', 0.035), ('overlapping', 0.035), ('numerical', 0.035), ('terms', 0.035), ('stw', 0.034), ('bm', 0.034), ('wijxj', 0.034), ('submanifold', 0.034), ('bert', 0.034), ('somehow', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
2 0.31698409 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
3 0.21670157 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
4 0.16273521 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
5 0.1354913 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller
Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for
6 0.13340591 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
7 0.11508384 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
8 0.11248412 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
9 0.10505985 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
10 0.10271623 62 nips-2000-Generalized Belief Propagation
11 0.10087503 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
12 0.091772668 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
13 0.091696553 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
14 0.089788347 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
15 0.085857071 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice
16 0.076733068 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
17 0.076198459 85 nips-2000-Mixtures of Gaussian Processes
18 0.074014053 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts
19 0.071537934 92 nips-2000-Occam's Razor
20 0.06823878 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
topicId topicWeight
[(0, 0.249), (1, -0.003), (2, 0.152), (3, -0.143), (4, 0.376), (5, -0.069), (6, 0.024), (7, 0.037), (8, 0.044), (9, 0.126), (10, 0.014), (11, -0.055), (12, -0.298), (13, 0.063), (14, 0.026), (15, -0.05), (16, 0.131), (17, -0.102), (18, -0.069), (19, 0.039), (20, -0.001), (21, -0.151), (22, -0.052), (23, -0.029), (24, 0.039), (25, -0.0), (26, -0.142), (27, 0.05), (28, 0.061), (29, -0.04), (30, 0.066), (31, 0.012), (32, -0.059), (33, 0.023), (34, 0.03), (35, 0.051), (36, -0.018), (37, 0.024), (38, -0.064), (39, -0.026), (40, -0.075), (41, 0.01), (42, -0.066), (43, 0.031), (44, 0.088), (45, 0.015), (46, 0.047), (47, 0.005), (48, 0.07), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.97742158 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
2 0.87640327 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
3 0.83845198 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
4 0.64416528 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
Author: Oliver B. Downs
Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a
5 0.47735459 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran
Abstract: An important class of problems can be cast as inference in noisy-OR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e.g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy-OR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are
6 0.47117901 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
7 0.4409501 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
8 0.40993378 62 nips-2000-Generalized Belief Propagation
9 0.39465755 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
10 0.3926816 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
11 0.36291465 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
12 0.35828793 85 nips-2000-Mixtures of Gaussian Processes
13 0.35342726 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
14 0.35334936 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
15 0.34204841 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
16 0.33399183 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams
17 0.30129975 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
18 0.28619725 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice
19 0.27462727 148 nips-2000-`N-Body' Problems in Statistical Learning
20 0.27336034 125 nips-2000-Stability and Noise in Biochemical Switches
topicId topicWeight
[(10, 0.021), (17, 0.089), (26, 0.013), (32, 0.012), (33, 0.055), (36, 0.023), (55, 0.041), (62, 0.034), (65, 0.017), (67, 0.058), (76, 0.092), (79, 0.013), (81, 0.017), (90, 0.024), (91, 0.41), (97, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.88677692 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
2 0.8470583 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a low level, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
3 0.79814416 85 nips-2000-Mixtures of Gaussian Processes
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes -in particular in form of Gaussian process classification, the support vector machine and the MGP modelcan be used for quantifying the dependencies in graphical models.
4 0.58769327 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
5 0.48615935 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
6 0.44686958 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
7 0.42083108 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
8 0.42080665 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
9 0.39868239 62 nips-2000-Generalized Belief Propagation
10 0.38995299 122 nips-2000-Sparse Representation for Gaussian Process Models
11 0.38815853 94 nips-2000-On Reversing Jensen's Inequality
12 0.37646681 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
13 0.37283799 49 nips-2000-Explaining Away in Weight Space
14 0.3671214 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
15 0.36609405 37 nips-2000-Convergence of Large Margin Separable Linear Classification
16 0.36555231 74 nips-2000-Kernel Expansions with Unlabeled Examples
17 0.3636893 136 nips-2000-The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
18 0.36252734 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
19 0.36241275 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
20 0.36145222 127 nips-2000-Structure Learning in Human Causal Induction