nips nips2000 nips2000-13 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show a rough outline of how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. [sent-8, score-0.845]
2 This is a direct extension of the mean field bound, which is first order. [sent-9, score-0.306]
3 We show that the third order bound is strictly better than mean field. [sent-10, score-0.938]
4 Additionally we show a rough outline of how this bound is applicable to sigmoid belief networks. [sent-11, score-0.875]
5 Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. [sent-12, score-0.05]
6 The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. [sent-14, score-0.122]
7 Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set can be computed. [sent-17, score-0.314]
8 For such architectures one has no choice but to use an approximation for the likelihood. [sent-19, score-0.13]
9 A well known approximation technique from statistical mechanics, called Gibbs sampling, was applied to graphical models in [1]. [sent-20, score-0.238]
10 More recently, the mean field approximation known from physics was derived for sigmoid belief networks [2]. [sent-21, score-0.816]
11 For this type of graphical model the parental dependency of a neuron is modelled by a non-linear (sigmoidal) function of the weighted parent states [3]. [sent-22, score-0.155]
12 It turns out that the mean field approximation has the nice feature that it bounds the likelihood from below. [sent-23, score-0.495]
13 This is useful for learning, since a maximisation of the bound either increases its accuracy or increases the likelihood for a pattern in the data set, which is the actual learning process. [sent-24, score-0.513]
14 In this article we show that it is possible to improve the mean field approximation. [sent-25, score-0.471]
15 In section 2 we show the general theory to create a new bound using an existing one, which is applied to a Boltzmann machine in section 3. [sent-32, score-0.582]
16 In contrast with belief networks the connections are symmetric and not directed [4]. [sent-34, score-0.213]
17 A mean field approximation for this type of neural network was already described in [5]. [sent-35, score-0.452]
18 An improvement of this approximation was found by Thouless, Anderson and Palmer in [6], which was applied to Boltzmann machines in [7]. [sent-36, score-0.13]
19 Unfortunately, this so-called TAP approximation is not a bound. [sent-37, score-0.092]
20 We apply our method to the mean field approximation, which results in a third order bound. [sent-38, score-0.657]
21 Due to the limited space it is not possible to discuss the third order bound for sigmoid belief networks in much detail. [sent-40, score-1.164]
22 Instead, we show the general outline and focus more on the experimental results in section 5. [sent-41, score-0.097]
23 2 Higher order bounds Suppose we have a function f0(x) and a bound b0(x) such that ∀x: f0(x) ≥ b0(x). Let f1(x) and b1(x) be two primitive functions of f0(x) and b0(x), i.e. f1(x) = ∫dx f0(x) and b1(x) = ∫dx b0(x), such that f1(ν) = b1(ν). [sent-43, score-0.703]
24 Note that we can always add an appropriate constant to the primitive functions such that they are indeed equal at x = ν. [sent-45, score-0.187]
25 If we repeat this and let f2(x) and b2(x) be two primitive functions of f1(x) and b1(x), again such that f2(ν) = b2(ν), one can easily verify that ∀x: f2(x) ≥ b2(x). [sent-49, score-0.139]
26 Thus given a lower bound on f0(x) we can create another lower bound. [sent-50, score-0.547]
27 In case the given bound is a polynomial of degree k, the new bound is a polynomial of degree k + 2 with one additional variational parameter. [sent-51, score-1.12]
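The construction is easy to check numerically. The sketch below (illustrative, not part of the paper; the tangent point, matching point and grid are arbitrary choices) starts from the linear tangent bound on the exponential function and repeatedly integrates it twice, fixing the integration constants so that the primitives agree with e^x at the matching point, which raises the degree from k to k + 2 at every step:

```python
# Minimal numerical sketch of the "integrate twice and match" construction.
# The bound stays below exp(x) after every order-raising step.
import numpy as np
from numpy.polynomial import Polynomial

def raise_order(poly, nu):
    """One step of the construction: integrate twice, matching exp at x = nu."""
    p1 = poly.integ()
    p1 = p1 + (np.exp(nu) - p1(nu))   # the primitive of exp(x) is exp(x) itself
    p2 = p1.integ()
    p2 = p2 + (np.exp(nu) - p2(nu))
    return p2

nu = 0.3                                                     # matching point (arbitrary)
tangent_at = -0.5                                            # tangent point of the first-order bound
bound = Polynomial([np.exp(tangent_at) * (1 - tangent_at),   # e^t (1 + x - t)
                    np.exp(tangent_at)])

x = np.linspace(-4.0, 4.0, 2001)
for order in (1, 3, 5, 7):
    gap = np.exp(x) - bound(x)
    print(f"order {order}: min(exp(x) - bound(x)) = {gap.min():.3e}")  # non-negative up to rounding
    bound = raise_order(bound, nu)
```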
28 To illustrate this procedure, we derive a third order bound on the exponential function starting with the well-known linear bound: the tangent of the exponential function at x = ν. [sent-52, score-0.94]
29 Using the procedure of the previous section we derive, ∀x,ν: f0(x) = e^x ≥ e^ν(1 + x − ν) = b0(x) (equation 3) and, after integrating twice, ∀x,μ,λ: f2(x) = e^x ≥ e^μ(1 + x − μ) + e^(μ+λ)((1/2)(1 − λ)(x − μ)² + (1/6)(x − μ)³) = b2(x) (equations 4 and 5), where λ = ν − μ. [sent-53, score-0.089]
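A quick numerical check of the cubic bound as reconstructed above (illustrative code; the parameter ranges are arbitrary):

```python
# Verify that the reconstructed b2(x; mu, lambda) never exceeds e^x
# for randomly drawn mu and lambda.
import numpy as np

def b2(x, mu, lam):
    d = x - mu
    return np.exp(mu) * (1.0 + d) + np.exp(mu + lam) * (0.5 * (1.0 - lam) * d**2 + d**3 / 6.0)

rng = np.random.default_rng(0)
x = np.linspace(-6.0, 6.0, 4001)
worst = min(
    (np.exp(x) - b2(x, mu, lam)).min()
    for mu, lam in rng.uniform(-2.0, 2.0, size=(200, 2))
)
print(f"smallest gap e^x - b2(x) over all trials: {worst:.3e}")  # non-negative up to rounding
```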
30 )3) } (4) = b2(x{5) 3 Boltzmann machines In this section we derive a third order lower bound on the partition function of a Boltzmann machine neural network using the results from the previous section. [sent-61, score-1.185]
31 The Boltzmann machine energy is given by −E(s) = θ_ij s_i s_j + θ_i s_i (6). There is an implicit summation over all repeated indices (Einstein's convention). [sent-65, score-0.055]
32 Z = Σ_{all s} exp(−E(s)) is the normalisation constant known as the partition function, which requires a sum over all, exponentially many, states. [sent-66, score-0.288]
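For orientation, the exponential sum can still be evaluated directly on a very small network; the sketch below (illustrative, with arbitrary couplings and thresholds) does exactly that, which is what becomes infeasible for large N and what the bound is meant to replace:

```python
# Brute-force log Z for a small Boltzmann machine by enumerating all 2^N states.
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 10
theta = np.triu(rng.uniform(-0.5, 0.5, size=(N, N)), k=1)   # couplings theta_ij, i < j
bias = rng.uniform(-0.5, 0.5, size=N)                       # thresholds theta_i

def neg_energy(s):
    return s @ theta @ s + bias @ s          # -E(s) = theta_ij s_i s_j + theta_i s_i

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
logZ = np.logaddexp.reduce([neg_energy(s) for s in states])
print(f"exact log Z for N={N}: {logZ:.4f}  (sum over {2**N} states)")
```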
33 To compute the partition function approximately, we use the third order bound from equation 5. [sent-68, score-1.062]
34 Note that the former constants μ and λ are now functions of s, since we may take different values for μ and λ for each term in the sum. [sent-72, score-0.092]
35 If we take, for instance, μ(s) = −E(s), the approximation is exact. [sent-74, score-0.092]
36 This would lead, however, to the same intractability as before and therefore we must restrict our choice to those that make equation 7 tractable to compute. [sent-75, score-0.159]
37 We choose μ(s) and λ(s) to be linear with respect to the neuron states s_i: μ(s) = μ_0 + μ_i s_i and λ(s) = λ_0 + λ_i s_i (8). [sent-76, score-0.122]
38 One may view μ(s) and λ(s) as (the negative of) the energy functions for the Boltzmann distributions P ∝ exp(μ(s)) and P ∝ exp(λ(s)). [sent-77, score-0.207]
39 Therefore we will sometimes speak of 'the distribution μ(s)'. [sent-78, score-0.065]
40 Since these linear energy functions correspond to factorised distributions, we can compute the right hand side of equation 7 in a reasonable time, O(N³). [sent-79, score-0.36]
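The reason the factorised choice is cheap is that moments under a factorised ±1 distribution reduce to products of single-site means m_i = tanh(μ_i). A minimal sketch (illustrative; only the first moment of the energy is shown, and higher moments needed for equation 7 follow the same pattern with more index bookkeeping):

```python
# First moment of -E(s) under a factorised +/-1 distribution with fields mu_i.
import numpy as np

def mean_neg_energy(theta, bias, mu_i):
    """<-E>_mu = sum_{i<j} theta_ij m_i m_j + sum_i theta_i m_i, with m_i = tanh(mu_i)."""
    m = np.tanh(mu_i)
    return m @ theta @ m + bias @ m      # theta assumed strictly upper triangular
```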
41 To obtain the tightest bound, we may maximise equation 7 with respect to its variational parameters μ_0, μ_i, λ_0 and λ_i. [sent-80, score-0.389]
42 A special case of the third order bound. Although it is possible to choose λ_i ≠ 0, [sent-81, score-0.82]
43 we will set them to the suboptimal value λ_i = 0, since this simplifies equation 7 enormously. [sent-82, score-0.118]
44 Given this choice we can compute the optimal values for μ_0 and λ_0, given by equation 9, where ⟨·⟩ denotes an average over the (factorised) distribution μ(s). [sent-84, score-0.086]
45 Using this solution the bound reduces to the simple form given in equation 10. (Footnote 1: using the first order bound from equation 3 results in the standard mean field bound.) [sent-85, score-1.335]
46 where Z_μ is the partition function of the distribution μ(s). [sent-86, score-0.231]
47 The term ⟨(E + μ_i s_i + μ_0)²⟩ corresponds to the variance of E + μ_i s_i with respect to the distribution μ(s), since μ_0 = −⟨E + μ_i s_i⟩. [sent-89, score-0.757]
48 λ_0 is proportional to the third order moment, according to (9). [sent-91, score-0.4]
49 There is no explicit expression for the optimal μ_i as is the case with the standard mean field equations. [sent-93, score-0.615]
50 An implicit expression, however, follows from setting the derivative with respect to μ_i to zero. [sent-94, score-0.164]
51 Wherever we speak of 'fully optimised', we refer to this solution for μ_i. [sent-96, score-0.331]
52 Since the last term in equation 10 is always positive, we conclude that the third order bound is always tighter than the mean field bound. [sent-98, score-1.354]
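The claim can be illustrated by brute force on a network small enough to enumerate. The sketch below is illustrative only; it is not the paper's O(N³) computation, and the sizes, weight scale, choice of μ_i from the mean field solution and the grid search over λ_0 are all our own choices. It compares the exact log Z with the standard mean field lower bound and with a third order lower bound obtained by applying the cubic inequality reconstructed above to every term of the sum over states:

```python
# Brute-force comparison on a toy Boltzmann machine (illustrative only; this
# enumerates all states, unlike the O(N^3) computation described in the text).
import itertools
import numpy as np

rng = np.random.default_rng(2)
N = 8
theta = np.triu(rng.uniform(-0.5, 0.5, size=(N, N)), k=1)   # couplings theta_ij, i < j
bias = rng.uniform(-0.5, 0.5, size=N)                       # thresholds theta_i

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
neg_E = np.einsum('si,ij,sj->s', states, theta, states) + states @ bias
logZ_exact = np.logaddexp.reduce(neg_E)

# Standard mean field bound: log Z >= <-E>_q + H(q) for any factorised q with means m.
W = theta + theta.T
m = np.zeros(N)
for _ in range(200):                       # crude fixed-point iteration; any m gives a valid bound
    m = np.tanh(W @ m + bias)
p_up = (1.0 + m) / 2.0
entropy = -np.sum(p_up * np.log(p_up) + (1.0 - p_up) * np.log(1.0 - p_up))
mf_bound = m @ theta @ m + bias @ m + entropy

# Third order bound: sum_s b2(-E(s); mu(s), lambda_0) <= Z for any mu(s) and lambda_0.
# Here mu(s) = mu_0 + mu_i s_i, with mu_i from the mean field solution and
# mu_0 = <-E - mu_i s_i> under the factorised distribution.
mu_i = W @ m + bias
mu_0 = (m @ theta @ m + bias @ m) - mu_i @ m
mu_s = mu_0 + states @ mu_i

def third_order_bound(lam0):
    d = neg_E - mu_s
    b2 = np.exp(mu_s) * (1.0 + d) + np.exp(mu_s + lam0) * (0.5 * (1.0 - lam0) * d**2 + d**3 / 6.0)
    total = b2.sum()
    return np.log(total) if total > 0 else -np.inf   # a non-positive sum is a useless (but valid) bound

third_bound = max(third_order_bound(l) for l in np.linspace(-2.0, 2.0, 81))

print(f"exact log Z       : {logZ_exact:.4f}")
print(f"mean field bound  : {mf_bound:.4f}")
print(f"third order bound : {third_bound:.4f}")
```

Because the cubic inequality holds pointwise for every μ(s) and λ_0, any choice of these parameters, including the crude grid search used here, still yields a valid lower bound; only its tightness is affected.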
53 The relation between TAP and the third order bound is clear in the region of small weights. [sent-99, score-0.814]
54 If we assume that O(θ_ij³) is negligible, a small weight expansion of equation 10 yields [sent-100, score-0.074]
55 log Z ≥ log Z_μ + (1/2)⟨(E + μ_i s_i + μ_0)²⟩ ≈ log Z_μ + (1/2) θ_ij² (1 − m_i²)(1 − m_j²) (12), where the last term is equal to the TAP correction term [7]. [sent-103, score-0.1]
56 Thus the third order bound tends to the TAP approximation for small weights. [sent-104, score-0.912]
57 For larger weights, however, the TAP approximation overestimates the partition function, whereas the third order approximation is still a bound. [sent-105, score-0.729]
58 4 Sigmoid belief networks In the previous section we saw how to derive a third order bound on the partition function. [sent-106, score-1.291]
59 For sigmoid belief networks we can use the same strategy to obtain a third order bound on the likelihood that the visible neurons of the network are in a particular state. [sent-107, score-1.319]
60 In this article, we present the rough outline of our method. [sent-108, score-0.116]
61 It turns out that these graphical models are comparable to Boltzmann machines to a large extent. [sent-110, score-0.23]
62 The energy function E(s) (as in equation 6), however, differs for sigmoid belief networks: −E(s) = θ_ij s_i s_j + θ_i s_i − Σ_p log 2cosh(θ_pi s_i + θ_p) (13). The last term, known as the local normalisation, does not appear in the Boltzmann machine energy function. [sent-111, score-0.609]
63 We have similar difficulties as with the Boltzmann machine if we want to compute the log-likelihood, given by log L = log Σ_{s∈hidden} P(s) = log Σ_{s∈hidden} exp(−E(s)) (14). [sent-112, score-0.175]
64 (Footnote 2) Be aware of the fact that μ_0 makes an important contribution to the expression for log Z_μ. [sent-114, score-0.081]
65 3 A detailed description of these networks can be found in [3]. [sent-115, score-0.054]
66 260 16 c: 0 ~50 < 1), however, we see that the third order bound is much closer to the exact curved form than mean field is. [sent-119, score-1.118]
68 Figure 2: Histograms of the relative error (%) of the mean field bound and the third order bound, for the toy network shown in the middle. [sent-122, score-0.174]
69 The error of the third order bound is roughly ten times smaller than the error of the mean field bound. [sent-123, score-1.185]
70 Although a full optimisation of the variational parameters gives the tightest bound, it turns out that the computational complexity of this optimisation is quite large for sigmoid belief networks. [sent-124, score-0.733]
71 Therefore, we use the mean field solution for μ_i (equation 11) instead. [sent-125, score-0.306]
72 This can be justified since the most important error reduction is due to the use of the third order bound. [sent-126, score-0.401]
73 From experimental results not shown here it is clear that a full optimisation contributes only a few percent of the total gain. [sent-127, score-0.105]
74 To assess the error made by the various approaches, we use the same toy problem as in [2] and [10]. [sent-128, score-0.139]
75 The network has a top layer of two neurons, a middle layer of four neurons and a lower layer of six visibles (figure 2). [sent-129, score-0.363]
76 All neurons of two successive layers are connected with weights pointing downwards. [sent-130, score-0.121]
77 Weights and thresholds are drawn from a uniform distribution over [-1,1]. [sent-131, score-0.052]
78 5 We compute the likelihood when all visibles are clamped to -1. [sent-132, score-0.214]
79 Since the network is rather small, we can compute the exact likelihood to compare the lower bound with. [sent-133, score-0.601]
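A brute-force version of this exact computation can be sketched as follows (illustrative only; the layer sizes and weight ranges follow the text, but the particular ±1 parametrisation, with p(s_i | parents) ∝ exp(s_i h_i), is our assumption and is meant to match the log 2cosh form of equation 13):

```python
# Exact log-likelihood of the clamped visibles for the 2-4-6 toy sigmoid belief
# network, by enumerating the 2^6 = 64 hidden configurations.
# Uses log P(s) = -E(s) = sum_i [ s_i h_i(s) - log 2 cosh h_i(s) ], with h_i(s)
# the weighted parent input plus threshold (one way to realise equation 13).
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_top, n_mid, n_vis = 2, 4, 6
W_tm = rng.uniform(-1.0, 1.0, size=(n_mid, n_top))   # top -> middle weights
W_mv = rng.uniform(-1.0, 1.0, size=(n_vis, n_mid))   # middle -> visible weights
b_top = rng.uniform(-1.0, 1.0, size=n_top)
b_mid = rng.uniform(-1.0, 1.0, size=n_mid)
b_vis = rng.uniform(-1.0, 1.0, size=n_vis)

visible = -np.ones(n_vis)                             # all visibles clamped to -1

def log_joint(top, mid, vis):
    """log P(s) for +/-1 units; each unit contributes s_i h_i - log 2 cosh(h_i)."""
    h_top = b_top                                     # top units have no parents
    h_mid = W_tm @ top + b_mid
    h_vis = W_mv @ mid + b_vis
    logp = 0.0
    for s, h in ((top, h_top), (mid, h_mid), (vis, h_vis)):
        logp += np.sum(s * h - np.log(2.0 * np.cosh(h)))
    return logp

terms = [
    log_joint(np.array(top, dtype=float), np.array(mid, dtype=float), visible)
    for top in itertools.product([-1, 1], repeat=n_top)
    for mid in itertools.product([-1, 1], repeat=n_mid)
]
print(f"exact log-likelihood of the clamped visibles: {np.logaddexp.reduce(terms):.4f}")
```

Summing over the 2^6 hidden configurations is feasible only because the toy network is small; the bounds discussed above are meant to avoid exactly this enumeration.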
80 It is clear from the picture that for this toy problem the error is reduced by a factor ten. [sent-135, score-0.174]
81 For instance, if the weights are drawn from a uniform distribution over [-2,2], the error reduces by about a factor four on average and is always less than the mean field error. [sent-137, score-0.507]
82 (5) The original toy problem in [2] used a 0/1-coding for the neuron activity. [sent-138, score-0.131]
83 To be able to compare the results, we transform the weights and thresholds to the −1/+1-coding used in this article. [sent-139, score-0.105]
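A hedged sketch of such a re-coding (the helper name and both parametrisations are our assumptions, not taken from the paper): if the 0/1 network uses p(s_i = 1 | parents) = σ(Σ_j w_ij s_j + b_i) and the −1/+1 network uses p(σ_i = +1 | parents) = σ(2(Σ_j θ_ij σ_j + θ_i)), then substituting s = (σ + 1)/2 gives:

```python
import numpy as np

def recode_01_to_pm1(w, b):
    """Map 0/1-coding parameters (w, b) to -1/+1-coding parameters (theta, theta0),
    assuming s = (sigma + 1) / 2 and the sigmoid parametrisations stated above."""
    theta = w / 4.0
    theta0 = b / 2.0 + w.sum(axis=1) / 4.0
    return theta, theta0
```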
84 6 Conclusions We showed a procedure to find any odd order polynomial bound for the exponential function. [sent-140, score-0.708]
85 A 2k -1 order polynomial bound has k variational parameters. [sent-141, score-0.723]
86 We can use this result to derive a bound on the partition function, where the variational parameters can be seen as energy functions for probability distributions. [sent-145, score-0.878]
87 If we choose those distributions to be factorised, we have (N + 1)k new variational parameters. [sent-146, score-0.169]
88 Since the approximating function is a bound, we may maximise it with respect to all these parameters. [sent-147, score-0.098]
89 In this article we restricted ourselves to the third order bound, although an extension to any odd order bound is possible. [sent-148, score-1.028]
90 Third order is the next higher order bound after naive mean field. [sent-149, score-0.745]
91 We showed that this bound is strictly better than the mean field bound and tends to the TAP approximation for small weights. [sent-150, score-1.335]
92 For larger weights, however, the TAP approximation crosses the partition function and violates the bounding properties. [sent-151, score-0.297]
93 We saw that the third order bound gives an enormous improvement compared to mean field. [sent-152, score-0.947]
94 The choice between third order and variational structures, however, is not exclusive. [sent-154, score-0.517]
95 We expect that a combination of both methods is a promising research direction to obtain the tightest tractable bound. [sent-155, score-0.136]
96 A mean field theory learning algorithm for neural networks. [sent-185, score-0.306]
97 Boltzmann machine learning using mean field theory and linear response correction. [sent-201, score-0.351]
98 Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 280-286. [sent-208, score-0.068]
99 In ICANN 99, volume 1, pages 425-430, ISBN 0852967217, 1999. [sent-222, score-0.068]
100 Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 183-189. [sent-234, score-0.068]
wordName wordTfidf (topN-words)
[('bound', 0.428), ('fli', 0.266), ('third', 0.252), ('boltzmann', 0.214), ('bo', 0.192), ('tap', 0.192), ('field', 0.187), ('sigmoid', 0.172), ('partition', 0.161), ('belief', 0.159), ('variational', 0.128), ('mean', 0.119), ('si', 0.116), ('logzi', 0.114), ('visibles', 0.114), ('graphical', 0.113), ('oi', 0.102), ('order', 0.099), ('factorised', 0.098), ('primitive', 0.097), ('approximation', 0.092), ('lo', 0.09), ('tightest', 0.089), ('toy', 0.089), ('odd', 0.077), ('leisink', 0.076), ('sehidden', 0.076), ('equation', 0.074), ('article', 0.073), ('optimisation', 0.07), ('fl', 0.07), ('polynomial', 0.068), ('neurons', 0.068), ('nijmegen', 0.065), ('speak', 0.065), ('outline', 0.064), ('energy', 0.063), ('maximise', 0.059), ('mj', 0.059), ('thouless', 0.059), ('derive', 0.056), ('implicit', 0.055), ('networks', 0.054), ('weights', 0.053), ('likelihood', 0.052), ('rough', 0.052), ('thresholds', 0.052), ('exp', 0.051), ('error', 0.05), ('term', 0.05), ('bt', 0.049), ('tighter', 0.049), ('moment', 0.049), ('el', 0.049), ('saw', 0.049), ('compute', 0.048), ('always', 0.048), ('tractable', 0.047), ('machine', 0.045), ('turns', 0.045), ('bounding', 0.044), ('suboptimal', 0.044), ('expression', 0.043), ('ad', 0.043), ('normalisation', 0.043), ('create', 0.043), ('neuron', 0.042), ('ai', 0.042), ('functions', 0.042), ('choose', 0.041), ('tends', 0.041), ('cohn', 0.041), ('sj', 0.041), ('strictly', 0.04), ('respect', 0.039), ('machines', 0.038), ('log', 0.038), ('lower', 0.038), ('choice', 0.038), ('dx', 0.037), ('exponential', 0.036), ('layer', 0.036), ('network', 0.035), ('clear', 0.035), ('correspond', 0.035), ('kearns', 0.035), ('pages', 0.034), ('volume', 0.034), ('comparable', 0.034), ('surface', 0.033), ('known', 0.033), ('section', 0.033), ('ackley', 0.033), ('curved', 0.033), ('economic', 0.033), ('ev', 0.033), ('maximisation', 0.033), ('negligible', 0.033), ('netherlands', 0.033), ('overestimates', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
2 0.31698409 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
3 0.20783506 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
4 0.16299433 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
5 0.1333545 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
Author: Ralf Herbrich, Thore Graepel
Abstract: We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC- Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend to use SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets. 1
6 0.13143611 58 nips-2000-From Margin to Sparsity
7 0.12863748 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
8 0.11321498 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
9 0.11096144 21 nips-2000-Algorithmic Stability and Generalization Performance
10 0.10904473 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
11 0.099471726 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
12 0.099445216 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
13 0.098466963 37 nips-2000-Convergence of Large Margin Separable Linear Classification
14 0.089939088 94 nips-2000-On Reversing Jensen's Inequality
15 0.088115305 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
16 0.087482482 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice
17 0.082988337 120 nips-2000-Sparse Greedy Gaussian Process Regression
18 0.080954216 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts
19 0.076074488 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
20 0.070125014 75 nips-2000-Large Scale Bayes Point Machines
topicId topicWeight
[(0, 0.278), (1, 0.052), (2, 0.045), (3, -0.131), (4, 0.308), (5, -0.179), (6, 0.046), (7, -0.018), (8, 0.066), (9, 0.053), (10, -0.015), (11, -0.115), (12, -0.308), (13, 0.079), (14, -0.04), (15, -0.063), (16, 0.195), (17, -0.16), (18, -0.105), (19, 0.035), (20, 0.096), (21, -0.146), (22, -0.042), (23, 0.037), (24, 0.013), (25, 0.042), (26, -0.086), (27, 0.02), (28, 0.03), (29, -0.01), (30, 0.031), (31, 0.061), (32, -0.025), (33, 0.002), (34, 0.106), (35, 0.098), (36, -0.025), (37, 0.048), (38, -0.06), (39, 0.018), (40, -0.051), (41, 0.001), (42, -0.003), (43, 0.035), (44, 0.077), (45, 0.002), (46, -0.045), (47, -0.048), (48, -0.001), (49, -0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.98334783 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
2 0.87449288 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
3 0.84521508 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
4 0.62632203 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
Author: Oliver B. Downs
Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a secondorder mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a
5 0.4549728 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
Author: Pedro A. d. F. R. Højen-Sørensen, Ole Winther, Lars Kai Hansen
Abstract: We propose a general Bayesian framework for performing independent component analysis (ICA) which relies on ensemble learning and linear response theory known from statistical physics. We apply it to both discrete and continuous sources. For the continuous source the underdetermined (overcomplete) case is studied. The naive mean-field approach fails in this case whereas linear response theory, which gives an improved estimate of covariances, is very efficient. The examples given are for sources without temporal correlations. However, this derivation can easily be extended to treat temporal correlations. Finally, the framework offers a simple way of generating new ICA algorithms without needing to define the prior distribution of the sources explicitly.
6 0.38251281 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
7 0.37358588 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
8 0.36984253 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
9 0.36764884 58 nips-2000-From Margin to Sparsity
10 0.36615023 37 nips-2000-Convergence of Large Margin Separable Linear Classification
11 0.3588742 21 nips-2000-Algorithmic Stability and Generalization Performance
12 0.34761447 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
13 0.34526703 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
14 0.33529252 125 nips-2000-Stability and Noise in Biochemical Switches
15 0.32715911 120 nips-2000-Sparse Greedy Gaussian Process Regression
16 0.30283386 84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition
17 0.30265552 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
18 0.29931375 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams
19 0.29741794 62 nips-2000-Generalized Belief Propagation
20 0.29029655 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
topicId topicWeight
[(10, 0.028), (17, 0.103), (26, 0.013), (33, 0.034), (36, 0.304), (55, 0.032), (62, 0.04), (65, 0.017), (67, 0.08), (76, 0.086), (79, 0.015), (81, 0.018), (90, 0.029), (91, 0.102), (97, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.83187854 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
2 0.71791905 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task
Author: Brian Sallans, Geoffrey E. Hinton
Abstract: The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
3 0.57307559 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
4 0.5417437 85 nips-2000-Mixtures of Gaussian Processes
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models.
5 0.52027535 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a lowlevel, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
6 0.51354319 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure
7 0.50029904 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
8 0.49746251 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
9 0.49232599 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts
10 0.47430664 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
11 0.46650687 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
12 0.45415807 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
13 0.45380566 122 nips-2000-Sparse Representation for Gaussian Process Models
14 0.44202188 146 nips-2000-What Can a Single Neuron Compute?
15 0.44198182 134 nips-2000-The Kernel Trick for Distances
16 0.44180098 37 nips-2000-Convergence of Large Margin Separable Linear Classification
17 0.43970126 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
18 0.43943015 74 nips-2000-Kernel Expansions with Unlabeled Examples
19 0.43668982 133 nips-2000-The Kernel Gibbs Sampler
20 0.43493456 94 nips-2000-On Reversing Jensen's Inequality