nips nips2000 nips2000-85 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. [sent-4, score-0.62]
2 The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. [sent-5, score-0.158]
3 We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models. [sent-6, score-0.603]
4 1 Introduction Gaussian processes are typically used for regression where it is assumed that the underlying function is generated by one infinite-dimensional Gaussian distribution (i.e., a Gaussian process). [sent-7, score-0.301]
5 In Gaussian process regression (GPR) we further assume that output data are generated by additive Gaussian noise, i.e., the targets are normally distributed around the function values. [sent-10, score-0.121]
6 GPR can be generalized by using likelihood models from the exponential family of distributions which is useful for classification and the prediction of lifetimes or counts. [sent-13, score-0.241]
7 The support vector machine (SVM) is a variant in which the likelihood model is not derived from the exponential family of distributions but rather uses functions with a discontinuous first derivative. [sent-14, score-0.24]
8 In this paper we introduce another generalization of GPR in the form of the mixture of Gaussian processes (MGP) model, which is a variant of the well-known mixture of experts (ME) model of Jacobs et al. [sent-15, score-0.54]
9 The MGP model allows Gaussian processes to model general conditional probability densities. [sent-17, score-0.326]
10 An advantage of the MGP model is that it is fast to train, if compared to the neural network ME model. [sent-18, score-0.067]
11 Even more interesting, the MGP model is one possible approach to addressing the problem of input-dependent bandwidth requirements in GPR. [sent-19, score-0.229]
12 Input-dependent bandwidth is useful if either the complexity of the map is input dependent, requiring a higher bandwidth in regions of high complexity, or if the input data distribution is nonuniform. [sent-20, score-0.62]
13 In the latter case, one would prefer Gaussian processes with a higher bandwidth in regions with many data points and a lower bandwidth in regions with lower data density. [sent-21, score-0.746]
14 If GPR models with different bandwidths are used, the MGP approach allows the system to self-organize by locally selecting the GPR model with the appropriate optimal bandwidth. [sent-22, score-0.082]
15 Gaussian process classifiers, the support vector machine and the MGP can be used to model the local dependencies in graphical models. [sent-23, score-0.335]
16 Here, we are mostly interested in the case that the dependencies of a set of variables y are modified via Gaussian processes by a set of exogenous variables x. [sent-24, score-0.637]
17 Another example would be collaborative filtering where y might represent a set of goods and the correlation between customer preferences is modeled by a dependency network (another example of a graphical model). [sent-26, score-0.432]
18 Here, exogenous variables such as income, gender and social status might be useful quantities to modify those dependencies. [sent-27, score-0.43]
19 In the next section we briefly review Gaussian processes and their application to regression. [sent-29, score-0.24]
20 In Section 3 we discuss generalizations of the simple GPR model. [sent-30, score-0.028]
21 In Section 4 we introduce the MGP model and present experimental results. [sent-31, score-0.043]
22 In Section 5 we discuss Gaussian processes in the context of graphical models. [sent-32, score-0.341]
23 2 Gaussian Processes In Gaussian Process Regression (GPR) one assumes that a priori a function f(x) is generated from an infinite-dimensional Gaussian distribution with zero mean and covariance $K(x, x_k) = \mathrm{cov}(f(x), f(x_k))$, where the $K(x, x_k)$ are positive definite kernel functions. [sent-34, score-0.068]
24 In this paper we will only use Gaussian kernel functions of the form $K(x, x_k) = A \exp\left(-\frac{\|x - x_k\|^2}{2 s^2}\right)$ with scale parameter s and amplitude A. [sent-35, score-0.027]
25 Furthermore, we assume a set of N training data $D = \{(x_k, y_k)\}_{k=1}^N$ where targets are generated following a normal distribution with variance $\sigma^2$ such that $P(y|f(x)) \propto \exp\left(-\frac{1}{2\sigma^2}(f(x) - y)^2\right)$. (1) [sent-36, score-0.143]
26 The expected value $\hat{f}(x)$ at an input x given the training data is a superposition of the kernel functions of the form $\hat{f}(x) = \sum_{k=1}^N w_k K(x, x_k)$. (2) [sent-37, score-0.055]
27 Then we have the relation $f_m = K w$, where the components of $f_m = (f(x_1), \ldots, f(x_N))'$ are the values of f at the locations of the training data [sent-40, score-0.15]
28 and $w = (w_1, \ldots, w_N)'$ is the vector of weights. [sent-43, score-0.025]
29 As a result of this relationship we can either calculate the optimal W or we can calculate the optimal fm and then deduce the corresponding w-vector by matrix inversion. [sent-47, score-0.075]
30 Following the assumptions, the optimal $f_m$ minimizes the cost function (3) such that $\hat{f}_m = K(K + \sigma^2 I)^{-1} y$, where $y = (y_1, \ldots, y_N)'$ is the vector of training targets. [sent-49, score-0.075]
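To make the GPR computation of this section concrete, here is a minimal numpy sketch of the posterior mean: it builds the Gaussian kernel matrix defined above, solves for $w = (K + \sigma^2 I)^{-1} y$ and evaluates the superposition $\sum_k w_k K(x, x_k)$. The particular values of A, s and sigma, and the noisy step-function toy data (loosely echoing Figure 1), are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X1, X2, A=1.0, s=1.0):
    """K(x, x') = A * exp(-||x - x'||^2 / (2 s^2)), evaluated for all pairs of rows."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return A * np.exp(-d2 / (2.0 * s ** 2))

def gpr_fit_predict(X, y, X_test, A=1.0, s=1.0, sigma=0.1):
    """GPR posterior mean at X_test: f_hat = K_* w with w = (K + sigma^2 I)^{-1} y."""
    K = gaussian_kernel(X, X, A, s)                          # N x N kernel matrix
    w = np.linalg.solve(K + sigma ** 2 * np.eye(len(X)), y)  # weight vector w
    K_star = gaussian_kernel(X_test, X, A, s)                # cross-kernel to the test inputs
    return K_star @ w                                        # posterior mean at the test inputs

# tiny usage example on synthetic data (hypothetical setup)
X = np.random.randn(100, 1)
y = np.sign(X[:, 0]) + 0.2 * np.random.randn(100)            # noisy step function
f_hat = gpr_fit_predict(X, y, X, A=1.0, s=0.5, sigma=0.2)     # equals K(K + sigma^2 I)^{-1} y
```

Evaluating the predictor at the training inputs themselves reproduces the vector $\hat{f}_m$ from the formula above.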
31 3 Generalized Gaussian Processes and the Support Vector Machine In generalized Gaussian processes the Gaussian prior assumption is maintained but the likelihood model is now derived from the exponential family of distributions. [sent-54, score-0.419]
32 The most important special cases are two-class classification, $P(y = 1 | f(x)) = \frac{1}{1 + \exp(-f(x))}$, and multiple-class classification. [sent-55, score-0.031]
33 Here, y is a discrete variable with C states and $P(y = c \,|\, f_1(x), \ldots, f_C(x)) = \frac{\exp(f_c(x))}{\sum_{j=1}^C \exp(f_j(x))}$. (4) [sent-56, score-0.07]
34 Note that for multiple-class classification C Gaussian processes $f_1(x), \ldots, f_C(x)$ are required. [sent-60, score-0.325]
35 The special case of classification was discussed by Williams and Barber (1998) from a Bayesian perspective. [sent-65, score-0.054]
36 The related smoothing splines approaches are discussed in Fahrmeir and Tutz (1994). [sent-66, score-0.023]
37 For generalized Gaussian processes, the optimization of the cost function is based on an iterative Fisher scoring procedure. [sent-67, score-0.105]
38 Incidentally, the support vector machine (SVM) can also be considered to be a generalized Gaussian process model with $P(y|f(x)) \propto \exp(-\mathrm{const}\,(1 - y f(x))_+)$. [sent-68, score-0.241]
39 The SVM cost function is particularly interesting since, due to its discontinuous first derivative, many components of the optimal weight vector w are zero, i.e., the solution is sparse. [sent-70, score-0.037]
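To make the likelihood models of Sections 2 and 3 concrete, the following sketch writes down their negative log-likelihoods: the Gaussian model of equation (1), the logistic two-class model, and the SVM-style hinge loss with its discontinuous first derivative. The {-1, +1} label encoding and the function names are choices made for this illustration; the Fisher scoring optimization used in the paper is not shown here.

```python
import numpy as np

def nll_gaussian(f, y, sigma=0.1):
    """-log P(y|f) for GPR, up to a constant: squared error scaled by the noise variance."""
    return 0.5 * (f - y) ** 2 / sigma ** 2

def nll_logistic(f, y):
    """-log P(y|f) for two-class classification with labels y in {-1, +1}: log(1 + exp(-y f))."""
    return np.log1p(np.exp(-y * f))

def nll_hinge(f, y, const=1.0):
    """SVM-style negative log-likelihood const * (1 - y f)_+; flat (zero) beyond the margin."""
    return const * np.maximum(0.0, 1.0 - y * f)
```

The flat region of the hinge loss is what produces the zero weights, i.e., the sparse solution, mentioned above.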
40 4 Mixtures of Gaussian Processes GPR employs a global scale parameter s. [sent-73, score-0.027]
41 In many applications it might be more desirable to permit an input-dependent scale parameter: the complexity of the map might be input dependent or the input data density might be nonuniform. [sent-74, score-0.328]
42 In the latter case one might want to use a smaller scale parameter in regions with high data density. [sent-75, score-0.147]
43 This is the main motivation for introducing another generalization of the simple GPR model, the mixture of Gaussian processes (MGP) model, which is a variant of the mixture of experts model of Jacobs et al. [sent-76, score-0.497]
44 Here, a set of GPR models with different scale parameters is used and the system can autonomously decide which GPR model is appropriate for a particular region of input space. [sent-78, score-0.139]
45 The state of a discrete M-state variable z determines which of the GPR models is active for a given input x. [sent-84, score-0.139]
46 The state of z is estimated by an M-class classification Gaussian process model with $P(z = i \,|\, F^z(x)) = \frac{\exp(f_i^z(x))}{\sum_{j=1}^M \exp(f_j^z(x))}$, where $F^z(x) = \{f_1^z(x), \ldots, f_M^z(x)\}$. [sent-85, score-0.217]
47 Finally, we use a set of M Gaussian processes $F^\sigma(x) = \{f_1^\sigma(x), \ldots, f_M^\sigma(x)\}$ [sent-89, score-0.24]
48 to model the input-dependent noise variance of the GPR models. [sent-92, score-0.071]
49 The likelihood model given the state of z, $P(y \,|\, z = i, F^\mu(x), F^\sigma(x)) = G(y;\, f_i^\mu(x),\, \exp(2 f_i^\sigma(x)))$, is a Gaussian centered at $f_i^\mu(x)$ with variance $\exp(2 f_i^\sigma(x))$. [sent-93, score-0.096]
50 Note that $G(a; b, c)$ is our notation for a Gaussian density with mean b and variance c, evaluated at a. [sent-95, score-0.089]
51 In the remaining parts of the paper we will not denote the ... (Footnote 1: Properly normalizing the conditional probability density is somewhat tricky and is discussed in detail in Sollich, 2000.) [sent-96, score-0.052]
52 Since z is a latent variable we obtain, with $P(y|x) = \sum_{i=1}^M P(z = i|x)\, G(y;\, f_i^\mu(x),\, \exp(2 f_i^\sigma(x)))$ and $E(y|x) = \sum_{i=1}^M P(z = i|x)\, f_i^\mu(x)$, the well-known mixture of experts network of Jacobs et al. (1991), where the $f_i^\mu(x)$ are the (Gaussian process) experts and $P(z = i|x)$ is the gating network. [sent-99, score-0.301]
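The sketch below combines the three sets of Gaussian process outputs at a single test input into the MGP predictions of this section: the gating probabilities $P(z = i|x)$ as a softmax of the $f_i^z(x)$, the predictive mean $E(y|x)$, and the mixture density $P(y|x)$ on a grid of y values. How the GP values themselves are obtained (by GPR and Fisher scoring as described elsewhere in the paper) is assumed; the plain-array interface is a simplification for illustration.

```python
import numpy as np

def mgp_predict(f_z, f_mu, f_sigma, y_grid=None):
    """Combine M experts at one input x; f_z, f_mu, f_sigma are length-M arrays of GP outputs.

    Returns the gating probabilities P(z=i|x), the mean E(y|x) and, if y_grid is given,
    the mixture density P(y|x) evaluated on that grid.
    """
    # gating network: softmax over the M classification GPs f^z_i(x)
    g = np.exp(f_z - f_z.max())
    p_z = g / g.sum()

    # predictive mean: sum_i P(z=i|x) * f^mu_i(x)
    mean = np.dot(p_z, f_mu)

    density = None
    if y_grid is not None:
        var = np.exp(2.0 * f_sigma)                    # input-dependent noise variances
        # mixture of Gaussians: sum_i P(z=i|x) * G(y; f^mu_i(x), exp(2 f^sigma_i(x)))
        comp = np.exp(-0.5 * (y_grid[:, None] - f_mu[None, :]) ** 2 / var[None, :])
        comp /= np.sqrt(2.0 * np.pi * var[None, :])
        density = comp @ p_z
    return p_z, mean, density
```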
53 Figure 2 (left) illustrates the dependencies in the MGP model. [sent-100, score-0.133]
54 Therefore one is typically interested in the minimum of the negative logarithm of the posterior density $-\sum_{k=1}^N \log \sum_{i=1}^M P(z = i|x_k)\, G(y_k;\, f_i^\mu(x_k),\, \exp(2 f_i^\sigma(x_k))) + \frac{1}{2}\sum_{i=1}^M (f_{i,m}^z)' (\Sigma_{i,m}^z)^{-1} f_{i,m}^z + \frac{1}{2}\sum_{i=1}^M (f_{i,m}^\mu)' (\Sigma_{i,m}^\mu)^{-1} f_{i,m}^\mu + \frac{1}{2}\sum_{i=1}^M (f_{i,m}^\sigma)' (\Sigma_{i,m}^\sigma)^{-1} f_{i,m}^\sigma$, where the $f_{i,m}$ are the values of the respective Gaussian processes at the training inputs and the $\Sigma_{i,m}$ are the corresponding prior covariance (kernel) matrices. [sent-103, score-0.054]
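For concreteness, the sketch below evaluates this penalized negative log posterior from the Gaussian process values at the training inputs; the gating probabilities are computed as the softmax of the $f_i^z$ values, and the regularization terms are the quadratic forms with the prior kernel matrices. The argument layout (arrays of GP values and lists of kernel matrices) is an assumption of this illustration; the function only scores a candidate solution, while the actual optimization proceeds by the EM/Fisher scoring procedure described next.

```python
import numpy as np

def mgp_objective(y, f_z, f_mu, f_sigma, K_z, K_mu, K_sigma):
    """Negative log posterior of the MGP evaluated at the training data.

    y: (N,) targets; f_z, f_mu, f_sigma: (M, N) GP values at the training inputs;
    K_z, K_mu, K_sigma: lists of M (N x N) prior covariance (kernel) matrices.
    """
    # gating probabilities P(z=i|x_k) from the gating GPs
    g = np.exp(f_z - f_z.max(axis=0, keepdims=True))
    p_gate = g / g.sum(axis=0, keepdims=True)

    # data term: -sum_k log sum_i P(z=i|x_k) * G(y_k; f_mu_i(x_k), exp(2 f_sigma_i(x_k)))
    var = np.exp(2.0 * f_sigma)
    comp = p_gate * np.exp(-0.5 * (y[None, :] - f_mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    data_term = -np.log(comp.sum(axis=0)).sum()

    # prior terms: 0.5 * f' K^{-1} f for every Gaussian process
    prior = 0.0
    for f_set, K_set in ((f_z, K_z), (f_mu, K_mu), (f_sigma, K_sigma)):
        for f, K in zip(f_set, K_set):
            prior += 0.5 * f @ np.linalg.solve(K, f)
    return data_term + prior
```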
55 In the E-step, based on the current estimates of the Gaussian processes at the data points, the state of the latent variable is estimated as $\hat{P}(z = i \,|\, x_k, y_k) \propto P(z = i|x_k)\, G(y_k;\, f_i^\mu(x_k),\, \exp(2 f_i^\sigma(x_k)))$. In the M-step, based on the E-step, the Gaussian processes at the data points are updated. [sent-110, score-0.599]
56 Note that data with a small $\hat{P}(z = i \,|\, x_k, y_k)$ obtain a small weight. [sent-114, score-0.025]
57 To update the other Gaussian processes, iterative Fisher scoring steps have to be used, as shown in the appendix. [sent-115, score-0.287]
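As an illustration of the E- and M-steps, the sketch below performs one simplified iteration for the mean experts only: the E-step computes the responsibilities $\hat{P}(z = i|x_k, y_k)$, and the M-step refits each GPR expert with those responsibilities acting as per-datum weights, so that data with small responsibility obtain a small weight as noted above. The weighted update used here is a plausible stand-in, not the paper's exact M-step, and the Fisher scoring updates for the gating and noise processes are omitted.

```python
import numpy as np

def em_step(K_list, y, f_mu, p_gate, noise_var):
    """One simplified EM-style update for the M mean experts.

    K_list:    list of M (N x N) kernel matrices (one bandwidth per expert)
    y:         (N,) targets; f_mu: (M, N) current expert means at the training inputs
    p_gate:    (M, N) current gating probabilities P(z=i|x_k)
    noise_var: (M, N) current noise variances exp(2 f^sigma_i(x_k))
    """
    # E-step: responsibilities proportional to P(z=i|x_k) * G(y_k; f_mu_i(x_k), var_i(x_k))
    lik = np.exp(-0.5 * (y[None, :] - f_mu) ** 2 / noise_var) / np.sqrt(2 * np.pi * noise_var)
    resp = p_gate * lik
    resp /= resp.sum(axis=0, keepdims=True)

    # M-step (means only): weighted GPR fit per expert; small responsibilities act as large noise
    new_f_mu = np.empty_like(f_mu)
    for i, K in enumerate(K_list):
        W_inv = np.diag(noise_var[i] / np.clip(resp[i], 1e-12, None))  # effective per-datum noise
        new_f_mu[i] = K @ np.linalg.solve(K + W_inv, y)
    return resp, new_f_mu
```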
58 A problem is that the GPR model with the highest bandwidth tends to obtain the highest weight in the E-step since it provides the best fit to the training data. [sent-117, score-0.287]
59 There is an easy fix for the MGP: For calculating the responses of the Gaussian processes at Xk in the E-step we use all training data except (Xk, Yk). [sent-118, score-0.265]
60 Fortunately, this calculation is very cheap in the case of Gaussian processes since, for example, $\hat{f}_i^{\mu \backslash k}(x_k) = y_k - \frac{y_k - \hat{f}_i^\mu(x_k)}{1 - S_{i,kk}}$, where $\hat{f}_i^{\mu \backslash k}(x_k)$ denotes the estimate at the training data point $x_k$ not using $(x_k, y_k)$ and $S_{i,kk}$ is the k-th diagonal element of the smoother (hat) matrix of GPR model i. [sent-119, score-0.265]
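A small sketch of this leave-one-out shortcut for a single GPR model with a fixed noise level: the smoother (hat) matrix $S = K(K + \sigma^2 I)^{-1}$ yields the fitted values at the training points, and its diagonal gives all N leave-one-out estimates without any refitting. The fixed sigma is an assumption of this illustration; in the MGP the noise level is itself input dependent.

```python
import numpy as np

def loo_estimates(K, y, sigma=0.1):
    """Leave-one-out fitted values for a GPR smoother via the hat-matrix diagonal."""
    N = len(y)
    S = K @ np.linalg.inv(K + sigma ** 2 * np.eye(N))   # smoother (hat) matrix
    f_hat = S @ y                                       # ordinary fitted values at the training data
    # f_hat_loo_k = y_k - (y_k - f_hat_k) / (1 - S_kk)
    return y - (y - f_hat) / (1.0 - np.diag(S))
```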
61 Figure 1: The input data are generated from a Gaussian distribution with unit variance and mean 0. [sent-139, score-0.151]
62 The output data are generated from a step function (o, bottom right). [sent-140, score-0.061]
63 The top left plot shows the map formed by three GPR models with different bandwidths. [sent-141, score-0.112]
64 As can be seen no individual model achieves a good map. [sent-142, score-0.043]
65 Then a MGP model was trained using the three GPR models. [sent-143, score-0.043]
66 The top right plot shows the GPR models after convergence. [sent-144, score-0.071]
67 The GPR model with the highest bandwidth models the transition at zero, the GPR model with an intermediate bandwidth models the intermediate region and the GPR model with the lowest bandwidth models the extreme regions. [sent-146, score-0.833]
68 The bottom right plot shows the data (o) and the fit obtained by the complete MGP model, which is better than the map formed by any of the individual GPR models. [sent-147, score-0.141]
69 4.2 Experiments Figure 1 illustrates how the MGP divides up a complex task into subtasks modeled by the individual GPR models (see caption). [sent-149, score-0.075]
70 By dividing up the task, the MGP model can potentially achieve a performance which is better than the performance of any individual model. [sent-150, score-0.043]
71 Table 1 shows results from artificial data sets and real world data sets. [sent-151, score-0.05]
72 In all cases, the performance of the MGP is better than the mean performance of the GPR models and also better than the performance of the mean (obtained by averaging the predictions of all GPR models). [sent-152, score-0.103]
73 5 Gaussian Processes for Graphical Models Gaussian processes can be useful models for quantifying the dependencies in Bayesian networks and dependency networks (the latter were introduced in Hofmann and Tresp, 1998, Heckerman et al. [sent-153, score-0.597]
74 , 2000), in particular when parent variables are continuous quantities. [sent-154, score-0.088]
75 If the child variable is discrete, Gaussian process classification or the SVM are appropriate models whereas when the child variable is continuous, the MGP model can be employed as a general conditional density estimator. [sent-155, score-0.301]
76 Typically one would require that the continuous input variables to the Gaussian process systems x are known. [sent-156, score-0.153]
77 Table 1: The table shows results using artificial and real data sets of size N = 100 using M = 10 GPR models. [sent-157, score-0.064]
78 The data set ART is generated by adding Gaussian noise with a standard deviation of 0.2 [sent-158, score-0.061]
79 to a map defined by 5 normalized Gaussian bumps. [sent-159, score-0.041]
80 The bandwidth s was generated randomly between 0 and max. [sent-161, score-0.222]
81 Mean perf. is the mean squared test set error of all GPR networks, and perf. [sent-164, score-0.032]
82 of mean is the mean squared test set error achieved by simply averaging the predictions. [sent-165, score-0.064]
83 It might therefore be useful to consider those as exogenous variables which modify the dependencies in a graphical model of y-variables, as shown in Figure 2 (right). [sent-197, score-0.498]
84 Another example would be collaborative filtering where y might represent a set of goods and the correlation between customer preferences is modeled by a dependency network as in Heckerman et al. [sent-199, score-0.382]
85 Here, exogenous variables such as income, gender and social status might be useful quantities to modify those correlations. [sent-201, score-0.43]
86 Note that the GPR model itself can also be considered to be a graphical model with dependencies modeled as Gaussian processes (compare Figure 2). [sent-202, score-0.568]
87 Readers might also be interested in the related and independent paper by Friedman and Nachman (2000) in which those authors used GPR systems (not in the form of the MGP) to perform structural learning in Bayesian networks of continuous variables. [sent-203, score-0.112]
88 6 Conclusions We demonstrated that Gaussian processes can be useful building blocks for forming complex probabilistic models. [sent-204, score-0.275]
89 In particular we introduced the MGP model and demonstrated how Gaussian processes can model the dependencies in graphical models. [sent-205, score-0.532]
90 Figure 2: Left: The graphical structure of an MGP model consisting of the discrete latent variable z, the continuous variable y and the input variable x. [sent-218, score-0.369]
91 The probability density of z is dependent on the Gaussian processes $F^z$. [sent-219, score-0.028]
92 The probability distribution of y is dependent on the state of z and on the Gaussian processes $F^\mu$, $F^\sigma$. [sent-220, score-0.269]
93 Right: An example of a Bayesian network which contains the variables Y1, Y2, Y3, Y4. [sent-221, score-0.064]
94 Some of the dependencies are modified by x via Gaussian processes $f_1, f_2, f_3$. [sent-222, score-0.402]
wordName wordTfidf (topN-words)
[('gpr', 0.578), ('mgp', 0.498), ('processes', 0.24), ('bandwidth', 0.186), ('xk', 0.18), ('gaussian', 0.167), ('dependencies', 0.133), ('exogenous', 0.13), ('ilxk', 0.112), ('tresp', 0.101), ('yk', 0.088), ('art', 0.079), ('fm', 0.075), ('graphical', 0.073), ('experts', 0.07), ('jacobs', 0.068), ('hofmann', 0.068), ('ylz', 0.065), ('ft', 0.061), ('ff', 0.059), ('generalized', 0.058), ('dependency', 0.056), ('collaborative', 0.056), ('sollich', 0.056), ('ilx', 0.056), ('exp', 0.054), ('continuous', 0.048), ('scoring', 0.047), ('mixture', 0.045), ('modify', 0.044), ('customer', 0.043), ('fahrmeir', 0.043), ('goods', 0.043), ('income', 0.043), ('nachman', 0.043), ('numin', 0.043), ('quantifying', 0.043), ('tutz', 0.043), ('wz', 0.043), ('model', 0.043), ('heckerman', 0.042), ('map', 0.041), ('variables', 0.04), ('bayesian', 0.04), ('quantities', 0.04), ('might', 0.039), ('models', 0.039), ('fisher', 0.038), ('blood', 0.037), ('social', 0.037), ('ylf', 0.037), ('discontinuous', 0.037), ('diseases', 0.037), ('symptoms', 0.037), ('variable', 0.036), ('generated', 0.036), ('modeled', 0.036), ('process', 0.035), ('useful', 0.035), ('discrete', 0.034), ('preferences', 0.034), ('status', 0.034), ('fu', 0.034), ('svm', 0.033), ('latent', 0.033), ('mean', 0.032), ('plot', 0.032), ('classification', 0.031), ('gender', 0.031), ('patient', 0.031), ('age', 0.031), ('variant', 0.031), ('input', 0.03), ('highest', 0.029), ('modified', 0.029), ('dependent', 0.029), ('density', 0.029), ('latter', 0.028), ('filtering', 0.028), ('discuss', 0.028), ('variance', 0.028), ('ylx', 0.028), ('regions', 0.028), ('exponential', 0.028), ('mixtures', 0.027), ('scale', 0.027), ('child', 0.026), ('williams', 0.026), ('support', 0.026), ('machine', 0.025), ('regression', 0.025), ('family', 0.025), ('data', 0.025), ('friedman', 0.025), ('likelihood', 0.025), ('interested', 0.025), ('network', 0.024), ('medical', 0.023), ('discussed', 0.023), ('et', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 85 nips-2000-Mixtures of Gaussian Processes
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models.
2 0.11211876 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
Author: Dörthe Malzahn, Manfred Opper
Abstract: Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1
3 0.096735582 122 nips-2000-Sparse Representation for Gaussian Process Models
Author: Lehel Csató, Manfred Opper
Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
4 0.089477971 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
Author: Igor V. Cadez, Padhraic Smyth
Abstract: We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically we characterize a general class of learning problems where the goodness-of-fit function can be shown to be convex within firstorder as a function of model complexity. This general property of
5 0.076542966 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller
Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for
6 0.076198459 114 nips-2000-Second Order Approximations for Probability Models
7 0.072477303 35 nips-2000-Computing with Finite and Infinite Networks
8 0.067570962 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
9 0.067300811 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
10 0.063650832 92 nips-2000-Occam's Razor
11 0.063086361 120 nips-2000-Sparse Greedy Gaussian Process Regression
12 0.06239491 39 nips-2000-Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes
13 0.059549928 3 nips-2000-A Gradient-Based Boosting Algorithm for Regression Problems
14 0.057238389 74 nips-2000-Kernel Expansions with Unlabeled Examples
15 0.05543058 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
16 0.053147417 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
17 0.052902337 75 nips-2000-Large Scale Bayes Point Machines
18 0.051635958 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
19 0.050496444 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task
20 0.04412641 133 nips-2000-The Kernel Gibbs Sampler
topicId topicWeight
[(0, 0.176), (1, 0.023), (2, 0.071), (3, -0.031), (4, 0.113), (5, 0.029), (6, -0.055), (7, 0.048), (8, 0.002), (9, -0.105), (10, -0.005), (11, 0.084), (12, 0.026), (13, -0.032), (14, 0.082), (15, 0.093), (16, -0.04), (17, 0.097), (18, 0.016), (19, -0.054), (20, -0.037), (21, -0.023), (22, 0.076), (23, 0.088), (24, -0.021), (25, -0.063), (26, -0.158), (27, 0.006), (28, 0.128), (29, 0.058), (30, 0.037), (31, 0.092), (32, -0.053), (33, 0.026), (34, -0.106), (35, 0.084), (36, -0.051), (37, 0.003), (38, -0.024), (39, -0.143), (40, 0.03), (41, -0.052), (42, -0.018), (43, 0.055), (44, 0.141), (45, -0.06), (46, 0.225), (47, 0.183), (48, 0.173), (49, -0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.92207944 85 nips-2000-Mixtures of Gaussian Processes
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models.
2 0.49469036 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
Author: Dörthe Malzahn, Manfred Opper
Abstract: Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1
3 0.47277424 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
Author: Igor V. Cadez, Padhraic Smyth
Abstract: We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically we characterize a general class of learning problems where the goodness-of-fit function can be shown to be convex within firstorder as a function of model complexity. This general property of
4 0.45335895 122 nips-2000-Sparse Representation for Gaussian Process Models
Author: Lehel Csató, Manfred Opper
Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
5 0.37059522 3 nips-2000-A Gradient-Based Boosting Algorithm for Regression Problems
Author: Richard S. Zemel, Toniann Pitassi
Abstract: In adaptive boosting, several weak learners trained sequentially are combined to boost the overall algorithm performance. Recently adaptive boosting methods for classification problems have been derived as gradient descent algorithms. This formulation justifies key elements and parameters in the methods, all chosen to optimize a single common objective function. We propose an analogous formulation for adaptive boosting of regression problems, utilizing a novel objective function that leads to a simple boosting algorithm. We prove that this method reduces training error, and compare its performance to other regression methods. The aim of boosting algorithms is to
6 0.3510856 35 nips-2000-Computing with Finite and Infinite Networks
7 0.34841606 39 nips-2000-Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes
8 0.34041053 132 nips-2000-The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving
9 0.33495679 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
10 0.33153254 48 nips-2000-Exact Solutions to Time-Dependent MDPs
11 0.32662189 114 nips-2000-Second Order Approximations for Probability Models
12 0.32514083 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
13 0.32328686 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
14 0.31758294 59 nips-2000-From Mixtures of Mixtures to Adaptive Transform Coding
15 0.31033134 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
16 0.30685848 74 nips-2000-Kernel Expansions with Unlabeled Examples
17 0.30563402 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
18 0.30518463 60 nips-2000-Gaussianization
19 0.30082482 92 nips-2000-Occam's Razor
20 0.29494017 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
topicId topicWeight
[(10, 0.021), (17, 0.088), (32, 0.014), (33, 0.033), (48, 0.013), (54, 0.022), (55, 0.038), (62, 0.056), (65, 0.014), (67, 0.057), (76, 0.076), (79, 0.013), (81, 0.021), (90, 0.048), (91, 0.372), (97, 0.015)]
simIndex simValue paperId paperTitle
1 0.88455248 114 nips-2000-Second Order Approximations for Probability Models
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
2 0.85505551 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a lowlevel, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
same-paper 3 0.82646078 85 nips-2000-Mixtures of Gaussian Processes
Author: Volker Tresp
Abstract: We introduce the mixture of Gaussian processes (MGP) model which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes, in particular in the form of Gaussian process classification, the support vector machine and the MGP model, can be used for quantifying the dependencies in graphical models.
4 0.57867557 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
5 0.48680755 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1
6 0.48002425 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
7 0.43906066 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
8 0.43739727 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
9 0.42898968 122 nips-2000-Sparse Representation for Gaussian Process Models
10 0.42538106 62 nips-2000-Generalized Belief Propagation
11 0.41405043 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
12 0.40946311 94 nips-2000-On Reversing Jensen's Inequality
13 0.40472052 49 nips-2000-Explaining Away in Weight Space
14 0.40332946 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
15 0.40243587 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
16 0.39973611 74 nips-2000-Kernel Expansions with Unlabeled Examples
17 0.39714858 26 nips-2000-Automated State Abstraction for Options using the U-Tree Algorithm
18 0.39131996 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
19 0.38841423 136 nips-2000-The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
20 0.38729632 146 nips-2000-What Can a Single Neuron Compute?