nips nips2001 nips2001-154 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models and (3) a products of experts construction for an AR(1) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models, e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of 1-factor PPCA models; (3) A products of experts construction for an AR(1) process. Products of Gaussians If each expert is a Gaussian $p_i(\mathbf{x}|\theta_i) \sim$
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. [sent-18, score-0.183]
2 Below we consider PoE models in which each expert is a Gaussian. [sent-19, score-0.112]
3 Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. [sent-20, score-0.21]
4 We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models and (3) a products of experts construction for an AR(1) process. [sent-21, score-0.873]
5 Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. [sent-22, score-0.158]
6 In this paper we consider PoE models in which each expert is a Gaussian. [sent-23, score-0.112]
7 It is easy to see that in this case the product model will also be Gaussian. [sent-24, score-0.11]
8 However, if each Gaussian has a simple structure, the product can have a richer structure. [sent-25, score-0.129]
9 Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models, e.g. [sent-26, score-0.353]
10 Products of Gaussians If each expert is a Gaussian $p_i(\mathbf{x}|\theta_i) \sim N(\mu_i, C_i)$, the resulting distribution of the product of $m$ Gaussians may be expressed as $p(\mathbf{x}|\theta) \propto \prod_{i=1}^m p_i(\mathbf{x}|\theta_i)$. By completing the square in the exponent it may be easily shown that $p(\mathbf{x}|\theta) \sim N(\mu_\Sigma, C_\Sigma)$, where $C_\Sigma^{-1} = \sum_{i=1}^m C_i^{-1}$. [sent-30, score-0.179]
11 1 Products of Gaussian Pancakes A Gaussian "pancake" (GP) is a d-dimensional Gaussian, contracted in one dimension and elongated in the other $d-1$ dimensions. [sent-33, score-0.136]
12 In this section we show that the maximum likelihood solution for a product of Gaussian pancakes (PoGP) yields a probabilistic formulation of Minor Components Analysis (MCA). [sent-34, score-0.408]
13 1 Covariance Structure of a GP Expert Consider a d-dimensional Gaussian whose probability contours are contracted in the direction $\mathbf{w}$ and equally elongated in mutually orthogonal directions $\mathbf{v}_1, \ldots, \mathbf{v}_{d-1}$. [sent-36, score-0.219]
14 Its inverse covariance may be written as $C^{-1} = \sum_{i=1}^{d-1} \mathbf{v}_i\mathbf{v}_i^T \beta_0 + \mathbf{w}\mathbf{w}^T \beta_w$ (1), where $\mathbf{v}_1, \ldots$ [sent-41, score-0.358]
15 $\ldots, \mathbf{v}_{d-1}, \mathbf{w}$ form a $d \times d$ matrix of normalized eigenvectors of the covariance $C$. [sent-44, score-0.273]
16 $\beta_0 = \sigma_0^{-2}$ and $\beta_w = \sigma_w^{-2}$ define the inverse variances in the directions of elongation and contraction respectively, so that $\sigma_0^2 > \sigma_w^2$. [sent-45, score-0.361]
17 Notice that these constraints imply $\beta_0 < \beta_w$, and all elements of $\mathbf{w}$ are real-valued. [sent-47, score-0.048]
18 Note the similarity of (2) with the expression for the covariance of the data of a 1-factor probabilistic principal component analysis model $C = \sigma^2 I_d + \mathbf{w}\mathbf{w}^T$ (Tipping and Bishop, 1999), where $\sigma^2$ is the variance of the factor-independent spherical Gaussian noise. [sent-48, score-0.417]
19 The only difference is that here it is the inverse covariance matrix of the constrained Gaussian model, rather than the covariance matrix, that has the structure of a rank-1 update to a multiple of $I_d$. [sent-49, score-0.562]
20 2 Covariance of the PoGP Model We now consider a product of m GP experts, each of which is contracted in a single dimension. [sent-51, score-0.172]
21 We will refer to the model as a (1, m) PoGP, where 1 represents the number of directions of contraction of each expert. [sent-52, score-0.167]
22 We also assume that all experts have identical means. [sent-53, score-0.161]
23 From (1), the inverse covariance of the resulting (1, m) PoGP model can be expressed as $C_\Sigma^{-1} = \sum_{i=1}^m C_i^{-1} = \beta_\Sigma I_d + WW^T$ (3), where the columns of $W \in \mathbb{R}^{d \times m}$ correspond to the weight vectors of the $m$ PoGP experts, and $\beta_\Sigma = \sum_{i=1}^m \beta_0^{(i)} > 0$. [sent-54, score-0.311]
24 In Williams and Agakov (2001) it is shown that stationarity of the log-likelihood with respect to the weight matrix $W$ and the noise parameter $\beta_\Sigma$ results in three classes of solutions for the experts' weight matrix, namely $W = 0$; $S = C_\Sigma$; $SW = C_\Sigma W$ with $W \neq 0$, [sent-58, score-0.081]
25 $S \neq C_\Sigma$ (4), where $S$ is the covariance matrix of the data (with an assumed mean of zero). [sent-60, score-0.212]
26 In Appendix A and Williams and Agakov (2001) it is shown that the maximum likelihood solution for $W_{ML}$ is given by (5), where $R \in \mathbb{R}^{m \times m}$ is an arbitrary rotation matrix, $\Lambda$ is an $m \times m$ matrix containing the $m$ smallest eigenvalues of $S$, and $U = [\mathbf{u}_1, \ldots$ [sent-62, score-0.196]
27 $\ldots, \mathbf{u}_m] \in \mathbb{R}^{d \times m}$ is a matrix of the corresponding eigenvectors of $S$. [sent-65, score-0.12]
28 Thus, the maximum likelihood solution for the weights of the (1, m) PoGP model corresponds to $m$ scaled and rotated minor eigenvectors of the sample covariance $S$ and leads to a probabilistic model of minor component analysis. [sent-66, score-0.767]
29 As in the PPCA model, the number of experts m is assumed to be lower than the dimension of the data space d. [sent-67, score-0.161]
30 The correctness of this derivation has been confirmed experimentally by using a scaled conjugate gradient search to optimize the log likelihood as a function of $W$ and $\beta_\Sigma$. [sent-68, score-0.09]
31 4 Discussion of PoGP model An intuitive interpretation of the PoGP model is as follows: Each Gaussian pancake imposes an approximate linear constraint in x space. [sent-70, score-0.171]
32 Such a linear constraint is that x should lie close to a particular hyperplane. [sent-71, score-0.022]
33 The conjunction of these constraints is given by the product of the Gaussian pancakes. [sent-72, score-0.081]
34 If $m \ll d$ it will make sense to define the resulting Gaussian distribution in terms of the constraints. [sent-73, score-0.262]
35 (Footnote: because equation 3 has the form of a factor analysis decomposition, but for the inverse covariance matrix, we sometimes refer to PoGP as the rotcaf model.) [sent-74, score-0.027]
36 However, if there are many constraints (m > d/2) then it can be more efficient to describe the directions of large variability using a PPCA model, rather than the directions of small variability using a PoGP model. [sent-75, score-0.178]
37 (1991) in what they call the "Dual Subspace Pattern Recognition Method" where both PCA and MCA models are used (although their work does not use explicit probabilistic models such as PPCA and PoGP). [sent-77, score-0.135]
38 MCA can be used, for example, for signal extraction in digital signal processing (Oja, 1992), dimensionality reduction, and data visualization. [sent-78, score-0.031]
39 Extraction of the minor component is also used in the Pisarenko Harmonic Decomposition method for detecting sinusoids in white noise (see, e. [sent-79, score-0.159]
40 2 Products of PPCA In this section we analyze a product of $m$ 1-factor PPCA models, and compare it to an $m$-factor PPCA model. [sent-84, score-0.081]
41 1 1-factor PPCA model Consider a 1-factor PPCA model, having a latent variable $s_i$ and visible variables $\mathbf{x}$. [sent-86, score-0.069]
42 The joint distribution is given by $P(s_i, \mathbf{x}) = P(s_i)P(\mathbf{x}|s_i)$. [sent-87, score-0.022]
43 Integrating out $s_i$ we find that $p_i(\mathbf{x}) \sim N(0, C_i)$, where $C_i = \mathbf{w}_i\mathbf{w}_i^T + \sigma^2 I_d$ and $C_i^{-1} = \beta I_d - \beta\gamma_i \mathbf{w}_i\mathbf{w}_i^T$ (6), where $\beta = \sigma^{-2}$ and $\gamma_i = \beta/(1 + \beta\|\mathbf{w}_i\|^2)$. [sent-89, score-0.026]
44 $\beta$ and $\gamma_i$ are the inverse variances in the directions of contraction and elongation respectively. [sent-90, score-0.022]
45 The joint distribution of $s_i$ and $\mathbf{x}$ is given by $P(s_i, \mathbf{x}) \propto \exp\left(-\frac{\beta}{2}\left[\frac{s_i^2}{\gamma_i} - 2\mathbf{x}^T\mathbf{w}_i s_i + \mathbf{x}^T\mathbf{x}\right]\right)$ (7)-(8).
46 Tipping and Bishop (1999) showed that the general $m$-factor PPCA model (mPPCA) has covariance $C = \sigma^2 I_d + WW^T$, where $W$ is the $d \times m$ matrix of factor loadings. [sent-92, score-0.241]
47 When fitting this model to data, the maximum likelihood solution is to choose W proportional to the principal components of the data covariance matrix. [sent-93, score-0.378]
48 2 Products of 1-factor PPCA models We now consider the product of $m$ 1-factor PPCA models, which we denote a (1, m)-PoPPCA model. [sent-95, score-0.116]
49 Thus we see that the distribution of $\mathbf{z}$ is Gaussian with inverse covariance matrix $\beta M$, where $M = \begin{pmatrix} m I_d & -W \\ -W^T & \Gamma^{-1} \end{pmatrix}$ (10), and $\Gamma = \mathrm{diag}(\gamma_1, \ldots$ [sent-100, score-0.321]
50 $\ldots, \gamma_m)$. Using the inversion equations for partitioned matrices (Press et al. [sent-103, score-0.042]
51 p. 77) we can show that $\Sigma_{xx} = \beta^{-1}(m I_d - W\Gamma W^T)^{-1}$ (11), where $\Sigma_{xx}$ is the covariance of the $\mathbf{x}$ variables under this model. [sent-105, score-0.153]
52 It is easy to confirm that this is also the result obtained from summing (6) over $i = 1, \ldots, m$. [sent-106, score-0.02]
53 3 Maximum Likelihood solution for PoPPCA An $m$-factor PPCA model has covariance $\sigma^2 I_d + WW^T$ and thus, by the Woodbury formula, it has inverse covariance $\beta I_d - \beta W(\sigma^2 I_m + W^T W)^{-1} W^T$. [sent-111, score-0.519]
54 The maximum likelihood solution for an $m$-PPCA model is similar to (5), i.e. [sent-112, score-0.139]
55 $W = U(\Lambda - \sigma^2 I_m)^{1/2} R^T$, but now $\Lambda$ is a diagonal matrix of the $m$ principal eigenvalues, and $U$ is a matrix of the corresponding eigenvectors. [sent-114, score-0.17]
56 If we choose $R^T = I$ then the columns of $W$ are orthogonal and the inverse covariance of the maximum likelihood $m$-PPCA model has the form $\beta I_d - \beta W\Gamma W^T$. [sent-115, score-0.45]
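The sentence index above leans on a few linear-algebra identities that are easy to verify numerically: precisions (inverse covariances) of Gaussian experts add under a product (sentence 10), a product of Gaussian pancakes has an inverse covariance that is a rank-m update to a multiple of the identity (sentences 14 and 23), and the m-factor PPCA inverse covariance follows from the Woodbury formula (sentence 53). The NumPy sketch below is our own illustration of these checks, not code from the paper; the function names and parameter values (product_of_gaussians, pancake_inv_cov, beta0, beta_w, sigma2) are assumptions chosen for the example.

```python
# Minimal NumPy sketch (illustrative, not the authors' code) of three identities
# used in the paper: precision addition for products of Gaussians, the low-rank
# inverse-covariance structure of a product of Gaussian pancakes (PoGP), and the
# Woodbury form of the m-factor PPCA inverse covariance.
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 2                      # data dimension and number of experts
sigma2 = 0.5                     # assumed PPCA noise variance
beta = 1.0 / sigma2

# (i) Product of Gaussian experts: C_Sigma^{-1} = sum_i C_i^{-1},
#     mu_Sigma = C_Sigma * sum_i C_i^{-1} mu_i.
def product_of_gaussians(mus, covs):
    precisions = [np.linalg.inv(C) for C in covs]
    C_sigma = np.linalg.inv(sum(precisions))
    mu_sigma = C_sigma @ sum(P @ mu for P, mu in zip(precisions, mus))
    return mu_sigma, C_sigma

# (ii) A Gaussian "pancake" expert, equation (1): for a unit contraction
#      direction w, the v_i terms sum to I - w w^T, so
#      C^{-1} = beta0 (I - w w^T) + beta_w w w^T.
def pancake_inv_cov(w, beta0, beta_w):
    w = w / np.linalg.norm(w)
    return beta0 * (np.eye(len(w)) - np.outer(w, w)) + beta_w * np.outer(w, w)

beta0, beta_w = 1.0, 25.0        # beta0 < beta_w: contracted along w
ws = [rng.standard_normal(d) for _ in range(m)]
inv_covs = [pancake_inv_cov(w, beta0, beta_w) for w in ws]

# The PoGP precision is the sum of the expert precisions ...
pogp_inv_cov = sum(inv_covs)
mus = [rng.standard_normal(d) for _ in range(m)]
mu_p, C_p = product_of_gaussians(mus, [np.linalg.inv(P) for P in inv_covs])
assert np.allclose(np.linalg.inv(C_p), pogp_inv_cov)

# ... and it is a rank-m update to (m * beta0) I_d, the "rotcaf" structure of (3).
update = pogp_inv_cov - m * beta0 * np.eye(d)
print("rank of PoGP inverse-covariance update:", np.linalg.matrix_rank(update))  # -> m

# (iii) Woodbury check for the m-factor PPCA covariance sigma^2 I_d + W W^T.
W = rng.standard_normal((d, m))
C = sigma2 * np.eye(d) + W @ W.T
woodbury = beta * np.eye(d) - beta * W @ np.linalg.inv(sigma2 * np.eye(m) + W.T @ W) @ W.T
assert np.allclose(np.linalg.inv(C), woodbury)
print("Woodbury identity for m-factor PPCA verified.")
```

Running the sketch prints a rank of m for the PoGP inverse-covariance update and confirms the Woodbury identity.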
wordName wordTfidf (topN-words)
[('ppca', 0.606), ('pogp', 0.455), ('mca', 0.212), ('products', 0.189), ('experts', 0.161), ('covariance', 0.153), ('pancakes', 0.152), ('minor', 0.137), ('tipping', 0.126), ('agakov', 0.121), ('poe', 0.121), ('inverse', 0.109), ('bishop', 0.096), ('ww', 0.096), ('contracted', 0.091), ('pancake', 0.091), ('edinburgh', 0.089), ('si', 0.086), ('product', 0.081), ('contraction', 0.079), ('expert', 0.077), ('gaussian', 0.071), ('probabilistic', 0.065), ('eigenvectors', 0.061), ('elongation', 0.061), ('rdxm', 0.061), ('wisi', 0.061), ('williams', 0.059), ('matrix', 0.059), ('directions', 0.059), ('permits', 0.055), ('gaussians', 0.054), ('gp', 0.053), ('principal', 0.052), ('id', 0.048), ('informatics', 0.048), ('richer', 0.048), ('likelihood', 0.046), ('elongated', 0.045), ('xt', 0.043), ('visible', 0.04), ('ld', 0.04), ('appendix', 0.038), ('ar', 0.035), ('models', 0.035), ('solution', 0.035), ('ce', 0.034), ('components', 0.034), ('pi', 0.034), ('rt', 0.032), ('construction', 0.032), ('extraction', 0.031), ('variability', 0.03), ('model', 0.029), ('maximum', 0.029), ('ci', 0.029), ('hinton', 0.028), ('define', 0.027), ('examine', 0.027), ('eigenvalues', 0.027), ('llwi', 0.026), ('universitiit', 0.026), ('lw', 0.026), ('rhs', 0.026), ('woodbury', 0.026), ('manufacturing', 0.026), ('ware', 0.026), ('viv', 0.026), ('felix', 0.026), ('variances', 0.026), ('division', 0.026), ('pca', 0.025), ('uk', 0.025), ('scaled', 0.024), ('orthogonal', 0.024), ('isi', 0.024), ('advantages', 0.023), ('rise', 0.023), ('vi', 0.022), ('chair', 0.022), ('stationarity', 0.022), ('oja', 0.022), ('joint', 0.022), ('constraint', 0.022), ('decomposition', 0.022), ('component', 0.022), ('xu', 0.021), ('completing', 0.021), ('complementary', 0.021), ('partitioned', 0.021), ('inversion', 0.021), ('stephen', 0.021), ('thorough', 0.021), ('harmonic', 0.02), ('wand', 0.02), ('correctness', 0.02), ('zt', 0.02), ('diag', 0.02), ('confirm', 0.02), ('columns', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 154 nips-2001-Products of Gaussians
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models and (3) a products of experts construction for an AR(1) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models, e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of 1-factor PPCA models; (3) A products of experts construction for an AR(1) process. Products of Gaussians If each expert is a Gaussian $p_i(\mathbf{x}|\theta_i) \sim$
2 0.15120786 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
3 0.10597898 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
Author: Brendan J. Frey, Anitha Kannan, Nebojsa Jojic
Abstract: Factor analysis and principal components analysis can be used to model linear relationships between observed variables and linearly map high-dimensional data to a lower-dimensional hidden space. In factor analysis, the observations are modeled as a linear combination of normally distributed hidden variables. We describe a nonlinear generalization of factor analysis , called
4 0.077887334 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems
Author: Ronan Collobert, Samy Bengio, Yoshua Bengio
Abstract: Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems but they suffer from the complexity of their training algorithm which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundreds of thousands examples with SVMs. The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as a difficult speech database , yielded significant time improvement (time complexity appears empirically to locally grow linearly with the number of examples) . In addition, and that is a surprise, a significant improvement in generalization was observed on Forest. 1
5 0.074513845 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
6 0.067125261 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
7 0.062839136 164 nips-2001-Sampling Techniques for Kernel Methods
8 0.056554358 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
9 0.054940127 171 nips-2001-Spectral Relaxation for K-means Clustering
10 0.053512916 43 nips-2001-Bayesian time series classification
11 0.051880319 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
12 0.049037535 79 nips-2001-Gaussian Process Regression with Mismatched Models
13 0.048615035 58 nips-2001-Covariance Kernels from Bayesian Generative Models
14 0.048588842 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
15 0.047364295 74 nips-2001-Face Recognition Using Kernel Methods
16 0.042940024 136 nips-2001-On the Concentration of Spectral Properties
17 0.041168593 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
18 0.041074585 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
19 0.040405441 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
20 0.039710473 46 nips-2001-Categorization by Learning and Combining Object Parts
topicId topicWeight
[(0, -0.119), (1, 0.029), (2, -0.02), (3, -0.111), (4, -0.012), (5, -0.011), (6, 0.02), (7, 0.015), (8, 0.016), (9, -0.035), (10, 0.046), (11, 0.051), (12, 0.028), (13, -0.159), (14, -0.021), (15, 0.008), (16, 0.012), (17, 0.007), (18, 0.108), (19, 0.006), (20, 0.015), (21, -0.08), (22, 0.16), (23, 0.084), (24, -0.001), (25, -0.06), (26, -0.021), (27, -0.105), (28, 0.033), (29, -0.091), (30, 0.054), (31, 0.087), (32, 0.139), (33, -0.084), (34, 0.012), (35, 0.142), (36, -0.041), (37, 0.009), (38, -0.043), (39, -0.029), (40, 0.102), (41, -0.142), (42, 0.084), (43, -0.143), (44, 0.107), (45, 0.02), (46, 0.038), (47, 0.121), (48, -0.024), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.92836356 154 nips-2001-Products of Gaussians
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models and (3) a products of experts construction for an AR(1) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models, e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of 1-factor PPCA models; (3) A products of experts construction for an AR(1) process. Products of Gaussians If each expert is a Gaussian $p_i(\mathbf{x}|\theta_i) \sim$
2 0.66919076 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
3 0.4593603 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
4 0.45484555 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems
Author: Ronan Collobert, Samy Bengio, Yoshua Bengio
Abstract: Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems but they suffer from the complexity of their training algorithm which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundreds of thousands examples with SVMs. The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as a difficult speech database , yielded significant time improvement (time complexity appears empirically to locally grow linearly with the number of examples) . In addition, and that is a surprise, a significant improvement in generalization was observed on Forest. 1
5 0.3884708 70 nips-2001-Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Author: Nicolas Chapados, Yoshua Bengio, Pascal Vincent, Joumana Ghosn, Charles Dugas, Ichiro Takeuchi, Linyan Meng
Abstract: Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are .discrete, and the very peculiar shape of the noise distribution, asymmetric with fat tails, with a large majority zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that function approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers. 1
6 0.37557879 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
7 0.3386668 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
8 0.33776173 43 nips-2001-Bayesian time series classification
9 0.33625895 79 nips-2001-Gaussian Process Regression with Mismatched Models
10 0.31942844 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
11 0.31144246 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
12 0.30001438 171 nips-2001-Spectral Relaxation for K-means Clustering
13 0.29856491 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
14 0.29825386 164 nips-2001-Sampling Techniques for Kernel Methods
15 0.29809079 108 nips-2001-Learning Body Pose via Specialized Maps
16 0.27336761 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
17 0.26232797 136 nips-2001-On the Concentration of Spectral Properties
18 0.2583636 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
19 0.24588302 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
20 0.24518499 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance
topicId topicWeight
[(14, 0.024), (17, 0.032), (19, 0.023), (27, 0.117), (30, 0.035), (38, 0.027), (43, 0.011), (59, 0.087), (72, 0.04), (79, 0.043), (83, 0.028), (87, 0.354), (91, 0.072)]
simIndex simValue paperId paperTitle
same-paper 1 0.73272735 154 nips-2001-Products of Gaussians
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of 1-factor PPCA models and (3) a products of experts construction for an AR(1) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models, e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of 1-factor PPCA models; (3) A products of experts construction for an AR(1) process. Products of Gaussians If each expert is a Gaussian $p_i(\mathbf{x}|\theta_i) \sim$
2 0.57883644 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
3 0.44423643 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a “randomized kernel” which behaves like in expectation.
4 0.43694124 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
5 0.43096757 108 nips-2001-Learning Body Pose via Specialized Maps
Author: Rómer Rosales, Stan Sclaroff
Abstract: A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1
6 0.4191196 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
7 0.41783363 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
8 0.41460061 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
9 0.41313794 188 nips-2001-The Unified Propagation and Scaling Algorithm
10 0.4118439 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
11 0.41036525 74 nips-2001-Face Recognition Using Kernel Methods
12 0.40825748 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
13 0.40815702 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
14 0.4066003 155 nips-2001-Quantizing Density Estimators
15 0.40558878 84 nips-2001-Global Coordination of Local Linear Models
16 0.40398848 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
17 0.40353516 190 nips-2001-Thin Junction Trees
18 0.40326017 13 nips-2001-A Natural Policy Gradient
19 0.40317866 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
20 0.40252706 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method