nips nips2001 nips2001-178 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbitrary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbitrary single site constraints is derived. [sent-7, score-0.501]
2 We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. [sent-8, score-0.543]
3 Lastly, we derive a sparse representation version of the sequential algorithm. [sent-9, score-0.286]
4 The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem. [sent-10, score-0.127]
5 1 Introduction There is an increasing interest in methods for approximate inference in probabilistic (graphical) models. [sent-11, score-0.079]
6 Such approximations may usually be grouped in three classes. [sent-12, score-0.121]
7 In the first case we approximate self-consistency relations for marginal probabilities by a set of nonlinear equations. [sent-13, score-0.111]
8 Mean field (MF) approximations and their advanced extensions belong to this group. [sent-14, score-0.085]
9 However, it is not clear in general, how to solve these equations efficiently. [sent-15, score-0.041]
10 This latter problem is of central concern to the second class, the message passing algorithms, like Bayesian online approaches (for references, see e.g. [sent-16, score-0.068]
11 [1]) and belief propagation (BP), which dynamically update approximations to conditional probabilities. [sent-18, score-0.325]
12 Finally, approximations based on Free Energies allow us to derive marginal moments by minimising entropic loss measures. [sent-19, score-0.352]
13 This method introduces new possibilities for algorithms and also gives approximations for the log-likelihood of observed data. [sent-20, score-0.085]
14 Recently, the fixed points of the BP algorithm were identified as the stable minima of the Bethe Free Energy, an insight which led to improved approximation schemes [2]. [sent-23, score-0.129]
15 While BP is good and efficient on sparse tree-like structures, one may look for an approximation that works well in the opposite limit of densely connected graphs where individual dependencies are weak but their overall effect cannot be neglected. [sent-24, score-0.217]
16 An interesting candidate is the adaptive TAP (ADATAP) approach introduced in [3] as a set of self-consistency relations. [sent-25, score-0.039]
17 Recently, a message passing algorithm of Minka (termed expectation propagation) [1] was found to solve the ADATAP equations efficiently for models with Gaussian Process (GP) priors. [sent-26, score-0.264]
18 We will add a further derivation of ADATAP using an approximate free energy. [sent-28, score-0.15]
19 A sequential algorithm for minimising the free energy generalises Minka’s result. [sent-29, score-0.515]
20 Finally, we discuss how a sparse representation of ADATAP can be achieved for GP models, thereby extending previous sparse on-line approximations to the batch case [4]. [sent-30, score-0.361]
21 We will specialize to probabilistic models on undirected graphs with nodes x = (x_1, ..., x_N) whose joint density is of the type p(x) ∝ ∏_i ρ_i(x_i) exp((1/2) x^T J x), i.e. arbitrary single-site terms ρ_i combined with quadratic couplings J_{ij} (eq. (1)). [sent-31, score-0.111]
22 2 ADATAP approach from Gibbs Free Energy We use the minimization of an approximation to a Gibbs free energy in order to re-derive the ADATAP approximation. [sent-36, score-0.108]
23 The Gibbs free energy provides a method for computing the marginal moments of the variables as well as the free energy itself within the same approach. [sent-37, score-0.21]
24 Here m is shorthand for a vector with elements m_i (and similarly for the second moments). Since at the total minimum of the Gibbs free energy (with respect to its arguments) the minimizer in (2) is just the model distribution itself, we conclude that the minimum value gives the free energy and that the minimizing arguments are the desired marginal moments. [sent-40, score-0.21]
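The defining relation (2) can be written, as a hedged reconstruction consistent with the surrounding sentences (the symbols G, q, m_i, M_i and the use of a KL divergence are notational choices made here, not taken verbatim from the paper):

```latex
% Moment-constrained Gibbs free energy for the model p of eq. (1)
% (hedged reconstruction; notation ours):
G(\mathbf{m}, \mathbf{M}) \;=\;
  \min_{q}\Big\{\, \mathrm{KL}(q \,\|\, p) \;:\;
    \langle x_i \rangle_q = m_i,\;
    \langle x_i^2 \rangle_q = M_i \ \ \forall i \,\Big\} \;-\; \ln Z .
% At the total minimum over (m, M) the constrained minimizer is p itself,
% so min_{m,M} G = -ln Z and the minimizing (m, M) are the true marginal
% first and second moments.
```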
25 We will search for an approximation to the Gibbs free energy which is based on splitting it as G = G_0 + ΔG, where G_0 is the Gibbs free energy for the factorising model that is obtained from (1) by setting all couplings J_{ij} = 0. [sent-41, score-0.369]
26 Previous attempts [6, 7] were based on a truncation of the power series expansion of ΔG with respect to the couplings at second order. [sent-42, score-0.05]
27 While this truncation leads to the correct TAP equations for the large-N limit of the so-called SK model in statistical physics, its general significance is unclear. [sent-43, score-0.091]
28 To make our approximation exact for such a case, we define ΔG (generalizing an idea of [8]) as the corresponding difference of Gibbs free energies computed for an arbitrary Gaussian likelihood. [sent-45, score-0.162]
29 The main reason for this definition is the fact that this difference is independent of the actual Gaussian likelihood chosen to compute it. [sent-46, score-0.152]
30 Changes in a Gaussian likelihood can always be absorbed within the Lagrange multipliers for the constraints. [sent-48, score-0.102]
31 We use this universal form to define the ADATAP approximation G_ADATAP = G_0 + ΔG, which by construction is exact for any Gaussian likelihood. [sent-49, score-0.162]
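As a compact summary of the construction described in sentences 25-31 (a hedged sketch; the superscript "Gauss" and the label G_ADATAP are introduced here for clarity):

```latex
% ADATAP free-energy construction (hedged sketch, notation ours):
G(\mathbf{m},\mathbf{M}) = G_0(\mathbf{m},\mathbf{M}) + \Delta G(\mathbf{m},\mathbf{M}),
\qquad
\Delta G \;\approx\; G^{\mathrm{Gauss}}(\mathbf{m},\mathbf{M})
                     - G_0^{\mathrm{Gauss}}(\mathbf{m},\mathbf{M}),
% where the superscript Gauss denotes the Gibbs free energy of a model with
% the same couplings J but Gaussian single-site terms, and the subscript 0
% denotes the factorising model (all J_ij = 0).  The difference on the right
% is independent of which Gaussian likelihood is used, and the resulting
% G_ADATAP = G_0 + Delta G is exact whenever the true likelihood is Gaussian.
```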
32 Introducing appropriate Lagrange multipliers for the moment constraints, we obtain an explicit expression for the approximate free energy. [sent-50, score-0.054]
33 Of special interest is the following sequential algorithm, which is a generalization of Minka’s EP [1] for Gaussian process classification to an arbitrary model of the type of eq. [sent-53, score-0.29]
34 (1). The algorithm then proceeds by choosing a new site. [sent-57, score-0.035]
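The explicit site updates are omitted here, so the following is a generic textbook EP loop for GP classification with a probit likelihood, i.e. the special case the text says is being generalized, rather than the paper's exact ADATAP update; all function and variable names (ep_gp_classification, nu, tau, and so on) are chosen for this sketch.

```python
import numpy as np
from scipy.stats import norm

def ep_gp_classification(K, y, n_sweeps=10):
    """Sequential EP sketch for GP classification with a probit likelihood.
    The site parameters (nu, tau) play the role of the Lagrange multipliers
    attached to each node in the text."""
    n = len(y)
    nu = np.zeros(n)              # site "precision times mean" parameters
    tau = np.zeros(n)             # site precision parameters
    Sigma, mu = K.copy(), np.zeros(n)
    for _ in range(n_sweeps):
        for i in np.random.permutation(n):
            # 1. remove site i: cavity parameters
            tau_c = 1.0 / Sigma[i, i] - tau[i]
            nu_c = mu[i] / Sigma[i, i] - nu[i]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # 2. moments of the tilted distribution (probit likelihood)
            z = y[i] * m_c / np.sqrt(1.0 + v_c)
            r = norm.pdf(z) / np.maximum(norm.cdf(z), 1e-12)
            m_new = m_c + y[i] * v_c * r / np.sqrt(1.0 + v_c)
            v_new = v_c - v_c**2 * r * (z + r) / (1.0 + v_c)
            # 3. update site parameters to match the new moments
            d_tau = 1.0 / v_new - tau_c - tau[i]
            tau[i] += d_tau
            nu[i] = m_new / v_new - nu_c
            # 4. rank-one update of the global Gaussian approximation
            si = Sigma[:, i].copy()
            Sigma -= (d_tau / (1.0 + d_tau * Sigma[i, i])) * np.outer(si, si)
            mu = Sigma @ nu
    return mu, Sigma, nu, tau
```

A single pass over the data essentially recovers the online algorithm of [4]; repeating the sweeps until the site parameters stop changing gives the self-consistent batch solution discussed in the experiments.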
35 1 Cavity interpretation At the fixed point, we may take the resulting site distribution as the ADATAP approximation to the true marginal distribution of x_i [3]. [sent-60, score-0.138]
36 The sequential approach may thus be considered as a belief propagation algorithm for ADATAP. [sent-61, score-0.46]
37 Although this marginal is usually not Gaussian, we can also derive the moments m_i and M_i from the auxiliary Gaussian model. [sent-62, score-0.168]
38 This auxiliary Gaussian model has a distribution corresponding to a Gaussian likelihood and provides us also with an additional approximation to the full matrix of covariances. [sent-63, score-0.268]
39 This is useful when the coupling matrix must be adapted to a set of observations by maximum likelihood II. [sent-64, score-0.161]
40 We will give an example of this for independent component analysis below. [sent-65, score-0.094]
41 Defining the cavity field h_i = ∑_{j≠i} J_{ij} x_j, it is easy to show that its mean and variance can be computed within the auxiliary Gaussian model, where the brackets denote an expectation with respect to the distribution of the remaining variables. [sent-67, score-0.132]
42 These statistics of h_i correspond to the empty "cavity" at site i. [sent-70, score-0.057]
43 The marginal distribution as computed by ADATAP is equivalent to the approximation that the cavity distribution is Gaussian. [sent-71, score-0.293]
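A hedged sketch of what the Gaussian cavity assumption implies for the single-site marginal (the symbols h_i, m_{c,i} and V_{c,i} are labels introduced here for the cavity field and its mean and variance):

```latex
% Cavity interpretation (hedged sketch, notation ours):
% h_i = sum_{j != i} J_{ij} x_j is the field acting on site i from all other
% sites.  If its distribution with site i removed is Gaussian with mean
% m_{c,i} and variance V_{c,i}, integrating it out gives the (generally
% non-Gaussian) single-site marginal
p_i(x_i) \;\propto\; \rho_i(x_i)\,
  \exp\!\Big( m_{c,i}\, x_i \;+\; \tfrac{1}{2}\, V_{c,i}\, x_i^{2} \Big).
```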
44 1 Models with Gaussian Process Priors For this class of models, we assume that the graph is embedded in the input space: the vector of node variables is the restriction of a Gaussian process (random field) with a given covariance kernel to a set of training inputs. [sent-73, score-0.034]
45 The model (1) is then the posterior distribution corresponding to a local likelihood model, when we set the single-site terms equal to the local likelihoods and the coupling matrix is obtained from the positive definite covariance kernel evaluated at the training inputs. [sent-74, score-0.323]
46 The diagonal element is included in the likelihood term. [sent-75, score-0.102]
47 " @ @ o h o h h @ o h ¥ ' q ¨ % ¤ ¨ % ¥ "¤ ¥ ' ¨ ' ¤ (6) x¨ 2H¤ w # @ # h ¨ % "¤ Algorithms for the update of ’s and ’s will usually suffer from time consuming matrix multiplications when is large. [sent-78, score-0.095]
48 This common problem for GP models can be overcome by a sparsity approximation which extends previous on-line approaches [4] to the batch ADATAP approach. [sent-79, score-0.192]
49 The idea is to replace the current version of the approximate Gaussian with a further approximation for which both the corresponding first-order site parameters as well as the covariance corrections are nonzero only when the nodes involved belong to a smaller subset of nodes called "basis vectors" (BV) [4]. [sent-80, score-0.157]
50 This yields the sparse mean and covariance parameters by minimizing the relative entropy, with a projection matrix built from the kernel matrices K_BB and K_NB. [sent-82, score-0.059]
51 Here K_BB is the kernel matrix between the BVs and K_NB the kernel matrix between the BVs and all nodes. [sent-83, score-0.258]
52 Eq. (7) can be used to compute the sparse approximation within the sequential algorithm. [sent-85, score-0.346]
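As an illustration of such a kernel-based projection (a sketch under assumptions: the function name project_to_basis, the matrices K_bb and K_nb, and the least-squares form of the projection are chosen here and need not coincide with the exact eq. (7)):

```python
import numpy as np

def project_to_basis(mu, Sigma, K_bb, K_nb, jitter=1e-8):
    """Approximate a Gaussian N(mu, Sigma) over all n nodes by one whose
    parameters live only on m basis vectors (BVs).  K_bb is the kernel matrix
    among the BVs, K_nb the kernel matrix between all nodes and the BVs.  A
    least-squares projection onto the span of the BV kernel columns is used
    here in place of the paper's exact relative-entropy projection."""
    m = K_bb.shape[0]
    # Map coefficients on the BVs to function values on all nodes.
    P = K_nb @ np.linalg.inv(K_bb + jitter * np.eye(m))   # shape (n, m)
    P_pinv = np.linalg.pinv(P)                            # shape (m, n)
    mu_b = P_pinv @ mu                      # sparse mean coefficients
    Sigma_b = P_pinv @ Sigma @ P_pinv.T     # sparse covariance block
    return mu_b, Sigma_b, P

# The sparse Gaussian then approximates the full one via
#   mu ~ P @ mu_b   and   Sigma ~ P @ Sigma_b @ P.T
```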
53 In order to recompute the appropriate "cavity" parameters when a new node is chosen by the algorithm, one removes a "pseudo-variable" from the likelihood and recomputes the statistics of the remaining ones. [sent-87, score-0.102]
54 We thus have the following likelihood for the parameters and the sources at time t: the observation is Gaussian with mean given by the mixing matrix applied to the sources and with the noise covariance (eq. (9)). [sent-92, score-0.156]
55 The aim of independent component analysis is to recover the unknown quantities, the sources, the mixing matrix and the noise covariance, from the observed data using the assumption of statistical independence of the sources. [sent-94, score-0.407]
56 Following [5], we estimate the mixing matrix and the noise covariance by an ML-II procedure, i.e. by maximizing the likelihood of the observed data with the sources integrated out. [sent-95, score-0.205]
57 The corresponding estimates are given in closed form in terms of posterior source statistics. These estimates require averages over the posterior of the sources, which has again the structure of the model of eq. (1). [sent-98, score-0.047]
58 They can be obtained efficiently using our sequential belief propagation algorithm in an iterative EM fashion, where the E-step amounts to estimating the posterior source statistics with the mixing matrix and noise covariance fixed, and the M-step consists of updating the mixing matrix and the noise covariance. [sent-100, score-0.46]
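A skeleton of this EM loop is sketched below; estimate_source_moments is a placeholder for the sequential belief-propagation (ADATAP) E-step, and the M-step formulas are the standard EM updates for a linear-Gaussian observation model y_t = A s_t + noise, which may differ in detail from the paper's estimators.

```python
import numpy as np

def ica_mlii_em(Y, n_sources, estimate_source_moments, n_iter=50, seed=0):
    """ML-II / EM skeleton for noisy ICA.  Y is the (d, T) data matrix.
    estimate_source_moments(Y, A, Sigma) must return the posterior means
    S_mean (k, T) and the summed second moments SS (k, k) of the sources;
    in the paper this role is played by the sequential ADATAP algorithm."""
    rng = np.random.default_rng(seed)
    d, T = Y.shape
    A = rng.standard_normal((d, n_sources))   # initial mixing matrix
    Sigma = np.eye(d)                         # initial noise covariance
    for _ in range(n_iter):
        # E-step: posterior source statistics at fixed A and Sigma
        S_mean, SS = estimate_source_moments(Y, A, Sigma)
        # M-step: update mixing matrix and noise covariance
        YS = Y @ S_mean.T                     # (d, k)
        A = YS @ np.linalg.inv(SS)
        Sigma = (Y @ Y.T - A @ YS.T) / T
        Sigma = 0.5 * (Sigma + Sigma.T)       # symmetrise for stability
    return A, Sigma
```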
59 1 Classification with GPs This problem has been studied before [9, 4] using a sequential, sparse algorithm, based on a single sweep through the data only. [sent-104, score-0.151]
60 Within the ADATAP approach we are able to perform multiple sweeps in order to achieve a self-consistent solution. [sent-105, score-0.155]
61 The outputs are binary and the likelihood is based on the probit model, where Φ denotes the cumulative Gaussian and σ measures the noise level. [sent-106, score-0.135]
62 We used the USPS dataset of gray-scale handwritten digit images of size 16×16, with 7291 training patterns and 2007 test patterns. [sent-109, score-0.039]
63 For the kernel we choose the RBF kernel, where d is the dimension of the inputs (d = 256 in this case) and the remaining kernel parameters are free. [sent-110, score-0.14]
64 In the simulations we used randomly chosen training examples. [sent-111, score-0.046]
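For concreteness, a sketch of the two ingredients just mentioned; the exact parametrisation used in the paper is not spelled out here, so the normalisation by the input dimension d, the parameter names amplitude, lengthscale and sigma, and the function names are assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(X1, X2, amplitude=1.0, lengthscale=1.0):
    """RBF kernel with the squared distance normalised by the input
    dimension d, one common convention for the USPS experiments."""
    d = X1.shape[1]
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return amplitude * np.exp(-sq / (2.0 * d * lengthscale ** 2))

def probit_likelihood(y, f, sigma=1.0):
    """Probit observation model for binary labels y in {-1, +1};
    sigma plays the role of the noise level mentioned in the text."""
    return norm.cdf(y * f / sigma)
```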
65 We performed simulations for different sizes of the BV set and compared multiple iterations with a single sweep through the dataset. [sent-112, score-0.096]
66 The lines show the average results of runs where the task was to classify the digits into fours/non-fours. [sent-115, score-0.033]
67 Our results show that, in contrast to the online learning, the fluctuations caused by the order of presentation are diminished (marked with bars on the figure). [sent-116, score-0.032]
68 2 Density estimation with GPs Bayesian non-parametric models for density estimation can be defined [10] by parametrising densities through a latent function and using a Gaussian process prior over the space of such functions. [sent-118, score-0.1]
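One common way to realise such a parametrisation is sketched below; this is an illustrative assumption, and the exact link between the latent function and the density used in the paper may differ.

```latex
% Illustrative GP density parametrisation (assumption, not necessarily the
% paper's exact choice):
p(x \mid f) \;=\; \frac{e^{f(x)}}{\int e^{f(x')}\,dx'}\,,
\qquad f \sim \mathcal{GP}(0, K).
% The normalising integral is what makes inference non-trivial and what
% approximation schemes for GP density estimation have to handle.
```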
69 Figure 1: Results for classification for different BV sizes (x-axis, #BV) and multiple sweeps through the data. [sent-130, score-0.155]
70 In the last expression, we have introduced an expectation over a new, effective Gaussian obtained by multiplying the old prior with an extra term and normalizing. [sent-140, score-0.072]
71 We assume that for sufficiently large sample size the remaining integral can be performed by Laplace's method, leaving us with an approximate predictor of the form of a posterior expectation, where the brackets denote posterior expectation for a GP model with a kernel that is a solution to an integral equation. [sent-141, score-0.352]
72 The likelihood of the fields at the observation points then takes a local form. [sent-142, score-0.102]
73 For any fixed kernel, we can apply the sparse ADATAP algorithm to this problem. [sent-143, score-0.136]
74 To give a simplified toy example, we choose a kernel which reproduces itself after convolution. [sent-145, score-0.07]
75 We apply the sparse algorithm with multiple sweeps through the data. [sent-150, score-0.291]
76 The sparsity also avoids the numerical problems caused by a possibly close-to-singular Gram matrix. [sent-151, score-0.06]
77 3 Independent Component Analysis We have tested the sequential algorithm on an ICA problem for local feature extraction in handwritten digits, i.e. learning a set of hidden images from which the digits are composed. [sent-157, score-0.22]
78 We assumed positivity constraints (enforced by Lagrange multipliers) and a positive prior on the sources (eq. (10)). As in [5] we used 500 handwritten ’3’s which are assumed to be generated by 25 hidden images. [sent-160, score-0.103]
79 We compared a traditional parallel update algorithm with the sequential belief propagation algorithm. [sent-161, score-0.46]
80 We find that the sequential algorithm needs on average only 7 sweeps through the sites to reach the desired accuracy, whereas the parallel one fails to reach the desired accuracy in 100 sweeps while using a somewhat larger number of flops. [sent-163, score-0.645]
81 The adaptive TAP method using the sequential belief propagation approach is also not more computationally expensive than the linear response method used in [5]. [sent-164, score-0.464]
82 6 Conclusion and Outlook An obvious future direction for the ADATAP approach is the investigation of other minimization algorithms as an alternative to the EP approach outlined before. [sent-165, score-0.048]
83 An extension of the sparse approximation to other, non-GP models will also be interesting. [sent-166, score-0.194]
84 A highly important but difficult problem is the assessment of the accuracy of the approximation. [sent-167, score-0.031]
85 Opper is grateful to Lars Kai Hansen for suggesting the non-parametric density model. [sent-169, score-0.033]
86 M. Opper and O. Winther, Tractable approximations for probabilistic models: The adaptive TAP approach, Phys. Rev. Lett. 86, 3695 (2001). [sent-191, score-0.17]
87 P. A. d. F. R. Højen-Sørensen, O. Winther and L. K. Hansen, Mean Field Approaches to Independent Component Analysis, Neural Computation, accepted (2001). [sent-208, score-0.053]
88 T. Plefka, Convergence condition of the TAP equations for the infinite-ranged Ising spin glass model, J. Phys. A 15, 1971 (1982). [sent-213, score-0.091]
wordName wordTfidf (topN-words)
[('adatap', 0.5), ('bv', 0.255), ('sequential', 0.185), ('tap', 0.181), ('cavity', 0.155), ('winther', 0.155), ('sweeps', 0.155), ('moments', 0.132), ('propagation', 0.128), ('gp', 0.125), ('minka', 0.124), ('energy', 0.121), ('ui', 0.121), ('gibbs', 0.117), ('free', 0.117), ('belief', 0.112), ('bvs', 0.107), ('likelihood', 0.102), ('sparse', 0.101), ('csat', 0.093), ('gaussian', 0.092), ('vt', 0.092), ('opper', 0.087), ('approximations', 0.085), ('lagrange', 0.083), ('yd', 0.079), ('marginal', 0.078), ('expectation', 0.072), ('arbritary', 0.071), ('bbb', 0.071), ('factorising', 0.071), ('rensen', 0.071), ('kernel', 0.07), ('mixing', 0.068), ('hansen', 0.062), ('approximation', 0.06), ('bp', 0.06), ('sparsity', 0.06), ('brackets', 0.06), ('matrix', 0.059), ('ef', 0.059), ('site', 0.057), ('gps', 0.057), ('xf', 0.057), ('minimising', 0.057), ('ciently', 0.055), ('sources', 0.054), ('multipliers', 0.054), ('qs', 0.053), ('accepted', 0.053), ('sites', 0.053), ('usps', 0.053), ('sl', 0.051), ('independent', 0.05), ('densely', 0.05), ('spin', 0.05), ('truncation', 0.05), ('sweep', 0.05), ('minimization', 0.048), ('auxiliary', 0.047), ('message', 0.047), ('posterior', 0.047), ('simulations', 0.046), ('probabilistic', 0.046), ('classi', 0.045), ('covariance', 0.045), ('component', 0.044), ('laplace', 0.043), ('ep', 0.043), ('boltzmann', 0.042), ('wt', 0.042), ('cance', 0.042), ('equations', 0.041), ('de', 0.04), ('batch', 0.039), ('handwritten', 0.039), ('adaptive', 0.039), ('ge', 0.038), ('xed', 0.036), ('passing', 0.036), ('usually', 0.036), ('algorithm', 0.035), ('integral', 0.035), ('extending', 0.035), ('dependencies', 0.035), ('minima', 0.034), ('encodes', 0.034), ('process', 0.034), ('approximate', 0.033), ('digits', 0.033), ('models', 0.033), ('noise', 0.033), ('density', 0.033), ('online', 0.032), ('nodes', 0.032), ('assumed', 0.032), ('dotted', 0.032), ('accuracy', 0.031), ('parisi', 0.031), ('mation', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
2 0.18560237 21 nips-2001-A Variational Approach to Learning Curves
Author: Dörthe Malzahn, Manfred Opper
Abstract: We combine the replica approach from statistical physics with a variational approach to analyze learning curves analytically. We apply the method to Gaussian process regression. As a main result we derive approximative relations between empirical error measures, the generalization error and the posterior variance.
3 0.15827315 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
4 0.15109366 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
5 0.13880467 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
6 0.13527831 76 nips-2001-Fast Parameter Estimation Using Green's Functions
7 0.12450356 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
8 0.10770144 139 nips-2001-Online Learning with Kernels
9 0.10189127 43 nips-2001-Bayesian time series classification
10 0.099835411 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
11 0.099153996 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
12 0.097715408 58 nips-2001-Covariance Kernels from Bayesian Generative Models
13 0.097494736 35 nips-2001-Analysis of Sparse Bayesian Learning
14 0.094081827 164 nips-2001-Sampling Techniques for Kernel Methods
15 0.093971781 8 nips-2001-A General Greedy Approximation Algorithm with Applications
16 0.092344493 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
17 0.086076908 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
18 0.084241971 183 nips-2001-The Infinite Hidden Markov Model
19 0.083518207 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
20 0.08072973 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
topicId topicWeight
[(0, -0.267), (1, 0.044), (2, -0.024), (3, -0.09), (4, -0.036), (5, -0.201), (6, 0.117), (7, -0.1), (8, 0.154), (9, 0.046), (10, -0.039), (11, -0.002), (12, 0.018), (13, -0.175), (14, -0.027), (15, 0.055), (16, -0.019), (17, 0.076), (18, 0.03), (19, -0.054), (20, 0.063), (21, -0.171), (22, -0.036), (23, 0.073), (24, 0.082), (25, 0.07), (26, -0.098), (27, -0.116), (28, 0.009), (29, 0.026), (30, -0.028), (31, 0.071), (32, -0.016), (33, 0.122), (34, 0.022), (35, 0.029), (36, 0.025), (37, 0.019), (38, -0.03), (39, 0.031), (40, -0.036), (41, 0.006), (42, 0.032), (43, 0.079), (44, -0.018), (45, 0.011), (46, -0.097), (47, 0.088), (48, 0.044), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.93981653 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
2 0.71022856 21 nips-2001-A Variational Approach to Learning Curves
Author: Dörthe Malzahn, Manfred Opper
Abstract: We combine the replica approach from statistical physics with a variational approach to analyze learning curves analytically. We apply the method to Gaussian process regression. As a main result we derive approximative relations between empirical error measures, the generalization error and the posterior variance.
3 0.63054603 79 nips-2001-Gaussian Process Regression with Mismatched Models
Author: Peter Sollich
Abstract: Learning curves for Gaussian process regression are well understood when the 'student' model happens to match the 'teacher' (true data generation process). I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: For large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima; over-fitting can occur even if the student estimates the teacher noise level correctly. In lower dimensions, plateaux also appear, and the learning curve remains dependent on the mismatch between student and teacher even in the asymptotic limit of a large number of training examples. Learning with excessively strong smoothness assumptions can be particularly dangerous: For example, a student with a standard radial basis function covariance function will learn a rougher teacher function only logarithmically slowly. All predictions are confirmed by simulations. 1
4 0.61677521 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
5 0.61347812 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
6 0.59556502 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
7 0.5926733 35 nips-2001-Analysis of Sparse Bayesian Learning
8 0.5925532 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
9 0.56310117 76 nips-2001-Fast Parameter Estimation Using Green's Functions
10 0.52333713 61 nips-2001-Distribution of Mutual Information
11 0.51775634 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
12 0.49044368 196 nips-2001-Very loopy belief propagation for unwrapping phase images
13 0.44134694 43 nips-2001-Bayesian time series classification
14 0.43937576 154 nips-2001-Products of Gaussians
15 0.41793853 58 nips-2001-Covariance Kernels from Bayesian Generative Models
16 0.41691151 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
17 0.40891203 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
18 0.40795061 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
19 0.38013706 155 nips-2001-Quantizing Density Estimators
20 0.37963539 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
topicId topicWeight
[(14, 0.034), (19, 0.028), (27, 0.154), (30, 0.047), (36, 0.017), (38, 0.015), (59, 0.39), (72, 0.106), (79, 0.035), (83, 0.016), (91, 0.089)]
simIndex simValue paperId paperTitle
1 0.8948018 168 nips-2001-Sequential Noise Compensation by Sequential Monte Carlo Method
Author: K. Yao, S. Nakamura
Abstract: We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. An explicit model representing noise effects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and prediction likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is carried out over these samples by fusion the estimation of samples according to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calculation efficiency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed artificially, and highly non-stationary Machinegun noise. In all the experiments carried out, we observed that the method can have significant recognition performance improvement, over that achieved by noise compensation with stationary noise assumption. 1
2 0.89199835 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a “randomized kernel” which behaves like in expectation.
same-paper 3 0.89154649 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
4 0.88585544 108 nips-2001-Learning Body Pose via Specialized Maps
Author: Rómer Rosales, Stan Sclaroff
Abstract: A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1
5 0.62904561 154 nips-2001-Products of Gaussians
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of I-factor PPCA models and (3) a products of experts construction for an AR(l) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models , e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of I-factor PPCA models; (3) A products of experts construction for an AR(l) process. Products of Gaussians If each expert is a Gaussian pi(xI8 i ) '
6 0.62187105 74 nips-2001-Face Recognition Using Kernel Methods
7 0.61207438 155 nips-2001-Quantizing Density Estimators
8 0.61115801 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
9 0.60413569 71 nips-2001-Estimating the Reliability of ICA Projections
10 0.60055518 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
11 0.59615952 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
12 0.5947504 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
13 0.59022802 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
14 0.5898692 84 nips-2001-Global Coordination of Local Linear Models
15 0.58662891 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
16 0.58002907 4 nips-2001-ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
17 0.57901156 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
18 0.57798034 188 nips-2001-The Unified Propagation and Scaling Algorithm
19 0.57701921 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
20 0.57162201 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior