nips nips2000 nips2000-64 knowledge-graph by maker-knowledge-mining

64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data


Source: pdf

Author: Oliver B. Downs

Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 High-temperature expansions for learning models of nonnegative data Oliver B. Downs [sent-1, score-0.299]

2 Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. [sent-4, score-0.188]

3 For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. [sent-5, score-0.588]

4 Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. [sent-6, score-0.066]

5 The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. [sent-7, score-0.516]

6 Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. [sent-8, score-0.091]

7 The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits. [sent-9, score-0.627]

8 1 Introduction Unsupervised learning of generative and feature-extracting models for continuous nonnegative data has recently been proposed [1], [2]. [sent-10, score-0.444]

9 In [1], it was pointed out that the maximum entropy distribution (matching 1st- and 2nd-order statistics) for continuous nonnegative data is not Gaussian, and indeed that a Gaussian is not in general a good approximation to that distribution. [sent-11, score-0.428]

10 In contrast to the Gaussian distribution, the NNBD (Eqs. 2-3) can be multimodal, in which case its modes are confined to the boundaries of the nonnegative orthant. [sent-13, score-0.338]
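For reference, a sketch of the distribution under discussion: the NNBD of [1] is, assuming the standard parameterisation with a coupling matrix $A$ and bias vector $\mathbf{b}$ (the explicit form of Eqs. 2-3 is not reproduced in the extracted text), a Gaussian-like exponential-quadratic density restricted to the nonnegative orthant,

```latex
p(\mathbf{x}) \;=\; \frac{1}{Z}\,
  \exp\!\Big(-\tfrac{1}{2}\,\mathbf{x}^{\top} A\,\mathbf{x} \;-\; \mathbf{b}^{\top}\mathbf{x}\Big),
  \qquad x_i \ge 0 \;\;\forall i ,
```

so that, unlike a Gaussian, a mode whose unconstrained maximiser would be negative is pushed onto the boundary of the orthant.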

11 The Nonnegative Boltzmann Machine (NNBM) has been proposed as a method for learning the maximum likelihood parameters for this maximum entropy model from data. [sent-14, score-0.144]

12 This learning rule (Eq. 7, over $\mathbf{x} \ge 0$) has hitherto been extremely computationally costly to implement, since naive variational/mean-field approximations for $\langle \mathbf{x}\mathbf{x}^{\top}\rangle_f$ are found empirically to be poor, leading to the need to use Markov chain Monte Carlo methods. [sent-16, score-0.227]
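For orientation, a hedged sketch of the maximum-likelihood learning rule being referred to (Eq. 7); assuming the usual Boltzmann-machine form with clamped (data) averages $\langle\cdot\rangle_c$ and free (model) averages $\langle\cdot\rangle_f$, gradient ascent on the log-likelihood gives

```latex
\Delta A_{ij} \;\propto\; \langle x_i x_j\rangle_{f} - \langle x_i x_j\rangle_{c},
\qquad
\Delta b_{i} \;\propto\; \langle x_i\rangle_{f} - \langle x_i\rangle_{c},
```

and it is the model-side expectation $\langle \mathbf{x}\mathbf{x}^{\top}\rangle_f$ that is expensive to compute, which is what the high-temperature expansion below approximates.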

13 While the NNBD is generally skewed and hence has moments of order greater than 2, the maximum-likelihood learning rule suggests that the distribution can be described solely in terms of the 1st- and 2nd-order statistics of the data. [sent-18, score-0.227]

14 With that in mind, I have pursued advanced approximate models for the NNBM. [sent-19, score-0.104]

15 In the following section I derive a second-order approximation for $\langle x_i x_j\rangle_f$, analogous to the TAP-Onsager correction for the mean-field Ising model, using a high-temperature expansion [4]. [sent-20, score-0.293]

16 This produces an analytic approximation for the parameters $A_{ij}$, $b_i$ in terms of the mean and cross-correlation matrix of the training data. [sent-21, score-0.267]

17 2 Learning approximate NNBM parameters using high-temperature expansion Here I use a Taylor expansion, about the $\beta = 0$ limit, of a "free energy" directly related to the partition function $Z$ of the distribution, to derive a second-order approximation for the NNBM model parameters. [sent-22, score-0.424]

18 In this free energy we embody the constraint that Eq. [sent-23, score-0.278]

19 $-\ln Z = G(\beta, \mathbf{m}) + \text{Constant}(\mathbf{b}, \mathbf{m})$ (9). Thus (Eq. 10), the Lagrange multipliers $\lambda_i$ embody the constraint that $\langle x_i\rangle_f$ match the mean field of the patterns, $m_i = \langle x_i\rangle_c$. [sent-27, score-0.291]

20 Since the Lagrange constraint is enforced for all temperatures, we can solve for the specific case $\beta = 0$. [sent-31, score-0.064]

At $\beta = 0$, $m_i = \frac{\prod_k \int_{x_k=0}^{\infty} x_i \exp\big(-\sum_l \lambda_l(0)(x_l - m_l)\big)\, dx_k}{\prod_k \int_{x_k=0}^{\infty} \exp\big(-\sum_l \lambda_l(0)(x_l - m_l)\big)\, dx_k} = \frac{1}{\lambda_i(0)}$ (11). Note that this embodies the unboundedness of $x_k$ in the nonnegative orthant, as compared to the equivalent term of Georges & Yedidia for the Ising model, $m_i = \tanh(\lambda_i(0))$. [sent-33, score-0.609]
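As a one-line check of the $\beta = 0$ result above: each coordinate factorises into an exponential density on $[0, \infty)$, for which

```latex
m_i \;=\; \frac{\int_0^{\infty} x\, e^{-\lambda_i(0)\,x}\, dx}{\int_0^{\infty} e^{-\lambda_i(0)\,x}\, dx}
     \;=\; \frac{1/\lambda_i(0)^2}{1/\lambda_i(0)}
     \;=\; \frac{1}{\lambda_i(0)},
```

whereas the bounded $\pm 1$ spins of the Ising model give $m_i = \tanh(\lambda_i(0))$.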

22 8=0 (32 + 2' (12) Since the integrand becomes factorable in Xi in this limit, the infinite temperature values of G and its derivatives are analytically calculable. [sent-41, score-0.118]

$G\big|_{\beta=0} = -\sum_k \ln \int_{x_k=0}^{\infty} \exp\!\big(-\sum_i \lambda_i(0)(x_i - m_i)\big)\, dx_k$ (13), using Eq. [sent-43, score-0.063]

$G\big|_{\beta=0} = -\sum_k \ln\!\Big(\frac{1}{\lambda_k(0)}\exp\big(\sum_i \lambda_i(0)\, m_i\big)\Big) = N + \sum_k \ln m_k$ (14). The first derivative, $\partial G/\partial \beta$, is then as follows. [sent-45, score-0.096]

$\frac{\partial G}{\partial \beta}\Big|_{\beta=0} = \frac{\prod_k \int_0^{\infty}\big(-\sum_{i,j} A_{ij}\, x_i x_j - \sum_i (x_i - m_i)\,\frac{\partial \lambda_i}{\partial \beta}\big)\exp\big(-\sum_l \lambda_l(0)(x_l - m_l)\big)\, dx_k}{\prod_k \int_0^{\infty}\exp\big(-\sum_l \lambda_l(0)(x_l - m_l)\big)\, dx_k}$ (15) $= -\sum_{i,j}(1+\delta_{ij})\, A_{ij}\, m_i m_j$ (16). This term is exactly the result of applying naive mean-field theory to this system, as in [1]. [sent-47, score-0.307]

Likewise we obtain the second derivative, $\frac{\partial^2 G}{\partial \beta^2}\Big|_{\beta=0} \approx -\Big\langle\big(\sum_{i,j} A_{ij}\, x_i x_j\big)^{2}\Big\rangle_{0} + \Big(\sum_{i,j}(1+\delta_{ij})\, A_{ij}\, m_i m_j\Big)^{2}$ (17). [sent-48, score-0.033]

$\frac{\partial^2 G}{\partial \beta^2}\Big|_{\beta=0} = \sum_{i,j}\sum_{k,l} Q_{ijkl}\, A_{ij} A_{kl}\, m_i m_j m_k m_l$ (18), where $Q_{ijkl}$ contains the integer coefficients arising from integration by parts in the first and second terms, and $(1+\delta_{ij})$ in the second term of Eq. [sent-50, score-0.039]

28 This expansion is to the same order as the TAP-Onsager correction term for the Ising model, which can be derived by an analogous approach to the equivalent free-energy [4]. [sent-52, score-0.265]

$\beta(1+\delta_{ij})\, m_i m_j - \frac{\beta^2}{2}\sum_{k,l} Q_{ijkl}\, A_{kl}\, m_i m_j m_k m_l$ (19). We arrive at an analytic approximation for $A_{ij}$ as a function of the 1st and 2nd moments of the data, using Eq. [sent-55, score-0.164]

We can obtain an equivalent expansion for $\lambda_i(\beta)$ and hence $b_i$. [sent-58, score-0.203]

To first order in $\beta$ (equivalent to the order of $\beta$ in the approximation for $A$), we have $\lambda_i(\beta) \approx \lambda_i(0) + \beta\,\frac{\partial \lambda_i}{\partial \beta}\Big|_{\beta=0} + \ldots$ [sent-59, score-0.091]

32 11 & 15 (21) (22) = - 2:(1 + c5ij )Aijmj (23) j Hence (24) The approach presented here makes an explicit approximation of the statistics required for the NNBM learning rule (xxT}f' which can be substituted in the fixed-point equation Eq. [sent-63, score-0.231]

This is in contrast to the linear response theory approach of Kappen & Rodriguez [6] to the Boltzmann Machine, which exploits the relationship $\frac{\partial^2 \ln Z}{\partial b_i\, \partial b_j} = \langle x_i x_j\rangle - \langle x_i\rangle\langle x_j\rangle = \chi_{ij}$ (25) between the free energy and the covariance matrix $\chi$ of the model. [sent-65, score-0.262]

34 In the learning problem, this produces a quadratic equation in A, the solution of which is non-trivial. [sent-66, score-0.083]

35 Computationally efficient solutions of the linear response theory are then obtained by secondary approximation of the 2nd-order term, compromising the fidelity of the model. [sent-67, score-0.294]

36 3 Learning a 'Competitive' Nonnegative Boltzmann Distribution A visualisable test problem is that of learning a bimodal NNBD in 2 dimensions. [sent-68, score-0.088]

Monte Carlo slice sampling (see [1] & [5]) was used to generate 200 samples from an NNBD as shown in Fig. [sent-69, score-0.044]
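A minimal sketch of how such samples could be drawn; this is generic coordinate-wise slice sampling (the stepping-out/shrinkage procedure of the slice-sampling reference [5]) applied to an assumed quadratic NNBD energy $E(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{\top}A\mathbf{x} + \mathbf{b}^{\top}\mathbf{x}$, not the authors' exact code; the parameter values at the bottom are placeholders, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x, A, b):
    """Unnormalised NNBD log-density: -E(x) on the nonnegative orthant, -inf outside it."""
    if np.any(x < 0):
        return -np.inf
    return -0.5 * x @ A @ x - b @ x

def with_coord(x, i, v):
    y = x.copy()
    y[i] = v
    return y

def slice_update(x, i, A, b, w=1.0, max_steps=50):
    """One univariate slice-sampling move of coordinate i (stepping out, then shrinkage)."""
    logy = log_p(x, A, b) + np.log(1.0 - rng.random())   # slice level below the current density
    left = x[i] - w * rng.random()
    right = left + w
    for _ in range(max_steps):                            # grow the interval until it brackets the slice
        if log_p(with_coord(x, i, left), A, b) <= logy:
            break
        left -= w
    for _ in range(max_steps):
        if log_p(with_coord(x, i, right), A, b) <= logy:
            break
        right += w
    while True:                                           # shrink towards x[i] until a point is accepted
        xi = left + (right - left) * rng.random()
        if log_p(with_coord(x, i, xi), A, b) > logy:
            return with_coord(x, i, xi)
        if xi < x[i]:
            left = xi
        else:
            right = xi

def sample_nnbd(A, b, n_samples=200, burn=200):
    x = np.ones(len(b))
    draws = []
    for t in range(burn + n_samples):
        for i in range(len(b)):
            x = slice_update(x, i, A, b)
        if t >= burn:
            draws.append(x.copy())
    return np.array(draws)

# Example: a 2-d 'competitive'-style NNBD (placeholder parameters, not the paper's).
A = np.array([[1.0, 3.0], [3.0, 1.0]])
b = np.array([-2.0, -2.0])
samples = sample_nnbd(A, b)
```

Because the log-density is minus infinity for any negative coordinate, the stepping-out and shrinkage steps automatically confine the samples to the nonnegative orthant.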

38 The high temperature expansion was then used to learn approximate parameters for the NNBM model of this data. [sent-71, score-0.341]

39 A surface plot of the resulting model distribution is shown in Fig. [sent-72, score-0.112]

1(b); it is clearly a valid candidate generative distribution for the data. [sent-73, score-0.191]

This is in strong contrast with a naive mean field ($\beta = 0$) model, which by construction would be unable to produce a multiple-peaked approximation, as previously described [1]. [sent-74, score-0.241]

4 Orientation Tuning in Visual Cortex - a translationally invariant model The neural network model of Ben-Yishai et al. [sent-75, score-0.347]

[7] for orientation-tuning in visual cortex has the property that its dynamics exhibit a continuum of stable states which are translationally invariant. [sent-76, score-0.111]

Figure 1: (a) Training data, generated from a 2-dimensional 'competitive' NNBD; (b) learned model distribution, under the high-temperature expansion. [sent-84, score-0.184]

The energy function of the network model is a translationally invariant function of the angles of maximal response, $\theta_i$, of the $N$ neurons, and can be mapped directly onto the energy of the NNBM, as described in [1]. [sent-86, score-0.469]

$A_{ij} = \gamma\Big(\delta_{ij} + \tfrac{1}{N} - \tfrac{\epsilon_0}{N}\cos\big(\tfrac{2\pi}{N}|i-j|\big)\Big),\qquad b_i = \gamma$ (26). We can generate training data for the NNBM by sampling from the neural network model with known parameters. [sent-87, score-0.11]
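A small sketch of this construction and of the eigenvector property mentioned in the next sentence; the exact constants in Eq. 26 are reconstructed assumptions (the extracted equation is garbled), but the point being illustrated, that a translationally invariant (circulant) $A$ has sinusoidal eigenvectors, holds for any such choice.

```python
import numpy as np

N, gamma, eps0 = 10, 100.0, 4.0            # assumed values for the 10-neuron example
idx = np.arange(N)
i, j = np.meshgrid(idx, idx, indexing="ij")
# Translationally invariant coupling: A depends only on i - j, so A is circulant.
A = gamma * (np.eye(N) + 1.0 / N - (eps0 / N) * np.cos(2 * np.pi * (i - j) / N))
b = gamma * np.ones(N)

eigvals, eigvecs = np.linalg.eigh(A)
# The cosine term contributes a degenerate eigenvalue pair whose eigenvectors are the
# sine/cosine modes of wavenumber 1: sinusoids differing by a small relative phase.
print(np.round(eigvals, 2))
```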

47 The corresponding pair of eigenvectors of A are sinusoids of period equal to the width of the stable activation bumps of the network, with a small relative phase. [sent-89, score-0.051]

Here, the NNBM parameters have been solved using the high-temperature expansion for training data generated by Monte Carlo slice-sampling [5] from a 10-neuron model with parameters $\epsilon_0 = 4$, $\gamma = 100$ in Eq. [sent-90, score-0.176]

Figure 2 illustrates modal activity patterns of the learned NNBM model distribution, found using gradient ascent of the log-likelihood function from a random initialisation of the variables. [sent-93, score-0.149]
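A hedged sketch of one way such modal patterns can be located: projected gradient ascent on the (assumed) NNBM log-density $-\tfrac{1}{2}\mathbf{x}^{\top}A\mathbf{x} - \mathbf{b}^{\top}\mathbf{x}$ from random nonnegative starting points; the step size and iteration counts are illustrative, not the paper's.

```python
import numpy as np

def nnbm_modes(A, b, n_starts=20, lr=1e-3, n_iter=5000, seed=0):
    """Locate modal patterns of p(x) proportional to exp(-x^T A x / 2 - b^T x), x >= 0,
    by projected gradient ascent on log p from random initialisations."""
    rng = np.random.default_rng(seed)
    modes = []
    for _ in range(n_starts):
        x = rng.random(len(b))
        for _ in range(n_iter):
            grad = -(A @ x) - b                       # gradient of the log-density
            x = np.clip(x + lr * grad, 0.0, None)     # project back onto the nonnegative orthant
        modes.append(x)
    # Collapse near-identical end points into a list of distinct modes.
    return np.unique(np.round(np.array(modes), 3), axis=0)
```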

These modes of the approximate NNBM model are highly similar to the training patterns; the eigenvectors and eigenvalues of A also exhibit similar properties between their learned and training forms. [sent-95, score-0.264]

51 This gives evidence that the approximation is successful in learning a high-dimensional translationally invariant NNBM model. [sent-96, score-0.349]

5 Generative Model for Handwritten Digits In Figure 3, I show the results of applying the high-temperature NNBM to learning a generative model for the feature coactivations of the Nonnegative Matrix Factorization [2] [sent-97, score-0.254]

53 decomposition of a database of the handwritten digits, 0-9. [sent-104, score-0.073]
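For context, a minimal sketch of the NMF decomposition being referred to, assumed here to be the standard Lee & Seung multiplicative updates for the squared-error objective $\|V - WH\|^2$; the columns of $H$ are the per-image feature activations whose coactivations the NNBM then models.

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Factorise a nonnegative (pixels x images) matrix V ~ W H with r nonnegative features,
    using the multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    eps = 1e-12                                   # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)      # update activations, holding features fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)      # update features, holding activations fixed
    return W, H
```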

54 This problem contains none of the space-filling symmetry of the visual cortex model, and hence requires a more strongly multimodal generative model distribution to generate distinct digits. [sent-105, score-0.424]

6 Discussion In this work, an approximate technique has been derived for directly determining the NNBM parameters A, b in terms of the 1st- and 2nd-order statistics of the data, using the method of high-temperature expansion. [sent-107, score-0.102]

56 To second order this produces corrections to the naive mean field approximation of the system analogous to the TAP term for the Ising Model/Boltzmann Machine. [sent-108, score-0.427]

57 The efficacy of this approximation has been demonstrated in the pathological case of learning the 'competitive' NNBD, learning the translationally invariant model in 10 dimensions, and a generative model for handwritten digits. [sent-109, score-0.742]

These results demonstrate an improvement in approximation to models in this class over a naive mean field ($\beta = 0$) approach, without reversion to secondary assumptions such as those made in the linear response theory for the Boltzmann Machine. [sent-110, score-0.45]

59 There is strong current interest in the relationship between TAP-like mean field theory, variational approximation and belief-propagation in graphical models with loops. [sent-111, score-0.244]

60 All of these can be interpreted in terms of minimising an effective free energy of the system [8]. [sent-112, score-0.174]

61 The distinction in the work presented here lies in choosing optimal approximate statistics to learn the true model, under the assumption that satisfaction of the fixed-point equations of the true model optimises the free energy. [sent-113, score-0.248]

This compares favourably with variational methods. Figure 3: Digit images generated with feature activations sampled from (a) a uniform distribution, and (b) a high-temperature NNBM model for the digits. [sent-114, score-0.11]

63 Methods of this type fail when they add spurious fixed points to the learning dynamics. [sent-116, score-0.043]

64 Future work will focus on understanding the origins of such fixed points, and the regimes in which they lead to a poor approximation of the model parameters. [sent-117, score-0.205]

How to expand around mean-field theory using high-temperature expansions. [sent-130, score-0.04]

66 Markov chain Monte Carlo methods based on 'slicing' the density function. [sent-133, score-0.04]

67 Efficient learning in Boltzmann Machines using linear response theory. [sent-137, score-0.091]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nnbm', 0.562), ('nnbd', 0.316), ('nonnegative', 0.256), ('dxk', 0.175), ('generative', 0.145), ('tik', 0.14), ('xixj', 0.14), ('translationally', 0.137), ('boltzmann', 0.137), ('temperature', 0.118), ('expansion', 0.11), ('naive', 0.102), ('xi', 0.101), ('energy', 0.094), ('approximation', 0.091), ('ising', 0.09), ('yedidia', 0.082), ('free', 0.08), ('invariant', 0.078), ('mi', 0.078), ('handwritten', 0.073), ('dd', 0.071), ('kappen', 0.071), ('ml', 0.071), ('downs', 0.07), ('embody', 0.07), ('qijkl', 0.07), ('ai', 0.068), ('model', 0.066), ('field', 0.066), ('exp', 0.063), ('carlo', 0.062), ('monte', 0.062), ('aij', 0.061), ('oj', 0.061), ('bi', 0.061), ('lee', 0.061), ('georges', 0.06), ('js', 0.06), ('aii', 0.06), ('oij', 0.06), ('secondary', 0.06), ('fj', 0.059), ('advanced', 0.057), ('statistics', 0.055), ('hs', 0.055), ('fidelity', 0.055), ('rectified', 0.055), ('xxt', 0.055), ('neuron', 0.052), ('eigenvectors', 0.051), ('modal', 0.051), ('rodriguez', 0.051), ('poor', 0.048), ('response', 0.048), ('subscript', 0.048), ('approximate', 0.047), ('distribution', 0.046), ('analogous', 0.046), ('bimodal', 0.045), ('multimodal', 0.045), ('david', 0.044), ('variational', 0.044), ('generate', 0.044), ('learning', 0.043), ('mean', 0.043), ('cortex', 0.042), ('rule', 0.042), ('moments', 0.041), ('seung', 0.041), ('theory', 0.04), ('produces', 0.04), ('chain', 0.04), ('term', 0.039), ('normalisation', 0.039), ('lagrange', 0.039), ('correction', 0.038), ('taylor', 0.038), ('princeton', 0.038), ('digits', 0.037), ('modes', 0.037), ('xk', 0.037), ('visual', 0.036), ('mackay', 0.035), ('entropy', 0.035), ('constraint', 0.034), ('tuning', 0.033), ('derivative', 0.033), ('al', 0.033), ('learned', 0.032), ('analytic', 0.032), ('equivalent', 0.032), ('eigenvalues', 0.031), ('limit', 0.031), ('orientation', 0.03), ('enforced', 0.03), ('dxp', 0.03), ('lai', 0.03), ('unable', 0.03), ('bert', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

Author: Oliver B. Downs

Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

2 0.13340591 114 nips-2000-Second Order Approximations for Probability Models

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.

3 0.12863748 13 nips-2000-A Tighter Bound for Graphical Models

Author: Martijn A. R. Leisink, Hilbert J. Kappen

Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show a rough outline of how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1

4 0.11653561 31 nips-2000-Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models

Author: Sepp Hochreiter, Michael Mozer

Abstract: The goal of many unsupervised learning procedures is to bring two probability distributions into alignment. Generative models such as Gaussian mixtures and Boltzmann machines can be cast in this light, as can recoding models such as ICA and projection pursuit. We propose a novel sample-based error measure for these classes of models, which applies even in situations where maximum likelihood (ML) and probability density estimation-based formulations cannot be applied, e.g., models that are nonlinear or have intractable posteriors. Furthermore, our sample-based error measure avoids the difficulties of approximating a density function. We prove that with an unconstrained model, (1) our approach converges on the correct solution as the number of samples goes to infinity, and (2) the expected solution of our approach in the generative framework is the ML solution. Finally, we evaluate our approach via simulations of linear and nonlinear models on mixture of Gaussians and ICA problems. The experiments show the broad applicability and generality of our approach. 1

5 0.10785847 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi

Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1

6 0.093883149 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

7 0.089118868 100 nips-2000-Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks

8 0.079242818 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

9 0.079059914 110 nips-2000-Regularization with Dot-Product Kernels

10 0.076636538 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

11 0.075983375 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA

12 0.071333848 62 nips-2000-Generalized Belief Propagation

13 0.064187132 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

14 0.06345427 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing

15 0.062698647 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

16 0.062496692 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns

17 0.060072295 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities

18 0.058303174 125 nips-2000-Stability and Noise in Biochemical Switches

19 0.057955831 102 nips-2000-Position Variance, Recurrence and Perceptual Learning

20 0.054295536 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.217), (1, -0.022), (2, 0.032), (3, -0.057), (4, 0.21), (5, 0.019), (6, -0.012), (7, -0.08), (8, -0.013), (9, -0.009), (10, -0.041), (11, -0.101), (12, -0.079), (13, 0.033), (14, 0.007), (15, -0.094), (16, 0.125), (17, -0.246), (18, 0.029), (19, -0.037), (20, -0.053), (21, 0.006), (22, 0.009), (23, 0.095), (24, 0.011), (25, 0.04), (26, -0.119), (27, 0.081), (28, 0.006), (29, -0.012), (30, 0.111), (31, 0.023), (32, -0.042), (33, -0.025), (34, -0.159), (35, 0.104), (36, -0.061), (37, 0.134), (38, -0.035), (39, -0.021), (40, -0.061), (41, -0.027), (42, 0.014), (43, -0.035), (44, 0.035), (45, 0.09), (46, -0.075), (47, 0.03), (48, -0.008), (49, 0.143)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93403018 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

Author: Oliver B. Downs

Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

2 0.61216247 114 nips-2000-Second Order Approximations for Probability Models

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absence of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.

3 0.60995418 13 nips-2000-A Tighter Bound for Graphical Models

Author: Martijn A. R. Leisink, Hilbert J. Kappen

Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show a rough outline of how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1

4 0.60639799 31 nips-2000-Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models

Author: Sepp Hochreiter, Michael Mozer

Abstract: The goal of many unsupervised learning procedures is to bring two probability distributions into alignment. Generative models such as Gaussian mixtures and Boltzmann machines can be cast in this light, as can recoding models such as ICA and projection pursuit. We propose a novel sample-based error measure for these classes of models, which applies even in situations where maximum likelihood (ML) and probability density estimation-based formulations cannot be applied, e.g., models that are nonlinear or have intractable posteriors. Furthermore, our sample-based error measure avoids the difficulties of approximating a density function. We prove that with an unconstrained model, (1) our approach converges on the correct solution as the number of samples goes to infinity, and (2) the expected solution of our approach in the generative framework is the ML solution. Finally, we evaluate our approach via simulations of linear and nonlinear models on mixture of Gaussians and ICA problems. The experiments show the broad applicability and generality of our approach. 1

5 0.5420118 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi

Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small scale networks show that the proposed approximations are quite competitive. 1

6 0.45357534 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

7 0.42538944 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA

8 0.38213015 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities

9 0.38200912 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns

10 0.37520149 62 nips-2000-Generalized Belief Propagation

11 0.36411509 144 nips-2000-Vicinal Risk Minimization

12 0.34429252 110 nips-2000-Regularization with Dot-Product Kernels

13 0.31832001 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

14 0.29265156 125 nips-2000-Stability and Noise in Biochemical Switches

15 0.28234476 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice

16 0.27390987 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

17 0.26983765 34 nips-2000-Competition and Arbors in Ocular Dominance

18 0.26964757 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

19 0.268574 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing

20 0.2658653 85 nips-2000-Mixtures of Gaussian Processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.039), (16, 0.295), (17, 0.097), (32, 0.03), (33, 0.036), (36, 0.023), (55, 0.029), (62, 0.029), (65, 0.021), (67, 0.074), (76, 0.1), (79, 0.014), (81, 0.016), (90, 0.032), (91, 0.033), (97, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81360102 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization

Author: Ranit Aharonov-Barki, Isaac Meilijson, Eytan Ruppin

Abstract: We introduce a novel algorithm, termed PPA (Performance Prediction Algorithm), that quantitatively measures the contributions of elements of a neural system to the tasks it performs. The algorithm identifies the neurons or areas which participate in a cognitive or behavioral task, given data about performance decrease in a small set of lesions. It also allows the accurate prediction of performances due to multi-element lesions. The effectiveness of the new algorithm is demonstrated in two models of recurrent neural networks with complex interactions among the elements. The algorithm is scalable and applicable to the analysis of large neural networks. Given the recent advances in reversible inactivation techniques, it has the potential to significantly contribute to the understanding of the organization of biological nervous systems, and to shed light on the long-lasting debate about local versus distributed computation in the brain.

same-paper 2 0.81172615 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

Author: Oliver B. Downs

Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

3 0.49444652 13 nips-2000-A Tighter Bound for Graphical Models

Author: Martijn A. R. Leisink, Hilbert J. Kappen

Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show a rough outline of how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1

4 0.47308424 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csatč´¸, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

5 0.47115272 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

Author: Christopher K. I. Williams

Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1

6 0.46779791 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

7 0.46528196 37 nips-2000-Convergence of Large Margin Separable Linear Classification

8 0.46473277 85 nips-2000-Mixtures of Gaussian Processes

9 0.45643759 134 nips-2000-The Kernel Trick for Distances

10 0.45572573 102 nips-2000-Position Variance, Recurrence and Perceptual Learning

11 0.45512104 74 nips-2000-Kernel Expansions with Unlabeled Examples

12 0.45332754 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers

13 0.45308459 146 nips-2000-What Can a Single Neuron Compute?

14 0.4513739 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA

15 0.44852471 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

16 0.44764158 114 nips-2000-Second Order Approximations for Probability Models

17 0.44585022 21 nips-2000-Algorithmic Stability and Generalization Performance

18 0.44386768 94 nips-2000-On Reversing Jensen's Inequality

19 0.44300321 120 nips-2000-Sparse Greedy Gaussian Process Regression

20 0.44200078 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities