nips nips2000 nips2000-110 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alex J. Smola, Zoltán L. Óvári, Robert C. Williamson
Abstract: In this paper we give necessary and sufficient conditions under which kernels of dot product type k(x, y) = k(x . y) satisfy Mercer's condition and thus may be used in Support Vector Machines (SVM), Regularization Networks (RN) or Gaussian Processes (GP). In particular, we show that if the kernel is analytic (i.e. can be expanded in a Taylor series), all expansion coefficients have to be nonnegative. We give an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Williamson Department of Engineering Australian National University Canberra, ACT, 0200 Abstract In this paper we give necessary and sufficient conditions under which kernels of dot product type k(x, y) = k(x . [sent-4, score-0.805]
2 y) satisfy Mercer's condition and thus may be used in Support Vector Machines (SVM), Regularization Networks (RN) or Gaussian Processes (GP). [sent-5, score-0.148]
3 In particular, we show that if the kernel is analytic (i. [sent-6, score-0.232]
4 can be expanded in a Taylor series), all expansion coefficients have to be nonnegative. [sent-8, score-0.254]
5 We give an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues. [sent-9, score-0.188]
6 A possible interpretation of their effects is that they represent dot products in some feature space F, i. [sent-11, score-0.248]
7 Another interpretation is to connect Φ with the regularization properties of the corresponding learning algorithm [8]. [sent-14, score-0.328]
8 Most popular kernels can be described by three main categories: translation invariant kernels [9] k(x, y) = k(x - y), (2) kernels originating from generative models (e. [sent-15, score-1.09]
9 those of Jaakkola and Haussler, or Watkins), and thirdly, dot-product kernels k(x, y) = k(x · y). (3) [sent-17, score-0.339]
10 Since k influences the properties of the estimates generated by any of the algorithms above, it is natural to ask which regularization properties are associated with k. [sent-19, score-0.373]
11 In [8, 10, 9] the general connections between kernels and regularization properties are pointed out, including details on the connection between the Fourier spectrum of translation invariant kernels and the smoothness properties of the estimates. [sent-20, score-1.191]
12 In a nutshell, the necessary and sufficient condition for k(x - y) to be a Mercer kernel (i. [sent-21, score-0.39]
13 be admissible for any of the aforementioned kernel methods) is that its Fourier transform be nonnegative. [sent-23, score-0.229]
14 This also yields an easy-to-check criterion for new kernel functions. [sent-24, score-0.247]
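As a concrete illustration of this criterion, a minimal numerical sketch (not part of the paper; the kernels, grid width, resolution and tolerance are example assumptions) samples a translation-invariant kernel on a grid, takes its discrete Fourier transform and inspects the sign of the spectrum:

```python
# Hedged sketch of the Fourier criterion for translation-invariant kernels
# k(x, y) = k(x - y); the kernels, grid width and resolution are example choices.
import numpy as np

def min_fourier_coefficient(k, half_width=20.0, n=4096):
    """Smallest value of the (real) discrete Fourier spectrum of k sampled on
    [-half_width, half_width)."""
    t = np.linspace(-half_width, half_width, n, endpoint=False)
    samples = np.fft.ifftshift(k(t))   # place t = 0 at index 0 so the DFT of an even kernel is real
    return np.fft.fft(samples).real.min()

gaussian = lambda t: np.exp(-0.5 * t ** 2)           # spectrum is again a Gaussian, nonnegative
boxcar = lambda t: (np.abs(t) <= 1.0).astype(float)  # spectrum is a sinc, takes negative values

print("Gaussian: min spectrum =", min_fourier_coefficient(gaussian))  # ~0 up to rounding, admissible
print("Boxcar:   min spectrum =", min_fourier_coefficient(boxcar))    # clearly negative, not admissible
```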
15 Moreover, [5] gave a similar analysis for kernels derived from generative models. [sent-25, score-0.371]
16 Dot product kernels k(x · y), on the other hand, have so far eluded further theoretical analysis; only a necessary condition [1] was found, based on geometrical considerations. [sent-27, score-0.157]
17 Unfortunately, it does not provide much insight into smoothness properties of the corresponding estimate. [sent-28, score-0.112]
18 Our aim in the present paper is to shed some light on the properties of dot product kernels, give an explicit equation for how their eigenvalues can be determined, and, finally, show that for analytic kernels that can be expanded in terms of monomials ξ^n or associated Legendre polynomials P_n^d(ξ) [4], i. [sent-29, score-1.159]
19 e. k(x, y) = k(x · y) with k(ξ) = Σ_{n=0}^∞ a_n ξ^n or k(ξ) = Σ_{n=0}^∞ b_n P_n^d(ξ), (4) a necessary and sufficient condition is a_n ≥ 0 for all n ∈ ℕ if no assumption about the dimensionality of the input space is made (for finite dimensional spaces of dimension d, the condition is that b_n ≥ 0). [sent-31, score-0.346]
20 In other words, the polynomial series expansion in dot product kernels plays the role of the Fourier transform in translation invariant kernels. [sent-32, score-1.061]
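By way of illustration, a hedged sketch (not from the paper; the kernels, the truncation order and the use of sympy are assumptions) of the analogous check, made precise later by Theorem 3: expand k(ξ) around ξ = 0 and inspect the signs of the coefficients.

```python
# Hedged sketch: inspect the Maclaurin coefficients of a candidate dot-product
# kernel k(xi).  Kernel choices and truncation order are example assumptions.
import sympy as sp

xi = sp.symbols('xi')

def maclaurin_coefficients(k_expr, order=8):
    """First `order` Taylor coefficients of k(xi) around xi = 0."""
    poly = sp.series(k_expr, xi, 0, order).removeO()
    return [poly.coeff(xi, n) for n in range(order)]

for name, k in [("exp(xi)", sp.exp(xi)),             # coefficients 1/n!, all nonnegative
                ("tanh(1 + xi)", sp.tanh(1 + xi))]:  # negative coefficients appear
    coeffs = maclaurin_coefficients(k)
    print(name, [c.evalf(4) for c in coeffs])
    print("  all nonnegative:", all(c.evalf() >= 0 for c in coeffs))
```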
21 2 Regularization, Kernels, and Integral Operators Let us briefly review some results from regularization theory, needed for the further understanding of the paper. [sent-33, score-0.215]
22 These algorithms can be understood as minimizing a regularized risk functional R_reg[f] := R_emp[f] + λ Ω[f], (5) where R_emp is the training error of the function f on the given data, λ > 0, and Ω[f] is the so-called regularization term. [sent-35, score-0.253]
23 Here λ is generally adjusted by some model selection criterion, and Ω[f] is a nonnegative functional of f which models our belief about which functions should be considered simple (a prior in the Bayesian sense or a structure in the Structural Risk Minimization sense). [sent-37, score-0.082]
24 2.1 Regularization Operators One possible interpretation of k is [8] that it leads to regularized risk functionals where Ω[f] = (1/2)||Pf||² or, equivalently, ⟨Pk(x, ·), Pk(x', ·)⟩ = k(x, x'). [sent-39, score-0.072]
25 (6) Here P is a regularization operator mapping functions f on X into a dot product space (we choose L₂(X)). The following theorem allows us to construct explicit operators P and it provides a criterion for whether a symmetric function k(x, y) is suitable. [sent-41, score-0.854]
26 Theorem 1 (Mercer [3]) Suppose k ∈ L_∞(X²) such that the integral operator T_k : L₂(X) → L₂(X), T_k f(·) := ∫_X k(·, x) f(x) dμ(x) (7) is positive. [sent-42, score-0.118]
27 Then k(x, x') = Σ_{j∈ℕ} λ_j φ_j(x) φ_j(x') holds for almost all (x, x'), where the series converges absolutely and uniformly for almost all (x, x'). [sent-49, score-0.102]
28 This means that by finding the eigensystem (λ_i, φ_i) of T_k we can also determine the regularization operator P via [8] (8) The eigensystem (λ_i, φ_i) tells us which functions are considered "simple" in terms of the operator P. [sent-50, score-0.915]
29 Consequently, in order to determine the regularization properties of dot product kernels we have to find their eigenfunctions and eigenvalues. [sent-51, score-1.077]
30 2.2 Specific Assumptions Before we diagonalize T_k for a given kernel we have yet to specify the assumptions we make about the measure μ. [sent-53, score-0.178]
31 Since a suitable choice can drastically simplify the problem we try to keep as many of the symmetries imposed by k(x · y) as possible. [sent-55, score-0.04]
32 The predominant symmetry in dot product kernels is rotation invariance. [sent-57, score-0.646]
33 Therefore we choose the unit ball in ℝ^d, X := U_d := {x | x ∈ ℝ^d and ||x||₂ ≤ 1}. [sent-58, score-0.032]
34 In some cases the unit sphere in ℝ^d is more amenable to our analysis. [sent-61, score-0.084]
35 There we choose X := S_{d-1} := {x | x ∈ ℝ^d and ||x||₂ = 1}. (10) [sent-62, score-0.032]
36 The latter is a good approximation of the situation where dot product kernels perform best - if the training data has approximately equal Euclidean norm (e. [sent-63, score-0.684]
37 This means that we have to solve the following integral equation: find functions φ_i ∈ L₂(X) together with coefficients λ_i such that T_k φ_i(x) := ∫_X k(x · y) φ_i(y) dy = λ_i φ_i(x). [sent-69, score-0.138]
38 3 Orthogonal Polynomials and Spherical Harmonics Before we can give eigenfunctions or state necessary and sufficient conditions we need some basic relations about Legendre Polynomials and spherical harmonics. [sent-70, score-0.447]
39 They have the following properties: • The polynomials P_n(ξ) and P_n^d(ξ) are of degree n, and moreover P_n := P_n^3. • The (associated) Legendre polynomials form an orthogonal basis with ∫_{-1}^{1} P_n^d(ξ) P_m^d(ξ) (1 - ξ²)^{(d-3)/2} dξ = (|S_{d-1}| / (|S_{d-2}| N(d, n))) δ_{m,n}. [sent-74, score-0.369]
40 Here |S_{d-1}| = 2π^{d/2} / Γ(d/2) denotes the surface area of S_{d-1}, and N(d, n) denotes the multiplicity of spherical harmonics of order n on S_{d-1}, i. [sent-77, score-0.334]
41 e. N(d, n) = (2n + d - 2)(n + d - 3)! / (n!(d - 2)!). • This admits the orthogonal expansion of any analytic function k(ξ) on [-1, 1] into the P_n^d. Moreover, the Legendre polynomials may be expanded into an orthonormal basis of spherical harmonics Y_{n,j}^d by the Funk-Hecke equation (cf. [sent-80, score-0.624]
42 [4]) to obtain P_n^d(x · y) = (|S_{d-1}| / N(d, n)) Σ_{j=1}^{N(d,n)} Y_{n,j}^d(x) Y_{n,j}^d(y), (13) where ||x|| = ||y|| = 1, and moreover ∫_{S_{d-1}} Y_{n,j}^d(x) Y_{n',j'}^d(x) dx = δ_{n,n'} δ_{j,j'}. (14) 4 Conditions and Eigensystems on S_{d-1} Schoenberg [7] gives necessary and sufficient conditions under which a function k(x · y) satisfies Mercer's condition. [sent-83, score-0.196]
43 In particular he proves the following two theorems: Theorem 2 (Dot Product Kernels in Finite Dimensions) A kernel k(x · y) defined on S_{d-1} × S_{d-1} satisfies Mercer's condition if and only if its expansion into Legendre polynomials P_n^d has only nonnegative coefficients, i. [sent-85, score-0.811]
44 e. k(ξ) = Σ_{i=0}^∞ b_i P_i^d(ξ) with b_i ≥ 0. (15) Theorem 3 (Dot Product Kernels in Infinite Dimensions) A kernel k(x · y) defined on the unit sphere in a Hilbert space satisfies Mercer's condition if and only if its Taylor series expansion has only nonnegative coefficients: k(ξ) = Σ_{i=0}^∞ a_i ξ^i with a_i ≥ 0. (16) [sent-88, score-0.697]
45 Therefore, all we have to do in order to check whether a particular kernel may be used in an SV machine or a Gaussian Process is to look at its polynomial series expansion and check the coefficients. [sent-89, score-0.658]
46 Before doing so note that (16) is a more stringent condition than (15). [sent-91, score-0.1]
47 In other words, in order to prove Mercer's condition for arbitrary dimensions it suffices to show that the Taylor expansion contains only nonnegative coefficients. [sent-92, score-0.28]
48 On the other hand, in order to prove that a candidate kernel function will never satisfy Mercer's condition, it is sufficient to show this for (15) where P_n^d = P_n, i.e. for d = 3. [sent-93, score-0.352]
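A hedged numerical sketch of this d = 3 check (the helper function, the kernels and the truncation n_max below are assumptions, not code from the paper): probe condition (15) through the sign of the integral of k(ξ) P_n(ξ) over [-1, 1] for the first few n.

```python
# Hedged sketch: probe condition (15) at d = 3, where P_n^d is the ordinary
# Legendre polynomial, via the sign of integral_{-1}^{1} k(xi) P_n(xi) dxi
# (normalisation omitted, only signs matter).
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def legendre_coefficient_signs(k, n_max=8):
    """Signs of the first n_max + 1 (unnormalised) Legendre coefficients of k."""
    vals = [quad(lambda x: k(x) * eval_legendre(n, x), -1.0, 1.0)[0]
            for n in range(n_max + 1)]
    return np.sign(np.round(vals, 10))

print("exp(xi)   :", legendre_coefficient_signs(np.exp))                  # no negative sign: passes the test
print("1 - xi**2 :", legendre_coefficient_signs(lambda x: 1.0 - x ** 2))  # a negative coefficient: fails (15)
```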
49 We conclude this section with an explicit representation of the eigensystem of k(x · y). [sent-96, score-0.315]
50 It is given by the following lemma: Lemma 4 (Eigensystem of Dot Product Kernels) Denote by k(x · y) a kernel on S_{d-1} × S_{d-1} satisfying condition (15) of Theorem 2. [sent-97, score-0.278]
51 Then the eigensystem of k is given by Ψ_{n,j} = Y_{n,j}^d with eigenvalues λ_{n,j} = a_n |S_{d-1}| / N(d, n), each of multiplicity N(d, n). (17) [sent-98, score-0.375]
52 In other words, N(d, n) determines the regularization properties of k(x · y). [sent-99, score-0.294]
53 Proof Using the Funk-Hecke formula (13) we may expand (15) further into spherical harmonics Y_{n,j}^d. [sent-100, score-0.114]
54 The latter, however, are orthonormal, hence computing the dot product of the resulting expansion with Y_{n',j'}^d directly yields the eigenvalues (17). [sent-101, score-0.491]
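As an aside, a small sketch of how the eigenvalues (17) can be evaluated numerically (assumed helpers, not code from the paper; the coefficients a_n = 1/n!, i.e. k(ξ) = exp(ξ), and the choice d = 3 are arbitrary examples):

```python
# Hedged sketch: multiplicities N(d, n) of spherical harmonics on S_{d-1}
# (written for d >= 3) and the eigenvalues of Lemma 4,
# lambda_{n,j} = a_n |S_{d-1}| / N(d, n).
from math import gamma, pi, factorial

def multiplicity(d, n):
    """N(d, n); for d = 3 this reduces to 2n + 1."""
    return round((2 * n + d - 2) * gamma(n + d - 2) / (gamma(n + 1) * gamma(d - 1)))

def sphere_surface(d):
    """Surface area |S_{d-1}| of the unit sphere in R^d: 2 pi^(d/2) / Gamma(d/2)."""
    return 2.0 * pi ** (d / 2) / gamma(d / 2)

d = 3
for n in range(5):
    a_n = 1.0 / factorial(n)                     # Taylor coefficients of exp(xi), as an example
    lam = a_n * sphere_surface(d) / multiplicity(d, n)
    print(f"n = {n}: N(d, n) = {multiplicity(d, n)}, eigenvalue = {lam:.4f}")
```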
55 For k(x · y) on U_d we have to expand k into Σ_{n=0}^∞ (||x|| ||y||)^n P_n^d(ξ) [sent-106, score-0.114]
56 and expand Ψ into Ψ(||x||) times an angular part. The latter is very technical and is thus omitted. [sent-107, score-0.038]
57 5 Examples and Applications In the following we will analyze a few kernels and state under which conditions they may be used as SV kernels. [sent-110, score-0.42]
58 Example 1 (Homogeneous Polynomial Kernels k(x, y) = (x · y)^p) It is well known that this kernel satisfies Mercer's condition for p ∈ ℕ. [sent-111, score-0.331]
59 We will show that for p ∉ ℕ this is never the case. [sent-112, score-0.04]
60 Thus we have to show that (15) cannot hold for an expansion in terms of Legendre Polynomials (d = 3). [sent-113, score-0.149]
61 For odd n the integral in (18) vanishes since P_n(-ξ) = (-1)^n P_n(ξ). [sent-120, score-0.082]
62 In order to satisfy (15), the integral has to be nonnegative for all n. [sent-121, score-0.212]
63 Example 2 (Inhomogeneous Polynomial Kernels k(x, y) = (x · y + 1)^p) Likewise we might conjecture that k(ξ) = (1 + ξ)^p is an admissible kernel for all p > 0. [sent-124, score-0.229]
64 Again, we expand k in a series of Legendre Polynomials to obtain [2, 7. [sent-125, score-0.216]
65 This violates condition (15), hence such kernels cannot be used in SV machines either. [sent-130, score-0.474]
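The failure can also be seen numerically with the same quadrature idea as the earlier sketch (an assumed illustration, not from the paper; the choices p = 1/2 and n = 2 are arbitrary):

```python
# Hedged numerical illustration: for the fractional exponent p = 1/2 the
# n = 2 Legendre coefficient (d = 3) of (1 + xi)^p is already negative,
# so condition (15) cannot hold.  p and n are example choices.
from scipy.integrate import quad
from scipy.special import eval_legendre

p, n = 0.5, 2
coeff, _ = quad(lambda x: (1.0 + x) ** p * eval_legendre(n, x), -1.0, 1.0)
print(f"integral of (1 + xi)^{p} * P_{n}(xi) over [-1, 1] = {coeff:.4f}")  # approximately -0.054
```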
66 Example 3 (Vovk's Real Polynomial k(x, y) = (1 - (x · y)^p)/(1 - (x · y)) with p ∈ ℕ) This kernel can be written as k(ξ) = Σ_{n=0}^{p-1} ξ^n, hence all the coefficients a_i = 1, which means that this kernel can be used regardless of the dimensionality of the input space. [sent-132, score-0.496]
67 Likewise we can analyze an infinite power series: Example 4 (Vovk's Infinite Polynomial k(x, y) = (1 - (x · y))^{-1}) This kernel can be written as k(ξ) = Σ_{n=0}^∞ ξ^n, hence all the coefficients a_i = 1. [sent-133, score-0.386]
68 Example 5 (Neural Networks Kernels k(x, y) = tanh(a + (x · y))) It is a longstanding open question whether kernels k(ξ) = tanh(a + ξ) may be used as SV kernels, or for which sets of parameters this might be possible. [sent-135, score-0.398]
69 The technique is identical to the one of Examples 1 and 2: we have to show that k fails the conditions of Theorem 2. [sent-137, score-0.047]
70 Expanding tanh(a + ξ) into a Taylor series yields tanh(a + ξ) = tanh a + ξ(1 - tanh² a) - ξ² tanh a (1 - tanh² a) - (ξ³/3)(1 - tanh² a)(1 - 3 tanh² a) + O(ξ⁴). (20) Now we analyze (20) coefficient-wise. [sent-141, score-0.3]
71 Since all of them have to be nonnegative we obtain from the first term a ∈ [0, ∞), from the third term a ∈ (-∞, 0], and finally from the fourth term |a| ∈ [arctanh(1/√3), arctanh 1]. [sent-142, score-0.141]
72 This leaves us with a ∈ ∅, hence for no choice of its parameters does the kernel above satisfy Mercer's condition. [sent-143, score-0.404]
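A hedged numerical cross-check of this coefficient-wise argument (assumed sketch, not from the paper; the truncation to four coefficients, the grid of offsets a and the tolerance are arbitrary choices):

```python
# Hedged sketch: scan the first four Taylor coefficients of tanh(a + xi)
# over a grid of offsets a and look for an a making them all nonnegative.
import numpy as np
import sympy as sp

a, xi = sp.symbols('a xi', real=True)
series = sp.series(sp.tanh(a + xi), xi, 0, 4).removeO()
coeff_funcs = [sp.lambdify(a, series.coeff(xi, n), 'numpy') for n in range(4)]

offsets = np.linspace(-3.0, 3.0, 601)
table = np.array([f(offsets) * np.ones_like(offsets) for f in coeff_funcs])
all_nonneg = np.all(table >= -1e-12, axis=0)
print("offsets with all four coefficients nonnegative:", offsets[all_nonneg])  # expected: empty
```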
73 6 Eigensystems on U_d In order to find the eigensystem of T_k on U_d we have to find a different representation of k in which the radial part ||x|| ||y|| and the angular part ξ = (x · y)/(||x|| ||y||) are separated out. [sent-144, score-0.373]
74 To see that we can always find such an expansion for analytic functions, first expand k in a Taylor series and then expand each term (||x|| ||y|| ξ)^n into (||x|| ||y||)^n Σ_{j=0}^{n} c_j(d, n) P_j^d(ξ). [sent-147, score-0.533]
75 Rearranging terms into a series of P_j^d gives expansion (21). [sent-148, score-0.251]
76 This allows us to factorize the integral operator into its radial and its angular part. [sent-149, score-0.277]
77 We obtain the following theorem: Theorem 5 (Eigenfunctions of T_k on U_d) For any kernel k with expansion (21) the eigensystem of the integral operator T_k on U_d is given by functions φ_{n,j,l}. [sent-150, score-0.709]
78 Next apply the Funk-Hecke equation (13) to expand the associated Legendre polynomials P_n^d into the spherical harmonics Y_{n,j}^d. [sent-152, score-0.402]
79 As in Lemma 4 this leads to the spherical harmonics as the angular part of the eigensystem. [sent-153, score-0.348]
80 • This leads to the eigensystem of the homogeneous polynomial kernel k(x, y) = (x · y)^p: if we use (18) in conjunction with (12) to expand ξ^p into a series of P_n^d(ξ) we obtain an expansion of type (21) where K_n(r_x, r_y) ∝ (r_x r_y)^p for n ≤ p and K_n(r_x, r_y) = 0 otherwise. [sent-156, score-0.934]
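For an integer exponent this termination can be verified directly (a hedged sketch, not from the paper; p = 4 and the exact symbolic integration via sympy are assumptions):

```python
# Hedged sketch: the Legendre expansion of xi^p terminates for integer p,
# mirroring K_n = 0 for n > p above.  Only even n up to p contribute for even p.
import sympy as sp

xi = sp.symbols('xi')
p = 4
coeffs = [sp.integrate(xi ** p * sp.legendre(n, xi), (xi, -1, 1)) for n in range(p + 4)]
print(coeffs)   # [2/5, 0, 8/35, 0, 16/315, 0, 0, 0]: zero for every n > p
```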
81 7 Discussion In this paper we gave conditions on the properties of dot product kernels, under which the latter satisfy Mercer's condition. [sent-159, score-0.551]
82 Functions of positive and negative type and their connection with the theory of integral equations. [sent-185, score-0.116]
83 The connection between regularization operators and support vector kernels. [sent-224, score-0.341]
wordName wordTfidf (topN-words)
[('kernels', 0.339), ('legendre', 0.323), ('eigensystem', 0.264), ('regularization', 0.215), ('polynomials', 0.215), ('dot', 0.214), ('mercer', 0.189), ('kernel', 0.178), ('spherical', 0.151), ('expansion', 0.149), ('tk', 0.148), ('ud', 0.137), ('eigenfunctions', 0.137), ('harmonics', 0.137), ('expand', 0.114), ('series', 0.102), ('condition', 0.1), ('product', 0.093), ('pn', 0.092), ('operators', 0.092), ('polynomial', 0.091), ('kn', 0.088), ('lrd', 0.088), ('operator', 0.086), ('integral', 0.082), ('tanh', 0.082), ('nonnegative', 0.082), ('properties', 0.079), ('sv', 0.079), ('taylor', 0.079), ('pj', 0.075), ('theorem', 0.071), ('check', 0.069), ('scholkopf', 0.066), ('eigenvalues', 0.065), ('pi', 0.06), ('angular', 0.06), ('anc', 0.059), ('arctanh', 0.059), ('bnp', 0.059), ('cosh', 0.059), ('eigensystems', 0.059), ('integml', 0.059), ('llxll', 0.059), ('opemtor', 0.059), ('pammeters', 0.059), ('vovk', 0.059), ('xix', 0.059), ('necessary', 0.057), ('coefficients', 0.056), ('sufficient', 0.055), ('analytic', 0.054), ('satisfies', 0.053), ('explicit', 0.051), ('admissible', 0.051), ('pen', 0.051), ('amenable', 0.051), ('remp', 0.051), ('radial', 0.049), ('expanded', 0.049), ('ai', 0.049), ('satisfy', 0.048), ('fourier', 0.047), ('conditions', 0.047), ('multiplicity', 0.046), ('canberra', 0.046), ('loo', 0.046), ('orthonormal', 0.046), ('aj', 0.043), ('lemma', 0.043), ('australian', 0.042), ('smola', 0.042), ('never', 0.04), ('translation', 0.04), ('symmetries', 0.04), ('pk', 0.04), ('latter', 0.038), ('orthogonal', 0.038), ('regularized', 0.038), ('moreover', 0.037), ('gp', 0.036), ('homogeneous', 0.036), ('pm', 0.036), ('hence', 0.035), ('interpretation', 0.034), ('analyze', 0.034), ('connection', 0.034), ('bn', 0.034), ('lr', 0.034), ('proves', 0.034), ('infinite', 0.034), ('sphere', 0.033), ('likewise', 0.033), ('pf', 0.033), ('smoothness', 0.033), ('invariant', 0.033), ('choose', 0.032), ('leaves', 0.032), ('gave', 0.032), ('prove', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 110 nips-2000-Regularization with Dot-Product Kernels
Author: Alex J. Smola, Zoltán L. Óvári, Robert C. Williamson
Abstract: In this paper we give necessary and sufficient conditions under which kernels of dot product type k(x, y) = k(x . y) satisfy Mercer's condition and thus may be used in Support Vector Machines (SVM), Regularization Networks (RN) or Gaussian Processes (GP). In particular, we show that if the kernel is analytic (i.e. can be expanded in a Taylor series), all expansion coefficients have to be nonnegative. We give an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues. 1
2 0.26378471 134 nips-2000-The Kernel Trick for Distances
Author: Bernhard Schölkopf
Abstract: A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.
3 0.15096979 130 nips-2000-Text Classification using String Kernels
Author: Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins
Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results. 1
4 0.12241925 121 nips-2000-Sparse Kernel Principal Component Analysis
Author: Michael E. Tipping
Abstract: 'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness. 1
5 0.11332321 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
Author: Christopher K. I. Williams
Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1
6 0.10555323 74 nips-2000-Kernel Expansions with Unlabeled Examples
7 0.10500421 58 nips-2000-From Margin to Sparsity
8 0.10373563 4 nips-2000-A Linear Programming Approach to Novelty Detection
9 0.10105994 120 nips-2000-Sparse Greedy Gaussian Process Regression
10 0.086624607 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
11 0.081593148 133 nips-2000-The Kernel Gibbs Sampler
12 0.079408929 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
13 0.079059914 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
14 0.07619258 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
15 0.075143643 54 nips-2000-Feature Selection for SVMs
16 0.074776888 35 nips-2000-Computing with Finite and Infinite Networks
17 0.074526034 75 nips-2000-Large Scale Bayes Point Machines
18 0.073300779 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
19 0.071706131 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
20 0.069223158 34 nips-2000-Competition and Arbors in Ocular Dominance
topicId topicWeight
[(0, 0.228), (1, 0.18), (2, -0.068), (3, 0.054), (4, -0.009), (5, 0.247), (6, -0.061), (7, -0.024), (8, -0.067), (9, -0.01), (10, -0.036), (11, -0.068), (12, -0.147), (13, 0.011), (14, 0.072), (15, 0.026), (16, 0.212), (17, -0.032), (18, 0.023), (19, -0.127), (20, -0.058), (21, 0.02), (22, -0.009), (23, 0.034), (24, 0.061), (25, -0.07), (26, -0.045), (27, -0.1), (28, 0.043), (29, -0.047), (30, -0.009), (31, 0.018), (32, -0.054), (33, 0.006), (34, -0.127), (35, -0.107), (36, -0.074), (37, 0.192), (38, -0.025), (39, -0.077), (40, 0.129), (41, 0.066), (42, -0.008), (43, 0.087), (44, -0.171), (45, -0.055), (46, 0.101), (47, 0.074), (48, -0.006), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.97766519 110 nips-2000-Regularization with Dot-Product Kernels
Author: Alex J. Smola, Zoltán L. Óvári, Robert C. Williamson
Abstract: In this paper we give necessary and sufficient conditions under which kernels of dot product type k(x, y) = k(x . y) satisfy Mercer's condition and thus may be used in Support Vector Machines (SVM), Regularization Networks (RN) or Gaussian Processes (GP). In particular, we show that if the kernel is analytic (i.e. can be expanded in a Taylor series), all expansion coefficients have to be nonnegative. We give an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues. 1
2 0.78715509 134 nips-2000-The Kernel Trick for Distances
Author: Bernhard Schölkopf
Abstract: A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.
3 0.61452866 5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
Author: Sebastian Mika, Gunnar R채tsch, Klaus-Robert M체ller
Abstract: We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD). A mathematical programming formulation based on the observation that KFD maximizes the average margin permits an interesting modification of the original KFD algorithm yielding the sparse KFD. We find that both, KFD and the proposed sparse KFD, can be understood in an unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we are able to outline an interesting kernel-regression technique based upon the KFD algorithm. Simulations support the usefulness of our approach.
4 0.54251528 130 nips-2000-Text Classification using String Kernels
Author: Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins
Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results. 1
5 0.53985703 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
Author: Christopher K. I. Williams
Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1
6 0.43271086 121 nips-2000-Sparse Kernel Principal Component Analysis
7 0.42035881 74 nips-2000-Kernel Expansions with Unlabeled Examples
8 0.37746066 120 nips-2000-Sparse Greedy Gaussian Process Regression
9 0.35719457 34 nips-2000-Competition and Arbors in Ocular Dominance
10 0.33912522 4 nips-2000-A Linear Programming Approach to Novelty Detection
11 0.32747394 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
12 0.32314366 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
13 0.30280542 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
14 0.29822755 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
15 0.27523461 85 nips-2000-Mixtures of Gaussian Processes
16 0.26769897 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
17 0.26156399 58 nips-2000-From Margin to Sparsity
18 0.26052183 44 nips-2000-Efficient Learning of Linear Perceptrons
19 0.25681841 133 nips-2000-The Kernel Gibbs Sampler
20 0.24257243 112 nips-2000-Reinforcement Learning with Function Approximation Converges to a Region
topicId topicWeight
[(10, 0.03), (17, 0.076), (32, 0.016), (33, 0.035), (48, 0.01), (62, 0.024), (67, 0.059), (76, 0.579), (79, 0.017), (90, 0.022), (97, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.97908556 110 nips-2000-Regularization with Dot-Product Kernels
Author: Alex J. Smola, Zoltán L. Óvári, Robert C. Williamson
Abstract: In this paper we give necessary and sufficient conditions under which kernels of dot product type k(x, y) = k(x . y) satisfy Mercer's condition and thus may be used in Support Vector Machines (SVM), Regularization Networks (RN) or Gaussian Processes (GP). In particular, we show that if the kernel is analytic (i.e. can be expanded in a Taylor series), all expansion coefficients have to be nonnegative. We give an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues. 1
2 0.95620626 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
Author: Dörthe Malzahn, Manfred Opper
Abstract: Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models. 1
3 0.62019694 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
Author: Christopher K. I. Williams
Abstract: In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1
4 0.5851931 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
Author: Oliver B. Downs
Abstract: Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and low fidelity of naive mean field techniques has led to increasing interest in advanced mean field theories and variational methods. Here I present a secondorder mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a
5 0.53852242 13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third order bound is strictly better than mean field. Additionally we show the rough outline how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction of a factor two is easily reached in the region where expansion based approximations are useful. 1
6 0.51250631 114 nips-2000-Second Order Approximations for Probability Models
7 0.50780118 122 nips-2000-Sparse Representation for Gaussian Process Models
8 0.50625622 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
9 0.4998976 85 nips-2000-Mixtures of Gaussian Processes
10 0.48763692 92 nips-2000-Occam's Razor
11 0.47708768 37 nips-2000-Convergence of Large Margin Separable Linear Classification
12 0.47682595 120 nips-2000-Sparse Greedy Gaussian Process Regression
13 0.47483769 35 nips-2000-Computing with Finite and Infinite Networks
14 0.46750832 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
15 0.46220082 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
16 0.4503704 146 nips-2000-What Can a Single Neuron Compute?
17 0.45024383 134 nips-2000-The Kernel Trick for Distances
18 0.43869802 44 nips-2000-Efficient Learning of Linear Perceptrons
19 0.43735385 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
20 0.42990088 25 nips-2000-Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators