nips nips2001 nips2001-134 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nello Cristianini, John Shawe-Taylor, André Elisseeff, Jaz S. Kandola
Abstract: We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction. Keywords: Kernels, alignment, eigenvectors, eigenvalues, transduction 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. [sent-9, score-0.762]
2 This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. [sent-10, score-0.418]
3 We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. [sent-11, score-0.217]
4 Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. [sent-12, score-1.974]
5 Keywords: Kernels, alignment, eigenvectors, eigenvalues, transduction 1 Introduction Kernel-based learning algorithms [1] are modular systems formed by a general-purpose learning element and by a problem-specific kernel function. [sent-14, score-0.374]
6 It is crucial for the performance of the system that the kernel function somehow fits the learning target, that is, that in the feature space the data distribution is somehow correlated to the label distribution. [sent-15, score-0.46]
7 Several results exist showing that generalization takes place only when such correlation exists (no-free-lunch and luckiness results), and many classic estimators of performance (e.g. the margin) can be understood as estimating this relation. [sent-16, score-0.086]
8 In other words, selecting a kernel in this class of systems amounts to the classic feature and model selection problems in machine learning. [sent-17, score-0.432]
9 Measuring the similarity between two kernels, or the degree of agreement between a kernel and a given target function, is hence an important problem both for conceptual and for practical reasons. [sent-18, score-0.527]
10 As an example, it is well known that one can obtain complex kernels by combining or manipulating simpler ones, but how can one predict whether the resulting kernel is better or worse than its components? [sent-19, score-0.483]
11 What a kernel does is to virtually map data into a feature space so that their relative positions in that space are what matters. [sent-20, score-0.346]
12 The degree of clustering achieved in that space, and the relation between the clusters and the labeling to be learned, should be captured by such an estimator. [sent-21, score-0.072]
13 Alternatively, one could regard kernels as 'oracles' or 'experts' giving their opinion on whether two given points belong to the same class or not. [sent-22, score-0.171]
14 We will argue that, if one were in possession of this information, the ideal kernel for a classification target y(x) would be K(x, z) = y(x)y(z). [sent-24, score-0.454]
15 One way of estimating the extent to which the kernel achieves the right clustering is to compare the sum of the within-class distances with the sum of the between-class distances. [sent-25, score-0.319]
16 This will correspond to the alignment between the kernel and the ideal kernel y(x)y(z). [sent-26, score-1.408]
17 By measuring the similarity of this kernel with the kernel at hand, on the training set, one can assess the degree of fitness of such a kernel. [sent-27, score-0.838]
18 The measure of similarity that we propose, 'kernel alignment', would give in this way a reliable estimate of its expected value, since it is sharply concentrated around its mean. [sent-28, score-0.239]
19 2 Alignment Given an (unlabelled) sample S = {x_1, ..., x_m}, we use the Frobenius inner product between Gram matrices, ⟨K_1, K_2⟩_F = Σ_{i,j} K_1(x_i, x_j) K_2(x_i, x_j), and define the (empirical) alignment of a kernel k_1 with a kernel k_2 with respect to S as A(S, k_1, k_2) = ⟨K_1, K_2⟩_F / sqrt(⟨K_1, K_1⟩_F ⟨K_2, K_2⟩_F). [sent-30, score-0.028]
20 If we consider K_2 = yy', where y is the vector of {-1, +1} labels for the sample, then A(S, K, yy') = ⟨K, yy'⟩_F / (m sqrt(⟨K, K⟩_F)), since ⟨yy', yy'⟩_F = m². [sent-35, score-0.025]
21 We will occasionally omit the arguments K or y when these are understood from the context or when y forms part of the sample. [sent-38, score-0.029]
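The empirical alignment is simple to compute from a Gram matrix. The following sketch (our own illustration, not code from the paper; all function and variable names are ours) evaluates A(S, K, yy') with numpy:

```python
import numpy as np

def frobenius_inner(K1, K2):
    """Frobenius inner product <K1, K2>_F = sum_ij K1_ij * K2_ij."""
    return float(np.sum(K1 * K2))

def alignment(K1, K2):
    """Empirical alignment A(S, k1, k2) of two Gram matrices."""
    return frobenius_inner(K1, K2) / np.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2))

def alignment_to_labels(K, y):
    """Alignment of K with the 'ideal' kernel yy' for labels y in {-1, +1};
    since <yy', yy'>_F = m^2 this equals <K, yy'>_F / (m * sqrt(<K, K>_F))."""
    y = np.asarray(y, dtype=float)
    return alignment(K, np.outer(y, y))

# toy usage: linear kernel on two well-separated Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.array([-1] * 20 + [1] * 20)
K = X @ X.T
print(alignment_to_labels(K, y))
```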
22 In the next section we will see how this definition provides us with a method for selecting kernel parameters and also for combining kernels. [sent-39, score-0.422]
23 3 Concentration The following theorem shows that the alignment is not too dependent on the training set S. [sent-40, score-0.807]
24 Concentration means that the probability of an empirical estimate deviating from its mean can be bounded as an exponentially decaying function of that deviation. [sent-42, score-0.066]
25 This will have a number of implications for the application and optimisation of the alignment. [sent-43, score-0.074]
26 For example if we optimise the alignment on a random sample we can expect it to remain high on a second sample. [sent-44, score-0.805]
27 Furthermore we will show in the next section that if the expected value of the alignment is high, then there exist functions that generalise well. [sent-45, score-0.76]
28 Hence, the result suggests that we can optimise the alignment on a training set and expect to keep high alignment and hence good performance on a test set. [sent-46, score-1.623]
29 Our experiments will demonstrate that this is indeed the case. [sent-47, score-0.024]
30 The theorem makes use of the following result due to McDiarmid. [sent-48, score-0.05]
31 Note that E_S is the expectation operator under the selection of the sample. [sent-49, score-0.026]
32 If a function f of independent random variables X_1, ..., X_n satisfies sup |f(x_1, ..., x_n) − f(x_1, ..., x_i', ..., x_n)| ≤ c_i for 1 ≤ i ≤ n, then for all ε > 0, P{|f(X_1, ..., X_n) − E[f]| ≥ ε} ≤ 2 exp(−2ε² / Σ_i c_i²). Theorem 3: The sample-based estimate of the alignment is concentrated around its expected value. [sent-55, score-0.875]
33 For a kernel with feature vectors of norm 1, we have that P^m{S : |A(S) − E_S[A(S)]| ≥ ε} ≤ δ, with ε as given by equation (1). [sent-56, score-0.346]
34 Define Ā_1 = E_S[A_1(S)] and Ā_2 = E_S[A_2(S)]. First we make use of McDiarmid's theorem to show that the A_i(S) are concentrated for i = 1, 2. [sent-65, score-0.113]
35 Consider the training set S' = S \ {(x_i, y_i)} ∪ {(x_i', y_i')}. [sent-66, score-0.039]
36 We could also define the true alignment, based on the input distribution P, as follows: given functions f, g : X² → R, we define ⟨f, g⟩_P = ∫ f(x, z) g(x, z) dP(x) dP(z); then the alignment of a kernel k_1 with a kernel k_2 is the quantity A(k_1, k_2) = ⟨k_1, k_2⟩_P / sqrt(⟨k_1, k_1⟩_P ⟨k_2, k_2⟩_P). [sent-71, score-1.446]
37 Then it is possible to prove that asymptotically, as m tends to infinity, the empirical alignment as defined above converges to the true alignment. [sent-72, score-0.747]
38 However if one wants to obtain unbiased convergence it is necessary to slightly modify its definition by removing the diagonal, since for finite samples it biases the expectation by receiving too large a weight. [sent-73, score-0.041]
39 With this modification A(y) in the statement of the theorem becomes the true alignment. [sent-74, score-0.05]
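As a rough illustration of this modification (our own sketch; the function name and the exact normalisation are our assumptions, since the text only states that the diagonal is removed), one can compute the alignment over the off-diagonal entries only:

```python
import numpy as np

def off_diagonal_alignment(K, y):
    """Empirical alignment with the diagonal removed, intended as a less
    biased finite-sample estimate of the true alignment."""
    y = np.asarray(y, dtype=float)
    T = np.outer(y, y)
    mask = ~np.eye(len(y), dtype=bool)      # keep only off-diagonal entries
    num = np.sum(K[mask] * T[mask])
    den = np.sqrt(np.sum(K[mask] ** 2) * np.sum(T[mask] ** 2))
    return num / den
```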
40 4 Generalization In this section we consider the implications of high alignment for the generalisation of a classifier. [sent-76, score-0.874]
41 By generalisation we mean the test error err(h) = P(h(x) ≠ y). [sent-77, score-0.153]
42 Our next observation relates the generalisation of a simple classification function to the value of the alignment. [sent-78, score-0.144]
43 The function we consider is the expected Parzen window estimator h(x) = sign(f(x)) = sign(E_{(x',y')}[y' k(x', x)]). [sent-79, score-0.152]
44 This corresponds to thresholding a linear function f in the feature space. [sent-80, score-0.027]
45 We will show that if there is high alignment then this function will have good generalisation. [sent-81, score-0.718]
46 Hence, by optimising the alignment we may expect Parzen window estimators to perform well. [sent-82, score-0.881]
47 We will demonstrate that this prediction does indeed hold good in experiments. [sent-83, score-0.024]
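For concreteness, here is a minimal sketch (ours, with hypothetical names) of the empirical counterpart of this estimator, which classifies a point by the sign of the average label-weighted kernel evaluation over the training sample:

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def parzen_window_predict(X_train, y_train, X_test, kernel=linear_kernel):
    """Empirical Parzen window estimator:
    h(x) = sign( (1/m) * sum_i y_i k(x_i, x) ),
    the sample version of sign(E_{(x', y')}[y' k(x', x)])."""
    K_cross = kernel(X_test, X_train)                       # shape (n_test, m)
    scores = K_cross @ np.asarray(y_train, dtype=float) / len(y_train)
    return np.sign(scores)
```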
48 Theorem 4: Given any δ > 0, with probability 1 − δ over a randomly drawn training set S, the generalisation accuracy of the expected Parzen window estimator h(x) = sign(E_{(x',y')}[y' k(x', x)]) is bounded from above by err(h(x)) ≤ 1 − A(S) + ε + (m sqrt(A_2(S)))^{-1}, where ε = C(S) sqrt(...). [sent-85, score-0.345]
49 Proof: (sketch) We assume throughout that the kernel has been normalised so that k(x , x) = 1 for all x. [sent-87, score-0.319]
50 The result will follow if we show that, with probability greater than 1 − δ/2, the generalisation error of h_{S \ (x_1, y_1)} can be upper bounded by 1 − A(y) + ε'. [sent-89, score-0.154]
51 The expected margin of the empirical function is concentrated around the expected margin of the expected Parzen window. [sent-92, score-0.31]
52 Hence, with high probability we can bound the error of j in terms of the empirically estimated alignment A(S). [sent-93, score-0.718]
53 5 Algorithms The concentration of the alignment can be directly used for tuning a kernel family to the particular task, or for selecting a kernel from a set, with no need for training. [sent-96, score-1.492]
54 The probability that the level of alignment observed on the training set will be out by more than ε from its expectation for one of the kernels is bounded by δ, where ε is given by equation (1) with the confidence term ln(1/δ) replaced by ln|N| + ln(1/δ), and |N| is the size of the set from which the kernel has been chosen. [sent-97, score-1.243]
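In practice this licenses a simple, training-free model selection routine: compute the alignment of each candidate kernel on the labelled sample and keep the most aligned one. A minimal sketch (our own; the function name is hypothetical):

```python
import numpy as np

def select_kernel_by_alignment(gram_matrices, y):
    """From a finite set of candidate Gram matrices, pick the one whose
    empirical alignment with yy' on the labelled sample is largest."""
    y = np.asarray(y, dtype=float)
    target = np.outer(y, y)

    def align(K):
        return np.sum(K * target) / np.sqrt(np.sum(K * K) * np.sum(target * target))

    scores = [align(K) for K in gram_matrices]
    best = int(np.argmax(scores))
    return best, scores
```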
55 One of the main consequences of the definition of kernel alignment is that it provides a practical criterion for combining kernels. [sent-100, score-1.137]
56 We will justify the intuitively appealing idea that two kernels, each with a certain alignment to the target but not aligned to each other, will give rise to a more aligned kernel combination. [sent-101, score-1.525]
57 In particular, we have that if two kernels with equal alignment to a given target y are also completely aligned to each other, then ||K_1 + K_2||_F = ||K_1||_F + ||K_2||_F and the alignment of the combined kernel remains the same. [sent-102, score-2.092]
58 If on the other hand the kernels are not completely aligned, then the alignment of the combined kernel is correspondingly increased. [sent-103, score-1.167]
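A quick numerical check of this claim (our own construction; the noisy rank-one kernels are an assumption made purely for illustration) builds two kernels that are each partially aligned with the target but, being built from independent noise, not fully aligned with each other, and compares the alignment of their sum with the alignments of the parts:

```python
import numpy as np

def align(K1, K2):
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

m = 40
y = np.array([-1.0] * (m // 2) + [1.0] * (m // 2))
target = np.outer(y, y)
rng = np.random.default_rng(1)

def noisy_label_kernel():
    # rank-one kernel built from a noisy copy of the label vector
    v = y + rng.normal(size=m)
    return np.outer(v, v)

K1, K2 = noisy_label_kernel(), noisy_label_kernel()
print("A(K1, yy')    =", align(K1, target))
print("A(K2, yy')    =", align(K2, target))
print("A(K1, K2)     =", align(K1, K2))
print("A(K1+K2, yy') =", align(K1 + K2, target))   # typically exceeds both terms above
```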
59 To illustrate the approach we will take to optimising the kernel, consider a kernel that can be written in the form k(x, x') = Σ_k μ_k y^k(x) y^k(x'), [sent-104, score-0.374]
60 where all the y^k are orthogonal with respect to the inner product defined on the training set S, ⟨y, y'⟩_S = Σ_i y(x_i) y'(x_i). [sent-105, score-0.082]
61 Assume further that one of them, y^t, is the true label vector. [sent-107, score-0.03]
62 We can now evaluate the alignment as A(y) ≈ μ_t / sqrt(Σ_k μ_k²). [sent-108, score-0.718]
63 In terms of the Gram matrix this is written as K_ij = Σ_k μ_k y_i^k y_j^k, [sent-110, score-0.033]
64 where y_i^k is the i-th label of the k-th classification. [sent-111, score-0.03]
65 This special case is approximated by the decomposition of the kernel matrix into its eigenvectors, K = Σ_i λ_i v_i v_i', [sent-112, score-0.392]
66 where v' denotes the transpose of v and v_i is the i-th eigenvector with eigenvalue λ_i. [sent-113, score-0.027]
67 In other words, the more peaked the spectrum the more aligned (specific) the kernel can be. [sent-114, score-0.47]
68 If by chance the eigenvector of the largest eigenvalue λ_1 corresponds to the target labeling, then we will give to that labeling a fraction λ_1 / sqrt(Σ_i λ_i²). [sent-115, score-0.124]
69 The larger the emphasis of the kernel on a given target, the higher its alignment. [sent-117, score-0.319]
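To make the spectral picture concrete, the following sketch (ours; the function name is hypothetical) computes, for a given Gram matrix and label vector, the alignment contributed by each eigenvector together with the kernel's overall alignment:

```python
import numpy as np

def eigen_alignment_profile(K, y):
    """For K = sum_i lam_i v_i v_i' (unit eigenvectors), the rank-one piece
    v_i v_i' has alignment <v_i, y>^2 / m with yy', and the kernel itself has
    alignment sum_i lam_i <v_i, y>^2 / (m * sqrt(sum_i lam_i^2))."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    lam, V = np.linalg.eigh(K)          # columns of V are orthonormal eigenvectors
    proj = (V.T @ y) ** 2               # <v_i, y>^2 for each eigenvector
    total = np.sum(lam * proj) / (m * np.sqrt(np.sum(lam ** 2)))
    return lam, proj / m, total
```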
70 In the previous subsection we observed that combining non-aligned kernels that are aligned with the target yields a kernel that is more aligned to the target. [sent-118, score-0.841]
71 Consider the base kernels K_i = v_i v_i', where the v_i are the eigenvectors of K, the kernel matrix for both labeled and unlabeled data. [sent-119, score-0.548]
72 Instead of choosing only the most aligned ones, one could use a linear combination, with the weights proportional to their alignment (to the available labels): K = Σ_i f(a_i) v_i v_i', [sent-120, score-0.869]
73 where a_i is the alignment of the kernel K_i, and f(a) is a monotonically increasing function. [sent-121, score-1.088]
74 Note that a recombination of these rank 1 kernels was made in so-called latent semantic kernels [2]. [sent-123, score-0.308]
75 The overall alignment of the new kernel with the labeled data should be increased, and the new kernel matrix is expected also to be more aligned to the unseen test labels (because of the concentration, and the assumption that the split was random). [sent-124, score-1.669]
76 Moreover, in general one can set up an optimization problem, aimed at finding the optimal a, that is the parameters that maximize the alignment of the combined kernel with the available labels. [sent-125, score-1.037]
77 Given K = Σ_i α_i v_i v_i', using the orthonormality of the v_i and the fact that ⟨vv', uu'⟩_F = ⟨v, u⟩², the alignment can be written as A(α) = Σ_i α_i ⟨v_i, y⟩² / (m sqrt(Σ_i α_i²)). [sent-126, score-0.718]
78 Setting the derivative with respect to α_i to zero gives ⟨v_i, y⟩² − 2Λα_i = 0, and hence α_i ∝ ⟨v_i, y⟩², giving the overall alignment A = sqrt(Σ_i ⟨v_i, y⟩⁴) / m. [sent-128, score-0.812]
79 This analysis suggests the following transduction algorithm. [sent-130, score-0.106]
80 Given a partially labelled set of examples, optimise its alignment by adapting the full kernel matrix, recombining its rank-one eigenmatrices v_i v_i' using the coefficients a_i determined by measuring the alignment between v_i and y on the labelled examples. [sent-131, score-2.074]
81 Our results suggest that we should see a corresponding increase in the alignment on the unlabelled part of the set, and hence a reduction in test error when using a Parzen window estimator. [sent-132, score-0.936]
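A sketch of the whole procedure follows (our own implementation outline, not the authors' code; we assume the weighting f(a) = a, and all names are ours). The eigendecomposition uses the full Gram matrix over labelled plus unlabelled points, the coefficients a_i are alignments of each eigenvector with the labels measured on the labelled part only, and predictions on the unlabelled part come from a Parzen window estimator on the adapted kernel:

```python
import numpy as np

def spectral_alignment_transduction(K, y_labelled, labelled_idx, f=lambda a: a):
    """K: Gram matrix over all (labelled + unlabelled) points.
    y_labelled: {-1, +1} labels for the indices in labelled_idx.
    Returns the adapted kernel G = sum_i f(a_i) v_i v_i' and sign predictions
    for the unlabelled points from a Parzen window on G."""
    m = K.shape[0]
    lam, V = np.linalg.eigh(K)                      # K = sum_i lam_i v_i v_i'
    y = np.asarray(y_labelled, dtype=float)
    L = np.asarray(labelled_idx)

    # alignment of each rank-one kernel v_i v_i' with yy', restricted to the labelled part
    VL = V[L, :]                                    # eigenvector entries on labelled points
    a = (VL.T @ y) ** 2 / (len(L) * np.sum(VL ** 2, axis=0).clip(min=1e-12))

    # recombine the rank-one eigenmatrices with weights f(a_i)
    w = np.array([f(ai) for ai in a])
    G = (V * w) @ V.T

    # Parzen window prediction on the unlabelled points using the adapted kernel
    unlabelled = np.setdiff1d(np.arange(m), L)
    scores = G[np.ix_(unlabelled, L)] @ y / len(L)
    return G, np.sign(scores)
```

Any kernel that yields a Gram matrix over the full sample (linear or Gaussian, as in the experiments below) can be plugged into this sketch.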
82 6 Experiments We applied the transduction algorithm designed to take advantage of our results by optimizing alignment with the labeled part of the dataset using the spectral method described above. [sent-134, score-0.799]
83 Table 1: Mean and associated standard deviation of the alignment values using a linear kernel on the Breast (left two columns) and Ionosphere (right two columns) datasets. [sent-184, score-1.037]
84 The values measure the alignment of the Gram matrices to the label matrix for different sizes of training set. [sent-185, score-0.132]
85 The K matrices are before adaptation, while the G matrices are after optimisation of the alignment using equation (2). [sent-187, score-0.813]
86 The right two columns show the performance of the Parzen window classifier on the test set for Breast linear kernel (left column) and Ionosphere (right column). [sent-190, score-0.511]
87 The results clearly show that optimising the alignment on the training set does indeed increase its value in all but one case by more than the sum of the standard deviations. [sent-191, score-0.836]
88 Furthermore, as predicted by the concentration this improvement is maintained in the alignment measured on the test set with both linear and Gaussian kernels in all but one case (20% train with the linear kernel). [sent-192, score-1.016]
89 Again, as predicted by the theory, the larger the alignment, the better the performance obtained using the Parzen window estimator. [sent-194, score-0.801]
90 The results of applying an SVM to the Breast Cancer data using a Gaussian kernel show a very slight improvement in the test error for both 80% and 50% training sets. [sent-195, score-0.394]
91 7 Conclusions We have introduced a measure of performance of a kernel machine that is much easier to analyse than standard measures (e.g. the margin) and that provides much simpler algorithms. [sent-196, score-0.39]
92 By identifying that the ideal kernel matrix has a structure of the type yy', we have been able to transform a measure of similarity between kernels into a measure of fitness of a given kernel. [sent-198, score-0.68]
93 The ease and reliability with which this quantity can be estimated using only training set information prior to training makes it an ideal tool for practical model selection. [sent-199, score-0.197]
94 We have given preliminary experimental results that largely confirm the theoretical analysis and augur well for the use of this tool in more sophisticated model (kernel) selection applications. [sent-200, score-0.026]
wordName wordTfidf (topN-words)
[('alignment', 0.718), ('kernel', 0.319), ('parzen', 0.189), ('aligned', 0.151), ('kernels', 0.13), ('generalisation', 0.117), ('yy', 0.117), ('les', 0.116), ('concentration', 0.108), ('breast', 0.105), ('ionosphere', 0.098), ('viv', 0.092), ('window', 0.083), ('gram', 0.08), ('col', 0.08), ('yyl', 0.08), ('align', 0.074), ('mcdiarmid', 0.069), ('concentrated', 0.063), ('vi', 0.059), ('optimise', 0.059), ('target', 0.056), ('transduction', 0.055), ('optimising', 0.055), ('aiviv', 0.053), ('fitness', 0.053), ('hence', 0.053), ('ideal', 0.052), ('ai', 0.051), ('theorem', 0.05), ('aj', 0.048), ('cancer', 0.046), ('analyse', 0.046), ('unlabelled', 0.046), ('yiyjk', 0.046), ('andre', 0.046), ('labelled', 0.045), ('yk', 0.043), ('similarity', 0.043), ('expected', 0.042), ('le', 0.042), ('err', 0.042), ('sharply', 0.042), ('somehow', 0.042), ('jaz', 0.042), ('theoreill', 0.042), ('quantity', 0.042), ('labeling', 0.041), ('definition', 0.041), ('giving', 0.041), ('eigenvectors', 0.04), ('training', 0.039), ('ki', 0.039), ('implications', 0.039), ('classifier', 0.038), ('nello', 0.037), ('bounded', 0.037), ('cristianini', 0.036), ('test', 0.036), ('optimisation', 0.035), ('columns', 0.035), ('measuring', 0.034), ('combining', 0.034), ('margin', 0.034), ('dp', 0.034), ('biowulf', 0.034), ('matrix', 0.033), ('classic', 0.032), ('degree', 0.031), ('holloway', 0.031), ('xi', 0.031), ('eg', 0.03), ('technologies', 0.03), ('label', 0.03), ('matrices', 0.03), ('adapting', 0.029), ('understood', 0.029), ('empirical', 0.029), ('sample', 0.028), ('xl', 0.028), ('svm', 0.028), ('selecting', 0.028), ('classification', 0.027), ('eigenvector', 0.027), ('feature', 0.027), ('sign', 0.027), ('labeled', 0.026), ('selection', 0.026), ('semantic', 0.025), ('estimators', 0.025), ('practical', 0.025), ('labels', 0.025), ('measure', 0.025), ('around', 0.024), ('train', 0.024), ('define', 0.024), ('indeed', 0.024), ('li', 0.023), ('rank', 0.023), ('devroye', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 134 nips-2001-On Kernel-Target Alignment
Author: Nello Cristianini, John Shawe-Taylor, André Elisseeff, Jaz S. Kandola
Abstract: We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction. Keywords: Kernels, alignment, eigenvectors, eigenvalues, transduction 1
2 0.65167493 170 nips-2001-Spectral Kernel Methods for Clustering
Author: Nello Cristianini, John Shawe-Taylor, Jaz S. Kandola
Abstract: In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3], the second one is based on graph theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels. We state the problem of unsupervised learning as assigning labels so as to optimize these cost functions. We show how the optimal solution can be approximated by slightly relaxing the corresponding optimization problem, and how this corresponds to using eigenvector information. The resulting simple algorithms are tested on real world data with positive results. 1
3 0.20489912 58 nips-2001-Covariance Kernels from Bayesian Generative Models
Author: Matthias Seeger
Abstract: We propose the framework of mutual information kernels for learning covariance kernels, as used in Support Vector machines and Gaussian process classifiers, from unlabeled task data using Bayesian techniques. We describe an implementation of this framework which uses variational Bayesian mixtures of factor analyzers in order to attack classification problems in high-dimensional spaces where labeled data is sparse, but unlabeled data is abundant. 1
4 0.20077184 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function k by a “randomized kernel” which behaves like k in expectation.
5 0.16666235 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
Author: Manfred Opper, Robert Urbanczik
Abstract: Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which only depend on the target rule but not on the SVM kernel. 1
6 0.15668295 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
7 0.15294129 136 nips-2001-On the Concentration of Spectral Properties
8 0.14234598 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
9 0.10797206 74 nips-2001-Face Recognition Using Kernel Methods
10 0.10222959 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates
11 0.10110959 56 nips-2001-Convolution Kernels for Natural Language
12 0.10029268 31 nips-2001-Algorithmic Luckiness
13 0.086749621 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
14 0.085614234 139 nips-2001-Online Learning with Kernels
15 0.083052091 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
16 0.07803335 171 nips-2001-Spectral Relaxation for K-means Clustering
17 0.076532908 105 nips-2001-Kernel Machines and Boolean Functions
18 0.072936855 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
19 0.071639158 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
20 0.071520597 172 nips-2001-Speech Recognition using SVMs
topicId topicWeight
[(0, -0.248), (1, 0.218), (2, -0.077), (3, -0.252), (4, 0.26), (5, 0.251), (6, 0.023), (7, 0.078), (8, -0.118), (9, 0.261), (10, -0.147), (11, 0.051), (12, -0.039), (13, 0.028), (14, -0.235), (15, -0.025), (16, 0.033), (17, -0.019), (18, 0.006), (19, 0.056), (20, 0.013), (21, 0.112), (22, -0.278), (23, 0.045), (24, 0.065), (25, -0.042), (26, 0.025), (27, -0.022), (28, 0.146), (29, 0.099), (30, 0.268), (31, -0.079), (32, -0.054), (33, 0.099), (34, 0.066), (35, -0.077), (36, 0.044), (37, 0.033), (38, -0.026), (39, 0.019), (40, 0.055), (41, -0.132), (42, 0.035), (43, -0.025), (44, -0.032), (45, 0.037), (46, 0.094), (47, 0.01), (48, 0.045), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.95878994 134 nips-2001-On Kernel-Target Alignment
Author: Nello Cristianini, John Shawe-Taylor, André Elisseeff, Jaz S. Kandola
Abstract: We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction. Keywords: Kernels, alignment, eigenvectors, eigenvalues, transduction 1
2 0.91066736 170 nips-2001-Spectral Kernel Methods for Clustering
Author: Nello Cristianini, John Shawe-Taylor, Jaz S. Kandola
Abstract: In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3], the second one is based on graph theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels. We state the problem of unsupervised learning as assigning labels so as to optimize these cost functions. We show how the optimal solution can be approximated by slightly relaxing the corresponding optimization problem, and how this corresponds to using eigenvector information. The resulting simple algorithms are tested on real world data with positive results. 1
3 0.53090149 58 nips-2001-Covariance Kernels from Bayesian Generative Models
Author: Matthias Seeger
Abstract: We propose the framework of mutual information kernels for learning covariance kernels, as used in Support Vector machines and Gaussian process classifiers, from unlabeled task data using Bayesian techniques. We describe an implementation of this framework which uses variational Bayesian mixtures of factor analyzers in order to attack classification problems in high-dimensional spaces where labeled data is sparse, but unlabeled data is abundant. 1
4 0.53067243 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function k by a “randomized kernel” which behaves like k in expectation.
5 0.48941085 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
Author: Koji Tsuda, Motoaki Kawanabe, Gunnar Rätsch, Sören Sonnenburg, Klaus-Robert Müller
Abstract: Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so called
6 0.43668112 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
7 0.36422855 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists
8 0.35786322 136 nips-2001-On the Concentration of Spectral Properties
9 0.34073895 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
10 0.30791762 74 nips-2001-Face Recognition Using Kernel Methods
11 0.3005634 56 nips-2001-Convolution Kernels for Natural Language
12 0.29598463 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
13 0.29345268 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates
14 0.29182816 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
15 0.27533993 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance
16 0.25877362 31 nips-2001-Algorithmic Luckiness
17 0.25156367 171 nips-2001-Spectral Relaxation for K-means Clustering
18 0.25012195 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
19 0.24155277 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
20 0.23272254 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
topicId topicWeight
[(14, 0.232), (17, 0.104), (19, 0.031), (20, 0.012), (27, 0.102), (30, 0.035), (59, 0.029), (63, 0.137), (72, 0.076), (79, 0.046), (83, 0.01), (91, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.89270461 134 nips-2001-On Kernel-Target Alignment
Author: Nello Cristianini, John Shawe-Taylor, André Elisseeff, Jaz S. Kandola
Abstract: We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction. Keywords: Kernels, alignment, eigenvectors, eigenvalues, transduction 1
2 0.8751272 170 nips-2001-Spectral Kernel Methods for Clustering
Author: Nello Cristianini, John Shawe-Taylor, Jaz S. Kandola
Abstract: In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3], the second one is based on graph theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels. We state the problem of unsupervised learning as assigning labels so as to optimize these cost functions. We show how the optimal solution can be approximated by slightly relaxing the corresponding optimization problem, and how this corresponds to using eigenvector information. The resulting simple algorithms are tested on real world data with positive results. 1
3 0.85062414 139 nips-2001-Online Learning with Kernels
Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson
Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning where the ν-trick only applies to the ε-insensitive loss function we are able to derive general trimmed-mean types of estimators such as for Huber’s robust loss.
4 0.82743371 49 nips-2001-Citcuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning
Author: A. Bofill, D. P. Thompson, Alan F. Murray
Abstract: Experimental data has shown that synaptic strength modification in some types of biological neurons depends upon precise spike timing differences between presynaptic and postsynaptic spikes. Several temporally-asymmetric Hebbian learning rules motivated by this data have been proposed. We argue that such learning rules are suitable to analog VLSI implementation. We describe an easily tunable circuit to modify the weight of a silicon spiking neuron according to those learning rules. Test results from the fabrication of the circuit using a 0.6 μm CMOS process are given. 1
5 0.81446904 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
Author: Manfred Opper, Robert Urbanczik
Abstract: Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which only depend on the target rule but not on the SVM kernel. 1
6 0.77104229 62 nips-2001-Duality, Geometry, and Support Vector Regression
7 0.69651967 137 nips-2001-On the Convergence of Leveraging
8 0.68475872 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
9 0.68086618 58 nips-2001-Covariance Kernels from Bayesian Generative Models
10 0.68044817 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
11 0.68038088 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
12 0.67827833 126 nips-2001-Motivated Reinforcement Learning
13 0.67338711 13 nips-2001-A Natural Policy Gradient
14 0.66592264 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
15 0.66468835 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
16 0.66117322 8 nips-2001-A General Greedy Approximation Algorithm with Applications
17 0.6593526 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
18 0.6541127 74 nips-2001-Face Recognition Using Kernel Methods
19 0.65200764 22 nips-2001-A kernel method for multi-labelled classification
20 0.65093815 31 nips-2001-Algorithmic Luckiness