nips nips2003 nips2003-112 knowledge-graph by maker-knowledge-mining

112 nips-2003-Learning to Find Pre-Images


Source: pdf

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We consider the problem of reconstructing patterns from a feature map. [sent-4, score-0.13]

2 Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. [sent-5, score-0.397]

3 We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. [sent-6, score-0.645]

4 The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. [sent-7, score-0.17]

5 1 Introduction. We denote by H_k the RKHS associated with the kernel k(x, y) = φ(x)^⊤φ(y), where φ : X → H_k is a possibly nonlinear mapping from the input space X (assumed to be a nonempty set) to the possibly infinite-dimensional space H_k. [sent-8, score-0.426]

6 Given a point Ψ ∈ H_k, the pre-image problem is to find z = arg min_z ‖Ψ − φ(z)‖².  (1) This has a significant range of applications in kernel methods: for reduced set methods [1], for denoising and compression using kernel principal components analysis (kPCA), and for kernel dependency estimation (KDE), where one finds a mapping between paired sets of objects. [sent-15, score-1.209]

7 The techniques used so far to solve this nonlinear optimization problem often employ gradient descent [1] or nonlinear iteration methods [2]. [sent-16, score-0.33]

8 (e.g., we may be interested in minimizing a classification error rather than a distance in feature space); and (iv) not being applicable to pre-images which are objects with discrete variables. [sent-19, score-0.144]

9 Note that this problem is unusual in that it is possible to produce an infinite amount of training data (and thus expect to get good performance) by generating points in H_k and labeling them using (1). [sent-22, score-0.109]

10 For example, when denoising digits with kPCA, one expects as a pre-image something that looks like a digit, and an estimate of this distribution is actually given by the original data. [sent-25, score-0.239]

11 Finally, learning to find pre-images can also be applied to objects with discrete variables, such as string outputs in part-of-speech tagging or protein secondary structure prediction. [sent-27, score-0.821]

12 The remainder of the paper is organized as follows: in Section 2 we review kernel methods requiring the use of pre-images: kPCA and KDE. [sent-28, score-0.285]

13 2.1 Kernel PCA Denoising and Compression. Given data points {x_i}_{i=1}^m ∈ X, kPCA constructs an orthogonal set of feature extractors in the RKHS. [sent-32, score-0.141]

14 The orthogonal system {v_1, …, v_r} lies in the span of the mapped data points, i.e., each v_j = Σ_{i=1}^m α_i^j φ(x_i). [sent-36, score-0.15]

15 It is obtained by solving the eigenvalue problem mλ^i α^i = Kα^i for 1 ≤ i ≤ r, where K_{ij} = k(x_i, x_j) is the kernel matrix and r ≤ m is the number of nonzero eigenvalues. [sent-42, score-0.268]

16 Once built, the orthogonal system P can be used for nonlinear feature extraction. [sent-43, score-0.177]

17 Let x denote a test point; then the nonlinear principal components can be extracted via Pφ(x) = (Σ_{i=1}^m α_i^1 k(x_i, x), …, Σ_{i=1}^m α_i^r k(x_i, x)), where k(x_i, x) is substituted for φ(x_i)^⊤φ(x). [sent-44, score-0.206]

18 Besides serving as a feature extractor, kPCA has been proposed as a denoising and compression procedure, both of which require the calculation of input patterns x from feature space points Pφ(x). [sent-49, score-0.485]

19 Denoising is a technique used to reconstruct patterns corrupted by noise. [sent-51, score-0.131]

20 If this is the case, then we should be able to construct a denoised input pattern as the pre-image of Pa φ(x). [sent-64, score-0.152]

21 This denoised pattern z can be obtained as the solution to the problem z = arg min_z ‖P_a φ(x) − φ(z)‖².  (2) [sent-65, score-0.193]

22 For an application of kPCA denoising see [2]. [sent-66, score-0.113]

23 This can be achieved by centering the kernel matrix, K_c = (I − (1/m) 11^⊤) K (I − (1/m) 11^⊤), where 1 ∈ R^m is the vector with every entry equal to 1. [sent-72, score-0.239]
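
A minimal numpy sketch of the kPCA construction just described (centering, eigendecomposition, projection), assuming a Gaussian kernel on R^d; the function names and the omitted test-point centering are illustrative choices, not the authors' code.

import numpy as np

def rbf_kernel(X, Y, sigma):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_fit(X, sigma, r):
    # center the kernel matrix and solve the eigenvalue problem described above
    m = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    H = np.eye(m) - np.ones((m, m)) / m              # I - (1/m) 1 1^T
    Kc = H @ K @ H                                   # centered kernel matrix K_c
    lam, alpha = np.linalg.eigh(Kc)                  # eigenvectors alpha^i, eigenvalues m*lambda^i
    idx = np.argsort(lam)[::-1][:r]                  # keep the r leading components
    lam, alpha = lam[idx], alpha[:, idx]
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))  # scale so each v_j has unit norm
    return alpha

def kpca_project(x, X_train, alpha, sigma):
    # nonlinear principal components v_j^T phi(x) = sum_i alpha_i^j k(x_i, x)
    # (centering of the test kernel vector is omitted for brevity)
    k = rbf_kernel(x[None, :], X_train, sigma).ravel()
    return alpha.T @ k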

24 To learn the mapping from data {(x_i, y_i)}_{i=1}^m ∈ X × Y, KDE performs two steps. [sent-78, score-0.156]

25 First, kPCA is performed in H_l, the RKHS associated with the kernel l. [sent-80, score-0.239]

26 Having obtained the principal axes, one can compute the principal components (φ_l(y)^⊤v_1, …, φ_l(y)^⊤v_r) of a mapped output φ_l(y). [sent-85, score-0.235]

27 Next, we learn the map from φ_k(x) to (φ_l(y)^⊤v_1, …, φ_l(y)^⊤v_r). [sent-90, score-0.142]

28 To this end, for each principal axis v_j we solve the problem arg min_{β^j} Σ_{i=1}^m (φ_l(y_i)^⊤v_j − g(x_i, β^j))² + γ‖β^j‖²,  (3) where γ‖β^j‖² acts as a regularization term (with γ > 0), g(x_i, β^j) = Σ_{s=1}^m β_s^j k(x_s, x_i), and β ∈ R^{m×r}. [sent-94, score-0.399]

29 Denote by P ∈ R^{m×r} the matrix with entries P_{ij} = φ_l(y_i)^⊤v_j for j = 1, …, r, and by K ∈ R^{m×m} the kernel matrix with entries K_{st} = k(x_s, x_t), s, t = 1, …, m. [sent-98, score-0.239]

30 Problem (3) can then be minimized, for example via kernel ridge regression, yielding β = (K^⊤K + γI)^{−1} K^⊤ P. [sent-102, score-0.373]
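
A minimal sketch of this closed-form regression step, assuming the input kernel matrix K and the matrix P of output projections have already been computed; the function name is illustrative.

import numpy as np

def kde_regression(K, P, gamma):
    # K: (m, m) input kernel matrix, K_st = k(x_s, x_t)
    # P: (m, r) output principal components, P_ij = phi_l(y_i)^T v_j
    # returns beta = (K^T K + gamma I)^(-1) K^T P, the minimizer of (3)
    m = K.shape[0]
    return np.linalg.solve(K.T @ K + gamma * np.eye(m), K.T @ P)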

31 Using the learned map from input patterns to principal components, predicting an output y′ for a new pattern x requires solving the pre-image problem y′ = arg min_y ‖(φ_l(y)^⊤v_1, …, φ_l(y)^⊤v_r) − (g(x, β^1), …, g(x, β^r))‖². [sent-104, score-0.505]

32 One can show that if an exact pre-image exists, and if the kernel can be written as k(x, x′) = f_k(x^⊤x′) with an invertible function f_k (e.g. [sent-115, score-0.335]

33 , k(x, x′) = (x^⊤x′)^d with odd d), then one can compute the pre-image analytically as z = Σ_{i=1}^N f_k^{−1}(Σ_{j=1}^m α_j k(x_j, e_i)) e_i, where {e_1, …, e_N} is an orthonormal basis of the input space. [sent-117, score-0.139]

34 Whilst there are certain cases where the minimizer of (1) can be found by solving an eigenvalue problem (e.g., for k(x, x′) = (x^⊤x′)²), one generally resorts to methods of nonlinear optimization. [sent-125, score-0.103]

35 For instance, if the kernel is differentiable, one can multiply out (1) to express it in terms of the kernel, and then perform gradient descent [1]. [sent-126, score-0.392]

36 Alternatively one can select the k best input points from some training set and use them in combination to minimize the distance (1), see [6] for details. [sent-128, score-0.145]

37 Specifically, we use kernel regression to estimate the pre-image map from data. [sent-134, score-0.375]

38 If we were to use regression with the kernel k corresponding to H_k, then we would simply look for weight vectors w_j ∈ H_k, j = 1, … [sent-142, score-0.335]

39 , dim X, such that Γ_j(Ψ) = w_j^⊤Ψ, and use the kernel trick to evaluate Γ. [sent-145, score-0.274]

40 However, in general we may want to use a kernel κ which is different from k, and thus we cannot perform our computations implicitly by the use of a kernel. [sent-146, score-0.239]

41 We then learn the pre-image map Γ : R^n → X by solving the learning problem Γ = arg min_Γ Σ_{i=1}^m ℓ(x_i, Γ(P_n φ(x_i))) + λΩ(Γ).  (6) [sent-158, score-0.248]

42 If X is the vector space R^N, we can consider problem (6) as a standard regression problem for the m training points x_i and use kernel ridge regression with a kernel κ. [sent-160, score-0.968]
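
A minimal sketch of this pre-image learning step under the squared-loss, ridge-regression instance of (6), assuming X = R^N; the function name and the kappa argument (a kernel function of two sample matrices, e.g. lambda A, B: rbf_kernel(A, B, sigma) from the sketch above) are illustrative.

import numpy as np

def learn_preimage_map(Z, X, kappa, lam):
    # Z: (m, n) projections P_n phi(x_i) of the training points
    # X: (m, N) the corresponding training patterns x_i (here X = R^N)
    m = Z.shape[0]
    C = np.linalg.solve(kappa(Z, Z) + lam * np.eye(m), X)   # ridge coefficients, one column per output dim
    return lambda Z_new: kappa(Z_new, Z) @ C                # Gamma: projections -> pre-images

For kPCA denoising, the learned Γ is then simply applied to the projection of a noisy test pattern onto the leading components.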

43 Note that the general learning setup of (6) allows the use of any suitable loss function, incorporating invariances and a priori knowledge. [sent-165, score-0.113]

44 One may wish to find a string which is the pre-image of a representation using a string kernel [7, 8]. [sent-170, score-1.145]

45 Using gradient descent techniques this is not possible as the objects have discrete variables (elements of the string). [sent-171, score-0.245]

46 In the case of strings one can predict each character of the string independently given the estimate φl (y ). [sent-174, score-0.603]

47 This is made particularly tractable in fixed-length string prediction problems, such as part-of-speech tagging or protein secondary structure prediction, because the length is known (it is the same as the length of the input). [sent-175, score-0.868]

48 It may not be the only pre-image, but this does not matter as long as it minimizes the value of (5); for variable-length outputs one would first have to predict the length of the output string before predicting each element of it. [sent-177, score-0.644]

49 The task is to predict a string y given a string x and a set of paired examples (x_i, y_i) ∈ ∪_{p=1}^∞ (Σ_x)^p × ∪_{p=1}^∞ (Σ_y)^p. [sent-179, score-1.03]

50 i.e., any paired input and output strings have the same length. [sent-182, score-0.33]

51 This is the setting of part-of-speech tagging, where Σ_x are words and Σ_y are parts of speech, and also of secondary structure prediction, where Σ_x are amino acids of a protein sequence and Σ_y are classes of structure that the sequence folds into (e.g., helix, sheet, or coil). [sent-183, score-0.168]

52 One has to define an appropriate similarity function for both sets of objects; using string kernels gives two implicit maps φ_k(x) and φ_l(y). [sent-188, score-0.751]

53 KDE then learns a map between the two feature spaces, and for a new test string x one must find the pre-image of the estimate φl (y ) as in equation (5). [sent-189, score-0.58]

54 One can find this pre-image by predicting each character of the string independently given the estimate φl (y ) as it has known length given the input x. [sent-190, score-0.637]

55 One can thus learn a function b_p = f(φ_l(y′), α_p), where b_p is the p-th element of the output and α_p = (a_{(p−n/2)} a_{(p−n/2+1)} … [sent-191, score-0.196]

56 a_{(p+n/2)}) is a window of length n + 1 centered at position p in the input string. [sent-194, score-0.178]

57 One computes the entire output string as (f(φ_l(y′), α_1), …, [sent-195, score-0.526]

58 f(φ_l(y′), α_{|x|})); window elements outside of the string can be encoded with a special terminal character. [sent-198, score-0.513]

59 The function f can be trained with any multi-class classification algorithm to predict one of the elements of the alphabet; the approach can thus be seen as a generalization of the traditional approach of learning a function f given only a window on the input (the second parameter). [sent-199, score-0.156]
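
A hedged sketch of this character-wise decoding; the classifier f, the padding symbol, and the assumption that n is even are placeholders, not details fixed by the paper.

def decode_string(phi_l_y, x, f, n, terminal="#"):
    # phi_l_y: the estimated output representation phi_l(y')
    # f: a trained multi-class classifier mapping (phi_l(y'), window) to one output symbol
    # n: window parameter (assumed even, so the window of length n+1 has a center)
    half = n // 2
    padded = terminal * half + x + terminal * half   # pad so every position has a full window
    out = []
    for p in range(len(x)):
        window = padded[p:p + n + 1]                 # window of length n+1 centered at position p
        out.append(f(phi_l_y, window))
    return "".join(out)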

60 Our approach first estimates the output using global information from the input and with respect to the loss function of interest on the outputs—it only decodes this global prediction in the final step. [sent-200, score-0.216]

61 Note that problems such as secondary structure prediction often have loss functions dependent on the complete outputs, not individual elements of the output string [9]. [sent-201, score-0.691]

62 We performed a similar experiment to the one in [2] for demonstration purposes: we denoised USPS digits using linear PCA and kPCA. [sent-208, score-0.184]

63 We selected 100 randomly chosen non-noisy digits for training and a further 100 noisy digits for testing, 10 from each class. [sent-210, score-0.236]

64 As in [2] we chose a nonlinear map via a Gaussian kernel with σ = 8. [sent-211, score-0.388]

65 We found pre-images using the Matlab function fminsearch, and compared this to our pre-image-learning method (RBF kernel K(x, x′) = exp(−‖x − x′‖²/(2σ²)) with σ = 1, and regularization parameter λ = 1). [sent-213, score-0.239]
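
For reference, a hedged sketch of such a search-based baseline (not the authors' code): minimizing the feature-space distance (2) over z with a derivative-free optimizer analogous to fminsearch, after expanding the objective in kernel terms (for a Gaussian kernel, the terms independent of z are constant).

import numpy as np
from scipy.optimize import minimize

def preimage_by_search(c, alpha, X_train, sigma, z0):
    # c: projections of the test pattern onto the leading a components, c_j = v_j^T phi(x)
    # alpha: (m, a) kPCA coefficients; gamma_i = sum_j alpha_i^j c_j
    gamma = alpha @ c
    def objective(z):
        kz = np.exp(-((X_train - z) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        return -2.0 * gamma @ kz      # k(z, z) = 1 and ||P_a phi(x)||^2 do not depend on z
    return minimize(objective, z0, method="Nelder-Mead").x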

66 Figure 1 shows the results: our approach appears to perform better than the gradient descent approach. [sent-214, score-0.153]

67 Note that some of the digits shown are actually denoised into the wrong class. [sent-225, score-0.184]

68 This is of course possible, since choosing the correct digit is harder than a standard digit classification problem because the images are noisy. [sent-226, score-0.187]

69 In this experiment, we also took a rather small number of training examples, because otherwise the fminsearch code for the gradient descent was very slow, and this allowed us to compare our results more easily. [sent-228, score-0.25]

70 For the compression experiment we use a video sequence consisting of 1000 graylevel images, where every frame has a 100 × 100 pixel resolution. [sent-230, score-0.18]

71 For training we used every 20th frame resulting in a video sequence of 50 frames with 3. [sent-232, score-0.229]

72 The motivation is to store only these 50 frames and to reconstruct all frames in between. [sent-234, score-0.201]

73 We applied a kPCA to all 50 frames with a Gaussian kernel with kernel parameter σ1 . [sent-235, score-0.557]

74 The projections v_1, …, v_50 ∈ R^50 are then used to learn the interpolation of the timelines of the 50 principal components v_{ij}, where i is the time index, j the number of the principal component, and 1 ≤ i, j ≤ 50. [sent-239, score-0.302]

75 A kernel ridge regression with Gaussian kernel and kernel parameter σ2 and ridge r1 was used for this task. [sent-240, score-1.046]

76 Finally, the pre-image map Γ from projections onto the v_i back to frames was learned using kernel ridge regression with kernel parameter σ_3 and ridge r_2. [sent-241, score-0.998]
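
A rough numpy sketch of the interpolation step, assuming scalar time indices and a Gaussian kernel over time; the function names are illustrative, not from the paper.

import numpy as np

def gaussian_kernel_1d(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

def fit_component_interpolator(t_train, V, sigma2, r1):
    # t_train: (50,) time indices of the stored frames; V: (50, 50) projections v_ij
    K = gaussian_kernel_1d(t_train, t_train, sigma2)
    C = np.linalg.solve(K + r1 * np.eye(len(t_train)), V)   # kernel ridge regression over time
    def predict(t_new):
        # interpolated principal components for the intermediate time stamps
        return gaussian_kernel_1d(t_new, t_train, sigma2) @ C
    return predict

The interpolated rows of predict(t_new) are then passed through the learned pre-image map Γ (fit as in Section 3, with parameters σ_3 and r_2) to synthesize the intermediate frames.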

77 All parameters σ1 , σ2 , σ3 , r1 , r2 were selected in a loop such that new synthesized frames looked subjectively best. [sent-242, score-0.172]

78 15, and for the ridge parameters r_1 = 10^{−13}, r_2 = 10^{−7}. [sent-245, score-0.134]

79 Note that the pre-image mechanism could possibly be adapted to take into account invariances and a priori knowledge, such as the geometry of standard heads, to reduce blending effects, making it more powerful than gradient descent or plain linear interpolation of frames. [sent-247, score-0.197]

80 In the following we present a simple string mapping problem to show the potential of the approach outlined in Section 3. [sent-250, score-0.53]

81 Output strings are generated by the following algorithm: start in a random state (1 or 2) corresponding to one of the output symbols. [sent-253, score-0.16]

82 The length of the string is randomly chosen between 10 and 20 symbols. [sent-255, score-0.506]

83 Figure 2: Kernel PCA compression used to learn intermediate images. [sent-259, score-0.139]

84 The pre-images lie in a 100 × 100 = 10,000-dimensional space, making gradient-descent-based search impracticable. [sent-260, score-0.1]

85 As the model of the string can be better predicted from the complete string, a global method could be better in principle than a window-based method. [sent-261, score-0.453]

86 We use a string kernel called the spectrum kernel [11] to define the feature space for inputs. [sent-262, score-0.779]

87 This method builds a representation which is a frequency count of all possible contiguous subsequences of length p. [sent-263, score-0.129]
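
A minimal sketch of this spectrum feature map and the induced kernel (an illustrative reimplementation, not the code of [11]).

from collections import Counter

def spectrum_features(s, p):
    # frequency count of all contiguous subsequences (p-mers) of s
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p):
    # the kernel is the inner product of the two count vectors
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(v * ft[w] for w, v in fs.items())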

88 To define a feature space for outputs, we count the number of contiguous subsequences of length p on the input (over the |x| − p + 1 starting positions q) that have the same element of the alphabet at position q + (p − 1)/2 in the output, for odd values of p. [sent-268, score-0.366]

89 We can then learn pre-images using a window also of size p as described in Section 3. [sent-273, score-0.127]

90 Note that the output kernel is defined on both the inputs and outputs: such an approach is also used in [12] and called “joint kernels”, and in their approach the calculation of pre-images is also required, so they only consider specific kernels for computational reasons. [sent-277, score-0.367]

91 We normalized the input and output kernel matrices such that a matrix S is normalized via S ← D^{−1} S D^{−1}, where D is a diagonal matrix with D_{ii} = √(S_{ii}). [sent-279, score-0.377]
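
A sketch of this normalization (assuming strictly positive diagonal entries); it is equivalent to rescaling the implicit feature vectors to unit norm.

import numpy as np

def normalize_kernel(S):
    # S <- D^-1 S D^-1 with D_ii = sqrt(S_ii)
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)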

92 We also used a nonlinear map for KDE, via an RBF kernel, i.e. [sent-280, score-0.149]

93 K(x, x′) = exp(−d(x, x′)), where the distance d is induced by the input string kernel defined above, and we set λ = 1. [sent-282, score-0.757]

94 However, as a learning approach, it requires that the patterns used during training reasonably well represent the points for which we subsequently want to compute pre-images. [sent-308, score-0.129]

95 Otherwise, it can fail, an example being a reduced-set application (see [1]), where one needs pre-images of linear combinations of mapped points in H, which can be far away from training points, making generalization of the estimated pre-image map impossible. [sent-309, score-0.155]

96 Nonlinear component analysis as a kernel eigenvalue problem. [sent-347, score-0.239]

97 Finding the pre-images in kernel principal component analysis. [sent-367, score-0.371]

98 A novel method of protein secondary structure prediction with high segment overlap measure: SVM approach. [sent-390, score-0.208]

99 A multiview nonlinear active shape model using kernel PCA. [sent-396, score-0.313]

100 The spectrum kernel: A string kernel for SVM protein classification. [sent-403, score-0.773]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kpca', 0.469), ('string', 0.453), ('kernel', 0.239), ('kde', 0.216), ('hk', 0.192), ('ridge', 0.134), ('vr', 0.122), ('denoising', 0.113), ('principal', 0.103), ('descent', 0.1), ('digits', 0.097), ('xi', 0.096), ('pca', 0.093), ('pa', 0.092), ('denoised', 0.087), ('strings', 0.087), ('pn', 0.087), ('secondary', 0.087), ('protein', 0.081), ('frames', 0.079), ('map', 0.075), ('dependency', 0.075), ('nonlinear', 0.074), ('output', 0.073), ('video', 0.072), ('compression', 0.072), ('learn', 0.067), ('hl', 0.066), ('input', 0.065), ('tagging', 0.061), ('regression', 0.061), ('window', 0.06), ('objects', 0.059), ('fminsearch', 0.055), ('subjectively', 0.055), ('kernels', 0.055), ('gradient', 0.053), ('length', 0.053), ('paired', 0.052), ('lkopf', 0.052), ('feature', 0.052), ('orthogonal', 0.051), ('digit', 0.05), ('sch', 0.05), ('patterns', 0.049), ('arg', 0.049), ('rm', 0.049), ('fk', 0.048), ('sender', 0.048), ('mapping', 0.048), ('outputs', 0.047), ('vj', 0.047), ('requiring', 0.046), ('serving', 0.044), ('alphabet', 0.044), ('invariances', 0.044), ('va', 0.044), ('reconstruct', 0.043), ('training', 0.042), ('yi', 0.041), ('subsequences', 0.041), ('prediction', 0.04), ('technique', 0.039), ('loss', 0.038), ('synthesized', 0.038), ('points', 0.038), ('vi', 0.037), ('mse', 0.036), ('rkhs', 0.036), ('jason', 0.036), ('frame', 0.036), ('wj', 0.035), ('contiguous', 0.035), ('whilst', 0.035), ('predicting', 0.034), ('unstable', 0.033), ('subsequence', 0.033), ('discrete', 0.033), ('rn', 0.032), ('character', 0.032), ('xs', 0.032), ('subspace', 0.032), ('predict', 0.031), ('ei', 0.031), ('bernhard', 0.031), ('incorporating', 0.031), ('usps', 0.03), ('components', 0.029), ('odd', 0.029), ('looks', 0.029), ('weston', 0.029), ('images', 0.029), ('problem', 0.029), ('classi', 0.029), ('min', 0.028), ('serve', 0.028), ('symbol', 0.028), ('bp', 0.028), ('span', 0.028), ('classical', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 112 nips-2003-Learning to Find Pre-Images

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

2 0.21910852 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen

Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1

3 0.17114566 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie

Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1

4 0.13357358 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

Author: Stuart Andrews, Thomas Hofmann

Abstract: Learning from ambiguous training data is highly relevant in many applications. We present a new learning algorithm for classification problems where labels are associated with sets of patterns instead of individual patterns. This encompasses multiple instance learning as a special case. Our approach is based on a generalization of linear programming boosting and uses results from disjunctive programming to generate successively stronger linear relaxations of a discrete non-convex problem. 1

5 0.11887709 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP, leading to a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

6 0.11059473 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

7 0.10788281 113 nips-2003-Learning with Local and Global Consistency

8 0.10180434 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

9 0.099896334 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines

10 0.095038086 122 nips-2003-Margin Maximizing Loss Functions

11 0.094760768 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

12 0.090125456 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering

13 0.088522792 1 nips-2003-1-norm Support Vector Machines

14 0.08761435 126 nips-2003-Measure Based Regularization

15 0.084962465 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

16 0.082826473 46 nips-2003-Clustering with the Connectivity Kernel

17 0.081677862 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

18 0.081187703 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method

19 0.081116408 176 nips-2003-Sequential Bayesian Kernel Regression

20 0.076303877 120 nips-2003-Locality Preserving Projections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.273), (1, -0.121), (2, 0.007), (3, -0.094), (4, 0.003), (5, 0.132), (6, -0.078), (7, -0.052), (8, 0.098), (9, -0.036), (10, 0.057), (11, 0.057), (12, -0.046), (13, -0.007), (14, 0.123), (15, -0.014), (16, 0.157), (17, 0.117), (18, -0.08), (19, 0.004), (20, 0.055), (21, 0.09), (22, -0.083), (23, 0.039), (24, 0.052), (25, -0.094), (26, -0.115), (27, 0.093), (28, -0.157), (29, -0.174), (30, -0.027), (31, -0.024), (32, -0.018), (33, 0.004), (34, -0.057), (35, -0.048), (36, 0.01), (37, 0.088), (38, -0.051), (39, -0.097), (40, -0.066), (41, -0.087), (42, -0.179), (43, 0.029), (44, -0.142), (45, 0.013), (46, 0.018), (47, 0.034), (48, -0.019), (49, 0.136)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94894218 112 nips-2003-Learning to Find Pre-Images

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

2 0.77126628 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie

Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1

3 0.75575364 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen

Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1

4 0.56648409 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

Author: Liva Ralaivola, Florence D'alché-buc

Abstract: We consider the question of predicting nonlinear time series. Kernel Dynamical Modeling (KDM), a new method based on kernels, is proposed as an extension to linear dynamical models. The kernel trick is used twice: first, to learn the parameters of the model, and second, to compute preimages of the time series predicted in the feature space by means of Support Vector Regression. Our model shows strong connection with the classic Kalman Filter model, with the kernel feature space as hidden state space. Kernel Dynamical Modeling is tested against two benchmark time series and achieves high quality predictions. 1

5 0.53546178 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification

Author: Thomas R. Strohmann, Andrei Belitski, Gregory Z. Grudic, Dennis DeCoste

Abstract: The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumption that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time, greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation. Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state of the art classification algorithms.

6 0.53030425 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

7 0.50733477 126 nips-2003-Measure Based Regularization

8 0.50336361 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

9 0.49055651 113 nips-2003-Learning with Local and Global Consistency

10 0.47218102 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

11 0.46823114 176 nips-2003-Sequential Bayesian Kernel Regression

12 0.46518695 99 nips-2003-Kernels for Structured Natural Language Data

13 0.4552013 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

14 0.45457035 60 nips-2003-Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis

15 0.4523913 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

16 0.43442476 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

17 0.4316574 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

18 0.41984221 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning

19 0.41779938 120 nips-2003-Locality Preserving Projections

20 0.41234535 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.045), (11, 0.075), (29, 0.01), (30, 0.015), (35, 0.058), (49, 0.211), (53, 0.118), (69, 0.019), (71, 0.085), (76, 0.109), (85, 0.074), (91, 0.086), (99, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90714246 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks

Author: Anton Schwaighofer, Marian Grigoras, Volker Tresp, Clemens Hoffmann

Abstract: In this article, we present a novel approach to solving the localization problem in cellular networks. The goal is to estimate a mobile user’s position, based on measurements of the signal strengths received from network base stations. Our solution works by building Gaussian process models for the distribution of signal strengths, as obtained in a series of calibration measurements. In the localization stage, the user’s position can be estimated by maximizing the likelihood of received signal strengths with respect to the position. We investigate the accuracy of the proposed approach on data obtained within a large indoor cellular network. 1

2 0.87074947 40 nips-2003-Bias-Corrected Bootstrap and Model Uncertainty

Author: Harald Steck, Tommi S. Jaakkola

Abstract: The bootstrap has become a popular method for exploring model (structure) uncertainty. Our experiments with artificial and realworld data demonstrate that the graphs learned from bootstrap samples can be severely biased towards too complex graphical models. Accounting for this bias is hence essential, e.g., when exploring model uncertainty. We find that this bias is intimately tied to (well-known) spurious dependences induced by the bootstrap. The leading-order bias-correction equals one half of Akaike’s penalty for model complexity. We demonstrate the effect of this simple bias-correction in our experiments. We also relate this bias to the bias of the plug-in estimator for entropy, as well as to the difference between the expected test and training errors of a graphical model, which asymptotically equals Akaike’s penalty (rather than one half). 1

same-paper 3 0.85338563 112 nips-2003-Learning to Find Pre-Images

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

4 0.84578162 42 nips-2003-Bounded Finite State Controllers

Author: Pascal Poupart, Craig Boutilier

Abstract: We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).

5 0.71566427 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie

Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1

6 0.70077723 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

7 0.69159037 189 nips-2003-Tree-structured Approximations by Expectation Propagation

8 0.69145274 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

9 0.68341082 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

10 0.68319762 113 nips-2003-Learning with Local and Global Consistency

11 0.68268675 12 nips-2003-A Model for Learning the Semantics of Pictures

12 0.68229043 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

13 0.68103898 126 nips-2003-Measure Based Regularization

14 0.67937165 122 nips-2003-Margin Maximizing Loss Functions

15 0.67789531 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification

16 0.6777848 107 nips-2003-Learning Spectral Clustering

17 0.677508 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints

18 0.67536485 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

19 0.67531765 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

20 0.67286295 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors