nips nips2005 nips2005-42 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distance functions and the ‘k’ in k-nearest neighbors.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A foundational problem in semi-supervised learning is the construction of a graph underlying the data. [sent-7, score-0.396]
2 For each of these graphs we associate a basic graph kernel. [sent-9, score-0.372]
3 This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. [sent-11, score-0.713]
4 We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distance functions and the ‘k’ in k-nearest neighbors. [sent-12, score-0.484]
5 1 Introduction Semi-supervised learning has received significant attention in machine learning in recent years, see, for example, [2, 3, 4, 8, 9, 16, 17, 18] and references therein. [sent-13, score-0.094]
6 The defining insight of semi-supervised methods is that unlabeled data may be used to improve the performance of learners in a supervised task. [sent-14, score-0.102]
7 Graph construction consists of two stages, first selection of a distance function and then application of it to determine the graph’s edges (or weights thereof). [sent-16, score-0.268]
8 For example, in this paper we consider distances between images based on the Euclidean distance, Euclidean distance combined with image transformations, and the related tangent distance [6]; we determine the edge set of the graph with k-nearest neighbors. [sent-17, score-1.032]
9 Another common choice is to weight edges by a decreasing function of the distance d, such as e^{−βd²}. [sent-18, score-0.196]
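To make the construction just described concrete, here is a minimal sketch of building a k-nearest-neighbor graph from a pairwise distance matrix, with an optional e^{−βd²} edge weighting. The function name, the symmetrization rule and the NumPy usage are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def knn_graph(D, k, beta=None):
    """Build a symmetric adjacency matrix from a pairwise distance matrix D.

    D    : (m, m) array of pairwise distances, with D[i, i] == 0
    k    : number of nearest neighbors used to define the edge set
    beta : if given, weight each edge (i, j) by exp(-beta * D[i, j]**2);
           otherwise use 0/1 edge weights
    """
    m = D.shape[0]
    A = np.zeros((m, m))
    for i in range(m):
        nn = np.argsort(D[i])[1:k + 1]          # k nearest neighbors of vertex i
        A[i, nn] = 1.0 if beta is None else np.exp(-beta * D[i, nn] ** 2)
    return np.maximum(A, A.T)                   # symmetrize (an assumed convention)
```

Applied with, say, the Euclidean distance matrix of the images and k = 1, ..., 10, this would yield one graph per (distance, k) pair, in the spirit of the experiments of Section 4.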
10 Although a surplus of unlabeled data may improve the quality of the empirical approximation of the manifold (via the graph), leading to improved performance, practical experience with these methods indicates that their performance depends significantly on how the graph is constructed. [sent-19, score-0.458]
11 Hence, the model selection problem must consider both the selection of the distance function and the parameters k or β used in the graph building process described above. [sent-20, score-0.508]
12 A diversity of methods has been proposed for graph construction; in this paper we do not advocate selecting a single graph but rather propose combining a number of graphs. [sent-21, score-0.599]
13 Our solution implements a method based on regularization which builds upon the work in [1]. [sent-22, score-0.235]
14 For a given dataset, each combination of a distance function and an edge set specification derived from that distance leads to a specific graph. [sent-23, score-0.501]
15 Each of these graphs may then be associated with a kernel. [sent-24, score-0.09]
16 We then apply regularization to select the best convex combination of these kernels; the minimizing function will trade off its fit to the data against its norm. [sent-25, score-0.509]
17 What is unique about this regularization is that the minimization is not over a single kernel space but rather over a space corresponding to all convex combinations of kernels. [sent-26, score-0.632]
18 Thus all data (labeled vertices) may be retained for training rather than set aside for cross-validation, which is not an appealing option when the number of labeled vertices per class is very small. [sent-27, score-0.307]
19 There, three different distances for 400 images of the digits ‘six’ and ‘nine’ are depicted, namely, the Euclidean distance, a distance invariant under small centered image rotations from [−10°, 10°] and a distance invariant under rotations from [−180°, 180°]. [sent-29, score-0.777]
20 Clearly, the last distance is problematic as sixes become similar to nines. [sent-30, score-0.196]
21 The performance of our graph regularization learning algorithm discussed in Section 2.2 with these distances is reported below each plot. [sent-31, score-0.534]
22 As expected, this performance is much lower in the case that the third distance is used. [sent-32, score-0.28]
23 In Section 2 we discuss how regularization may be applied to single graphs. [sent-34, score-0.205]
24 First, we review regularization in the context of reproducing kernel Hilbert spaces (Section 2.1). [sent-35, score-0.53]
25 Here we review the (normalized) Laplacian of the graph and a kernel which is the pseudoinverse of the graph Laplacian. [sent-38, score-0.763]
26 In Section 3 we detail our algorithm for learning an optimal convex combination of Laplacian kernels. [sent-39, score-0.386]
27 2 Background on graph regularization In this section we review graph regularization [2, 9, 14] from the perspective of reproducing kernel Hilbert spaces. [sent-41, score-1.249]
28 2.1 Reproducing kernel Hilbert spaces Let X be a set and K : X × X → IR a kernel function. [sent-45, score-0.348]
29 We say that HK is a reproducing kernel Hilbert space (RKHS) of functions f : X → IR if (i) for every x ∈ X, K(x, ·) ∈ HK and (ii) the reproducing kernel property f(x) = ⟨f, K(x, ·)⟩_K holds for every f ∈ HK and x ∈ X, where ⟨·, ·⟩_K is the inner product on HK. [sent-46, score-0.55]
30 Regularization in an RKHS learns a function f ∈ HK on the basis of available input/output examples {(xi, yi) : i ∈ INℓ} by solving the variational problem Eγ(K) := min{ Σ_{i=1}^{ℓ} V(yi, f(xi)) + γ‖f‖²_K : f ∈ HK }  (2.1) [sent-51, score-0.175]
31 where V : IR × IR → [0, ∞) is a loss function and γ a positive parameter. [sent-52, score-0.077]
32 If f solves problem (2.1) then it has the form f(x) = Σ_{i=1}^{ℓ} ci K(xi, x), x ∈ X  (2.2). [sent-54, score-0.093]
33 However, in many practical situations it is more convenient to compute c by solving the dual problem to (2.1), namely [sent-59, score-0.088]
34 −Eγ(K) := min{ (1/(4γ)) c⊤Kc + Σ_{i=1}^{ℓ} V*(yi, ci) : c ∈ IRℓ }  (2.3) [sent-60, score-0.138]
35 where K = (K(xi, xj))_{i,j=1}^{ℓ} and the function V* : IR × IR → IR ∪ {+∞} is the conjugate of the loss function V, defined for every z, α ∈ IR as V*(z, α) := sup{λα − V(z, λ) : λ ∈ IR}; see, for example, [1] for a discussion. [sent-61, score-0.077]
36 The choice of the loss function V leads to different learning methods, among which the most prominent are square loss regularization and support vector machines; see, for example, [15]. [sent-62, score-0.461]
37 2.2 Graph regularization Let G be an undirected graph with m vertices and an m × m adjacency matrix A such that Aij = 1 if there is an edge connecting vertices i and j and zero otherwise. [sent-64, score-0.741]
38 The graph Laplacian L is the m × m matrix defined as L := D − A, where D = diag(di : i ∈ INm) and di is the degree of vertex i, that is, di = Σ_{j=1}^{m} Aij. [sent-65, score-0.358]
39 We identify the linear space of real-valued functions defined on the graph with IRm and introduce on it the semi-inner product ⟨u, v⟩ := u⊤Lv, u, v ∈ IRm. [sent-66, score-0.282]
40 We let {σi, ui}_{i=1}^{m} be a system of eigenvalues/eigenvectors of L, where the eigenvalues are non-decreasing in order and σi = 0 for i ∈ INr, and define the linear subspace H(G) of IRm which is orthogonal to the eigenvectors with zero eigenvalue, that is, H(G) := {v : v⊤ui = 0, i ∈ INr}. [sent-72, score-0.099]
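As a small illustration of these definitions (function names and the numerical tolerance for detecting zero eigenvalues are assumptions):

```python
import numpy as np

def graph_laplacian(A):
    """L = D - A, where D is the diagonal matrix of vertex degrees."""
    return np.diag(A.sum(axis=1)) - A

def subspace_projector(A, tol=1e-10):
    """Orthogonal projector onto H(G): the span of eigenvectors of L
    with nonzero eigenvalue (i.e. orthogonal to the zero-eigenvalue vectors)."""
    L = graph_laplacian(A)
    eigvals, U = np.linalg.eigh(L)          # eigenvalues in non-decreasing order
    nonzero = U[:, eigvals > tol]
    return nonzero @ nonzero.T
```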
41 Within this framework, we wish to learn a function v ∈ H(G) on the basis of a set of labeled vertices. [sent-73, score-0.149]
42 Without loss of generality, we assume that the first ℓ ≤ m vertices are labeled and let y1, . . . , yℓ denote the corresponding labels. [sent-74, score-0.353]
43 Following [2] we prescribe a loss function V and compute the function v by solving the optimization problem min{ Σ_{i=1}^{ℓ} V(yi, vi) + γ‖v‖² : v ∈ H(G) }  (2.4). [sent-78, score-0.297]
44 We note that a similar approach is presented in [17] where v is (essentially) obtained as the minimal norm interpolant in H(G) to the labeled vertices. [sent-80, score-0.187]
45 Problem (2.4) balances the error on the labeled points with a smoothness term measuring the complexity of v on the graph. [sent-82, score-0.211]
46 Note that this last term contains the information of both the labeled and unlabeled vertices via the graph Laplacian. [sent-83, score-0.66]
47 Moreover, the pseudoinverse of the Laplacian, L+, is the reproducing kernel of H(G); see, for example, [7] for a proof. [sent-89, score-0.325]
48 This means that for every v ∈ H(G) and i ∈ INm there holds the reproducing kernel property vi = ⟨L+_i, v⟩, where L+_i is the i-th column of L+. [sent-90, score-0.362]
49 Hence, by setting X ≡ INm, f(i) = vi and K(i, j) = L+_{ij}, i, j ∈ INm, we see that HK ≡ H(G). [sent-91, score-0.118]
50 In particular, for square loss regularization [2] and minimal norm interpolation [17] this requires solving a linear system of m and m − ℓ equations, respectively. [sent-96, score-0.433]
51 The coefficients ci are obtained by solving problem (2.3). [sent-100, score-0.181]
52 For example, for square loss regularization the computation of the parameter vector c = (ci : i ∈ INℓ) involves solving a linear system of ℓ equations, namely (K + γI)c = y  (2.5). [sent-102, score-0.395]
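Putting the pieces of this section together, here is a hedged sketch of single-graph square-loss regularization: the kernel L+ and the linear system (K + γI)c = y follow the text above, while the function name, the use of NumPy's pseudoinverse and the prediction step vi = Σj cj K(i, xj) are illustrative.

```python
import numpy as np

def graph_square_loss_regularization(A, y, labeled, gamma):
    """Square-loss regularization on a single graph.

    A       : (m, m) adjacency matrix
    y       : (ell,) labels for the labeled vertices
    labeled : indices of the labeled vertices (length ell)
    gamma   : regularization parameter
    Returns the predicted values v_i for all m vertices.
    """
    L = np.diag(A.sum(axis=1)) - A
    K_full = np.linalg.pinv(L)                           # kernel K = L^+
    K = K_full[np.ix_(labeled, labeled)]                 # kernel matrix on labeled vertices
    c = np.linalg.solve(K + gamma * np.eye(len(y)), y)   # (K + gamma*I) c = y
    return K_full[:, labeled] @ c                        # v_i = sum_j c_j K(i, x_j)
```

With, for example, gamma = 1e-5 (the value reported in Section 4) and a graph built with knn_graph above, the sign of the returned vector gives a predicted class for each unlabeled vertex.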
53 3 Learning a convex combination of Laplacian kernels We now describe our framework for learning with multiple graph Laplacians. [sent-104, score-0.893]
54 We assume that we are given n graphs G(q), q ∈ INn, all having m vertices, with corresponding Laplacians L(q), kernels K(q) = (L(q))+, Hilbert spaces H(q) := H(G(q)) and norms ‖v‖²_{(q)} := v⊤ L(q) v, v ∈ H(q). [sent-105, score-0.4]
55 We propose to learn an optimal convex combination of these graph kernels, that is, we solve the optimization problem ρ = min{ Σ_{i=1}^{ℓ} V(yi, vi) + γ‖v‖²_{K(λ)} : λ ∈ Λ, v ∈ HK(λ) }  (3.1) [sent-106, score-0.783]
56 where we have defined the set Λ := {λ ∈ IRn : λq ≥ 0, Σ_{q=1}^{n} λq = 1} and, for each λ ∈ Λ, the kernel K(λ) := Σ_{q=1}^{n} λq K(q). [sent-107, score-0.149]
57 Hence an optimal convex combination of kernels has a smaller right hand side than that of any individual kernel, motivating the expectation of improved performance. [sent-109, score-0.599]
58 This is an instance of the more general problem of minimizing (2.1) over v ∈ HK and K ∈ co(K), the convex hull of kernels in a prescribed set K. [sent-113, score-0.484]
59 When solving (3.1) it is important to require that the kernels K(q) satisfy a normalization condition, such as that they all have the same trace or the same Frobenius norm; see [10] for a discussion. [sent-117, score-0.26]
60 Figure 1: Algorithm to compute an optimal convex combination of kernels in the set co{K(q) : q ∈ INn}. [sent-125, score-0.599]
61 Using the dual problem (2.3), we can rewrite this problem as −ρ = max{ min{ (1/(4γ)) c⊤K(λ)c + Σ_{i=1}^{ℓ} V*(yi, ci) : c ∈ IRℓ } : λ ∈ Λ }  (3.2). [sent-128, score-0.168]
62 Equation (3.2) expresses the optimal convex combination of the kernels as the solution to a saddle point problem. [sent-131, score-0.64]
63 In particular, for square loss regularization this requires solving the equation (2.5). [sent-139, score-0.395]
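The algorithm of Figure 1 is not reproduced in the text, so the following is only a plausible sketch of tackling the saddle-point problem (3.2) under square loss: for fixed λ the coefficients are taken from (K(λ) + γI)c = y as above, and λ is then moved toward the simplex vertex with the largest value of c⊤K(q)c, a conditional-gradient-style update chosen here for illustration. The starting point (the average of the n kernels) and the iteration cap T = 100 match the settings reported in Section 4; the step size and everything else are assumptions.

```python
import numpy as np

def combine_graph_kernels(K_list, y, gamma=1e-5, T=100, step=0.1):
    """Illustrative sketch of learning a convex combination of graph kernels.

    K_list : list of n (ell, ell) kernel matrices restricted to the labeled vertices
    y      : (ell,) labels of the labeled vertices
    Returns the mixing weights lambda over the n kernels.
    """
    n, ell = len(K_list), len(y)
    lam = np.ones(n) / n                                  # start from the average kernel
    for _ in range(T):
        K = sum(l * Kq for l, Kq in zip(lam, K_list))
        c = np.linalg.solve(K + gamma * np.eye(ell), y)   # square-loss coefficients
        scores = np.array([c @ Kq @ c for Kq in K_list])  # per-kernel quadratic terms
        vertex = np.zeros(n)
        vertex[np.argmax(scores)] = 1.0                   # best simplex vertex
        lam = (1.0 - step) * lam + step * vertex          # conditional-gradient-style step
    return lam
```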
64 4 Experiments In this section we present our experiments on optical character recognition. [sent-141, score-0.105]
65 First, the optimal convex combination of kernels computed by our algorithm is competitive with the best base kernels. [sent-143, score-0.632]
66 Second, by observing the ‘weights’ of the convex combination we can distinguish the strong from the weak candidate kernels. [sent-144, score-0.304]
67 We used the USPS dataset of 16×16 images of handwritten digits, with pixel values ranging between −1 and 1. [sent-146, score-0.158]
68 We present the results for 5 pairwise classification tasks of varying difficulty and for odd vs. even classification. [sent-147, score-0.213]
69 For pairwise classification, the training set consisted of the first 200 images for each digit in the USPS training set and the number of labeled points was chosen to be 4, 8 or 12 (with equal numbers for each digit). [sent-149, score-0.464]
70 For odd vs. even digit classification, the training set consisted of the first 80 images per digit in the USPS training set and the number of labeled points was 10, 20 or 30, with equal numbers for each digit. [sent-151, score-0.541]
71 Performance was averaged over 30 random selections, each with the same number of labeled points. [sent-152, score-0.149]
72 In each experiment, we constructed n = 30 graphs G(q) (q ∈ INn ) by combining k-nearest neighbors (k ∈ IN10 ) with three different distances. [sent-153, score-0.161]
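A sketch of how such a family of base kernels could be assembled from the three precomputed distance matrices; the function and variable names, and the symmetrization of the k-NN edges, are assumptions for illustration.

```python
import numpy as np

def base_laplacian_kernels(distance_matrices, k_values=range(1, 11)):
    """One kernel K^(q) = (L^(q))^+ per (distance, k) pair: 3 x 10 = 30 kernels here."""
    kernels = []
    for D in distance_matrices:                 # e.g. Euclidean, transformation, tangent
        order = np.argsort(D, axis=1)           # neighbors sorted by increasing distance
        m = D.shape[0]
        for k in k_values:
            A = np.zeros((m, m))
            rows = np.repeat(np.arange(m), k)
            A[rows, order[:, 1:k + 1].ravel()] = 1.0   # k-NN edges for each vertex
            A = np.maximum(A, A.T)                     # symmetrize
            L = np.diag(A.sum(axis=1)) - A
            kernels.append(np.linalg.pinv(L))
    return kernels
```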
73 Since kernels obtained from different types of graphs can vary widely, it was necessary to renormalize them. [sent-156, score-0.35]
74 Hence, we chose to normalize each kernel during the training process by the Frobenius norm of its submatrix corresponding to the labeled data. [sent-157, score-0.149]
75 (Footnote 2: the USPS dataset is available at: http://www-stat-class.)
76 Table 1: Misclassification error percentage (top) and standard deviation (bottom) for the best convex combination of kernels on different handwritten digit recognition tasks, using different distances. [sent-312, score-0.766]
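A one-function sketch of the Frobenius-norm normalization described above (the name is illustrative):

```python
import numpy as np

def normalize_by_labeled_frobenius(K, labeled):
    """Scale a kernel by the Frobenius norm of its submatrix on the labeled data."""
    sub = K[np.ix_(labeled, labeled)]
    return K / np.linalg.norm(sub, 'fro')
```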
77 The regularization parameter was set to 10−5 in all algorithms. [sent-316, score-0.205]
78 For the convex minimization, we always used the average of the n kernels as the starting kernel in the algorithm of Figure 1 and set the maximum number of iterations to T = 100. [sent-317, score-0.604]
79 Table 1 shows the results obtained using the three distances, each combined with k-NN (k ∈ IN10). [sent-318, score-0.128]
80 The first distance is the Euclidean distance between images. [sent-319, score-0.392]
81 The second method is transformation, where the distance between two images is given by the smallest Euclidean distance between any pair of transformed images as determined by applying a number of affine transformations and a thickness transformation; see [6] for more information. [sent-320, score-0.566]
82 The third distance is tangent distance, as described in [6], which is a first-order approximation to the above transformations. [sent-321, score-0.319]
83 For the first three columns in the table the Euclidean distance was used; for columns 4–6 the image transformation distance was used; for columns 7–9 the tangent distance was used. [sent-322, score-0.98]
84 We also noted that within each of the methods the performance of the convex combination is comparable to that of the best kernels. [sent-325, score-0.304]
85 Figure 2 reports the weight of each individual kernel learned by our algorithm when 2% of the points are labeled in the pairwise tasks and 20 labels are used for odd vs. even classification. [sent-326, score-0.479]
86 In the 6 vs. 7 task, the large weights are associated with the graphs/kernels built with the tangent distance. [sent-329, score-0.158]
87 The effectiveness of our algorithm in selecting the good graphs/kernels is better demonstrated in Figure 3, where the Euclidean and the transformation kernels are combined with a “low-quality” kernel. [sent-330, score-0.41]
88 The transformation distance was approximated using Matlab's constrained minimization function. [sent-332, score-0.243]
89 The figure shows the distance matrices on the set of labeled and unlabeled data for the Euclidean, transformation and “low-quality” distances, respectively. [sent-333, score-0.553]
90 The best error among 15 different values of k within each method, the error of the learned convex combination and the total learned weights for each method are shown below each plot. [sent-334, score-0.467]
91 It is clear that the solution of the algorithm is dominated by the good kernels and is not influenced by the ones with low performance. [sent-335, score-0.26]
92 As a result, the error of the convex combination is comparable to that of the Euclidean and transformation methods. [sent-336, score-0.443]
93 The final experiment (see Figure 4) demonstrates that unlabeled data improves the performance of our method. [sent-337, score-0.102]
94 [Figure 3 panels: Euclidean, Transformation and Low-quality distance matrices; the axis tick values and most per-panel error values are not recoverable here (one legible value reads error = 0.26%).] [sent-384, score-0.229]
95 Figure 3: Similarity matrices and corresponding learned coefficients of the convex combination for the 6 vs. 7 task. [sent-391, score-0.335]
96 5 Conclusion We have presented a method for computing an optimal kernel within the framework of regularization over graphs. [sent-394, score-0.389]
97 When tested on optical character recognition tasks, the method exhibits competitive performance and is able to select good graph structures. [sent-396, score-0.42]
98 In particular, we may consider a continuous family of graphs each corresponding to a different weight matrix and study graph kernel combinations over this class. [sent-398, score-0.557]
99 The number of labeled points is 10 on the left and 20 on the right. [sent-424, score-0.149]
100 Diffusion kernels on graphs and other discrete input spaces. [sent-486, score-0.35]
wordName wordTfidf (topN-words)
[('ir', 0.301), ('graph', 0.282), ('kernels', 0.26), ('euclidean', 0.222), ('regularization', 0.205), ('inm', 0.203), ('distance', 0.196), ('convex', 0.195), ('inn', 0.177), ('hk', 0.158), ('kernel', 0.149), ('labeled', 0.149), ('irm', 0.136), ('odd', 0.129), ('vertices', 0.127), ('reproducing', 0.126), ('tangent', 0.123), ('digit', 0.115), ('laplacian', 0.114), ('combination', 0.109), ('transformation', 0.106), ('inp', 0.102), ('unlabeled', 0.102), ('hilbert', 0.097), ('ci', 0.093), ('graphs', 0.09), ('vi', 0.087), ('usps', 0.086), ('distances', 0.084), ('loss', 0.077), ('laplacians', 0.075), ('images', 0.07), ('argyriou', 0.068), ('herbster', 0.068), ('inr', 0.068), ('co', 0.067), ('character', 0.062), ('solving', 0.058), ('aij', 0.056), ('square', 0.055), ('handwritten', 0.054), ('spaces', 0.05), ('micchelli', 0.05), ('pseudoinverse', 0.05), ('th', 0.049), ('minimization', 0.047), ('pontil', 0.047), ('learning', 0.047), ('tasks', 0.046), ('rotations', 0.045), ('min', 0.045), ('manifold', 0.044), ('combined', 0.044), ('optical', 0.043), ('rkhs', 0.043), ('labels', 0.043), ('columns', 0.042), ('yi', 0.042), ('pk', 0.041), ('saddle', 0.041), ('frobenius', 0.041), ('eigenvectors', 0.039), ('belkin', 0.038), ('norm', 0.038), ('di', 0.038), ('pairwise', 0.038), ('construction', 0.037), ('zhu', 0.037), ('misclassi', 0.037), ('image', 0.037), ('constructed', 0.036), ('combinations', 0.036), ('combining', 0.035), ('optimal', 0.035), ('invariant', 0.035), ('weights', 0.035), ('transformations', 0.034), ('colt', 0.034), ('digits', 0.034), ('competitive', 0.033), ('error', 0.033), ('semide', 0.032), ('ij', 0.031), ('learned', 0.031), ('cients', 0.031), ('text', 0.031), ('training', 0.031), ('consisted', 0.03), ('ui', 0.03), ('builds', 0.03), ('smola', 0.03), ('smoothing', 0.03), ('experience', 0.03), ('problem', 0.03), ('balances', 0.029), ('irn', 0.029), ('kondor', 0.029), ('lesser', 0.029), ('matveeva', 0.029), ('prescribed', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distance functions and the ‘k’ in k-nearest neighbors.
2 0.23455104 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
Author: Y. Altun, D. McAllester, M. Belkin
Abstract: Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points. 1
3 0.23417634 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
Author: Tong Zhang, Rie Kubota Ando
Abstract: We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach subsumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such methods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance. Experiments are used to illustrate the main consequences of our analysis.
4 0.23401718 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
Author: Ashish Kapoor, Hyungil Ahn, Yuan Qi, Rosalind W. Picard
Abstract: There have been many graph-based approaches for semi-supervised classification. One problem is that of hyperparameter learning: performance depends greatly on the hyperparameters of the similarity graph, transformation of the graph Laplacian and the noise model. We present a Bayesian framework for learning hyperparameters for graph-based semisupervised classification. Given some labeled data, which can contain inaccurate labels, we pose the semi-supervised classification as an inference problem over the unknown labels. Expectation Propagation is used for approximate inference and the mean of the posterior is used for classification. The hyperparameters are learned using EM for evidence maximization. We also show that the posterior mean can be written in terms of the kernel matrix, providing a Bayesian classifier to classify new points. Tests on synthetic and real datasets show cases where there are significant improvements in performance over the existing approaches. 1
5 0.19544083 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
Author: Kilian Q. Weinberger, John Blitzer, Lawrence K. Saul
Abstract: We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
6 0.16379881 105 nips-2005-Large-Scale Multiclass Transduction
7 0.15478024 178 nips-2005-Soft Clustering on Graphs
8 0.15057719 126 nips-2005-Metric Learning by Collapsing Classes
9 0.13923551 50 nips-2005-Convex Neural Networks
10 0.12801148 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
11 0.12526633 47 nips-2005-Consistency of one-class SVM and related algorithms
12 0.12524198 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
13 0.11976518 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
14 0.11811413 114 nips-2005-Learning Rankings via Convex Hull Separation
15 0.1173673 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
16 0.1109932 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
17 0.10598096 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
18 0.10507421 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
19 0.10335619 138 nips-2005-Non-Local Manifold Parzen Windows
20 0.098105162 75 nips-2005-Fixing two weaknesses of the Spectral Method
topicId topicWeight
[(0, 0.321), (1, 0.204), (2, -0.149), (3, -0.141), (4, -0.11), (5, 0.113), (6, 0.157), (7, -0.025), (8, 0.105), (9, -0.036), (10, 0.09), (11, -0.099), (12, -0.19), (13, -0.01), (14, -0.085), (15, -0.245), (16, 0.059), (17, -0.011), (18, -0.018), (19, 0.093), (20, -0.045), (21, 0.005), (22, -0.03), (23, -0.019), (24, 0.062), (25, -0.029), (26, 0.031), (27, -0.041), (28, -0.126), (29, -0.072), (30, 0.022), (31, -0.013), (32, -0.044), (33, 0.03), (34, 0.07), (35, 0.089), (36, -0.072), (37, 0.053), (38, -0.019), (39, -0.007), (40, -0.075), (41, 0.025), (42, 0.087), (43, 0.057), (44, -0.012), (45, -0.045), (46, -0.09), (47, 0.002), (48, 0.015), (49, -0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.97389805 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distance functions and the ‘k’ in k-nearest neighbors.
2 0.79151893 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
Author: Y. Altun, D. McAllester, M. Belkin
Abstract: Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points. 1
3 0.71322095 105 nips-2005-Large-Scale Multiclass Transduction
Author: Thomas Gärtner, Quoc V. Le, Simon Burton, Alex J. Smola, Vishy Vishwanathan
Abstract: We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabeled data subject to a balancing constraint. 1
4 0.70503819 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
Author: Ashish Kapoor, Hyungil Ahn, Yuan Qi, Rosalind W. Picard
Abstract: There have been many graph-based approaches for semi-supervised classification. One problem is that of hyperparameter learning: performance depends greatly on the hyperparameters of the similarity graph, transformation of the graph Laplacian and the noise model. We present a Bayesian framework for learning hyperparameters for graph-based semisupervised classification. Given some labeled data, which can contain inaccurate labels, we pose the semi-supervised classification as an inference problem over the unknown labels. Expectation Propagation is used for approximate inference and the mean of the posterior is used for classification. The hyperparameters are learned using EM for evidence maximization. We also show that the posterior mean can be written in terms of the kernel matrix, providing a Bayesian classifier to classify new points. Tests on synthetic and real datasets show cases where there are significant improvements in performance over the existing approaches. 1
5 0.70500475 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
Author: Tong Zhang, Rie Kubota Ando
Abstract: We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach subsumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such methods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance. Experiments are used to illustrate the main consequences of our analysis.
6 0.64142305 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
7 0.59680748 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
8 0.56859267 126 nips-2005-Metric Learning by Collapsing Classes
9 0.52063876 114 nips-2005-Learning Rankings via Convex Hull Separation
10 0.48898911 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
11 0.48580769 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
12 0.4776006 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
13 0.46580634 75 nips-2005-Fixing two weaknesses of the Spectral Method
14 0.45632693 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
15 0.45607811 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
16 0.41662639 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
17 0.41502389 178 nips-2005-Soft Clustering on Graphs
18 0.41354191 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
19 0.41096094 71 nips-2005-Fast Krylov Methods for N-Body Learning
20 0.40786904 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
topicId topicWeight
[(3, 0.04), (10, 0.035), (31, 0.018), (34, 0.573), (41, 0.012), (50, 0.012), (55, 0.023), (65, 0.013), (69, 0.036), (73, 0.041), (88, 0.09), (91, 0.029)]
simIndex simValue paperId paperTitle
1 0.99452198 2 nips-2005-A Bayes Rule for Density Matrices
Author: Manfred K. Warmuth
Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variances along any direction is determined by the covariance matrix. Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullbach-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1
2 0.98641884 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
Author: François Laviolette, Mario Marchand, Mohak Shah
Abstract: We design a new learning algorithm for the Set Covering Machine from a PAC-Bayes perspective and propose a PAC-Bayes risk bound which is minimized for classifiers achieving a non trivial margin-sparsity trade-off. 1
3 0.98597836 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer
Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1
same-paper 4 0.98533446 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distance functions and the ‘k’ in k-nearest neighbors.
5 0.96565157 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
6 0.83656406 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
7 0.82521278 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
8 0.8092525 114 nips-2005-Learning Rankings via Convex Hull Separation
9 0.79627877 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
10 0.7939831 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
11 0.79114282 105 nips-2005-Large-Scale Multiclass Transduction
12 0.78623897 77 nips-2005-From Lasso regression to Feature vector machine
13 0.78563625 31 nips-2005-Asymptotics of Gaussian Regularized Least Squares
14 0.77075446 184 nips-2005-Structured Prediction via the Extragradient Method
15 0.76105952 50 nips-2005-Convex Neural Networks
16 0.7542007 126 nips-2005-Metric Learning by Collapsing Classes
17 0.75282496 195 nips-2005-Transfer learning for text classification
18 0.74933219 138 nips-2005-Non-Local Manifold Parzen Windows
19 0.74369216 58 nips-2005-Divergences, surrogate loss functions and experimental design
20 0.74338186 47 nips-2005-Consistency of one-class SVM and related algorithms