nips nips2005 nips2005-10 knowledge-graph by maker-knowledge-mining

10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm


Source: pdf

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Gunnar Rätsch, Friedrich Miescher Lab, Max Planck Society, Spemannstr. [sent-4, score-0.109]

2 Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. [sent-10, score-0.084]

3 Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. [sent-12, score-0.449]

4 We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. [sent-13, score-0.086]

5 Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. [sent-14, score-0.116]

6 Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined. [sent-15, score-0.231]

7 They employ a so-called kernel function k(xi , xj ) which intuitively computes the similarity between two examples xi and xj . [sent-17, score-0.458]

8 The result of SVM learning is an α-weighted linear combination of kernel elements and the bias b:

$$f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{N}\alpha_i y_i\, k(x_i, x) + b\right), \tag{1}$$

where the $x_i$ are the N labeled training examples ($y_i \in \{\pm 1\}$). [sent-18, score-0.525]

9 Recent developments in the literature on the SVM and other kernel methods have shown the need to consider multiple kernels. [sent-19, score-0.234]

10 This provides flexibility, and also reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. [sent-20, score-0.093]

11 While this so-called “multiple kernel learning” (MKL) problem can in principle be solved via cross-validation, several recent papers have focused on more efficient methods for multiple kernel learning [4, 5, 1, 7, 3, 9, 2]. [sent-21, score-0.507]

12 One of the problems with kernel methods compared to other techniques is that the resulting decision function (1) is hard to interpret and, hence, is difficult to use in order to extract relevant knowledge. (For more details, datasets and pseudocode see http://www.) [sent-22, score-0.209]

13 One can approach this problem by considering convex combinations of K kernels, i.e. [sent-28, score-0.109]

14 $$k(x_i, x_j) = \sum_{k=1}^{K}\beta_k\, k_k(x_i, x_j) \tag{2}$$ with $\beta_k \ge 0$ and $\sum_k \beta_k = 1$, where each kernel $k_k$ uses only a distinct set of features of each instance. [sent-30, score-0.591]
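As a concrete illustration of (2), the following short Python sketch (not from the paper; the feature split, RBF widths and weights are made-up values) builds base kernel matrices on distinct feature blocks and mixes them with simplex weights β:

```python
import numpy as np

def rbf_kernel(X, width):
    """Gaussian RBF kernel matrix k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * width))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * width))

def combined_kernel(feature_blocks, widths, beta):
    """Convex combination k = sum_k beta_k * k_k as in Eq. (2).

    feature_blocks: list of (N, d_k) arrays, one distinct feature set per kernel.
    beta: non-negative weights summing to one.
    """
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * rbf_kernel(Xk, w) for b, Xk, w in zip(beta, feature_blocks, widths))

# Toy usage: 100 examples, two feature blocks, two RBF widths.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
K = combined_kernel([X[:, :3], X[:, 3:]], widths=[0.5, 5.0], beta=[0.3, 0.7])
```

Any β on the simplex yields a valid kernel, since a conic combination of positive semi-definite matrices is again positive semi-definite.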

15 We will illustrate that the considered MKL formulation provides useful insights and is at the same time very efficient. [sent-32, score-0.075]

16 This is an important property missing in current kernel based algorithms. [sent-33, score-0.179]

17 We consider the framework proposed by [7], which results in a convex optimization problem - a quadratically-constrained quadratic program (QCQP). [sent-34, score-0.235]

18 This problem is more challenging than the standard SVM QP, but it can in principle be solved by general-purpose optimization toolboxes. [sent-35, score-0.097]

19 Since the use of such algorithms will only be feasible for small problems with few data points and kernels, [1] suggested an algorithm based on sequential minimal optimization (SMO) [10]. [sent-36, score-0.095]

20 While the kernel learning problem is convex, it is also non-smooth, making the direct application of simple local descent algorithms such as SMO infeasible. [sent-37, score-0.238]

21 In this work we follow a different direction: We reformulate the problem as a semi-infinite linear program (SILP), which can be efficiently solved using an off-the-shelf LP solver and a standard SVM implementation (cf. [sent-39, score-0.152]

22 Using this approach we are able to solve problems with more than one hundred thousand examples or with several hundred kernels quite efficiently. [sent-41, score-0.317]

23 We extend our previous work and show that the transformation to a SILP works with a large class of convex loss functions (cf. [sent-43, score-0.18]

24 Our column-generation based algorithm for solving the SILP works by repeatedly using an algorithm that can efficiently solve the single kernel problem in order to solve the MKL problem. [sent-45, score-0.353]

25 Hence, if there exists an algorithm that solves the simpler problem efficiently (like SVMs), then our new algorithm can efficiently solve the multiple kernel learning problem. [sent-46, score-0.365]

26 We conclude the paper by illustrating the usefulness of our algorithms in several examples relating to the interpretation of results and to automatic model selection. [sent-47, score-0.071]

27 2 Multiple Kernel Learning for Classification using SILP In the Multiple Kernel Learning (MKL) problem for binary classification one is given N data points $(x_i, y_i)$ ($y_i \in \{\pm 1\}$), where $x_i$ is translated via mappings $\Phi_k(x) \mapsto \mathbb{R}^{D_k}$, $k = 1, \dots, K$. [sent-48, score-0.308]

28-29 Then one solves the following optimization problem [1], which is equivalent to the linear SVM for K = 1:

$$\min_{w_k \in \mathbb{R}^{D_k},\ \xi \in \mathbb{R}^{N}_{+},\ \beta \in \mathbb{R}^{K}_{+},\ b \in \mathbb{R}} \;\; \frac{1}{2}\Big(\sum_{k=1}^{K}\beta_k \lVert w_k \rVert_2\Big)^{2} + C\sum_{i=1}^{N}\xi_i \tag{3}$$

$$\text{s.t.}\quad y_i\Big(\sum_{k=1}^{K}\beta_k\, w_k^{\top}\Phi_k(x_i) + b\Big) \ge 1 - \xi_i \quad\text{and}\quad \sum_{k=1}^{K}\beta_k = 1.$$

[sent-55, score-0.159] [sent-57, score-0.377]

30 Note that the ℓ1 -norm of β is constrained to one, while one is penalizing the ℓ2 -norm of wk in each block k separately. [sent-62, score-0.097]

31-32 [1] derived the dual for problem (3), which can equivalently be written as:

$$\min_{\gamma \in \mathbb{R},\ \alpha \in \mathbb{R}^{N}_{+},\ \alpha \le C\mathbf{1}} \; \gamma \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i = 0 \quad\text{and}\quad \underbrace{\frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j y_i y_j\, k_k(x_i, x_j) - \sum_{i=1}^{N}\alpha_i}_{=:S_k(\alpha)} \;\le\; \gamma \;\text{ for } k = 1, \dots, K. \tag{4}$$

[sent-66, score-0.273] [sent-68, score-0.36]

33 Note that we have one quadratic constraint per kernel (Sk (α) ≤ γ). [sent-72, score-0.296]
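To make these per-kernel constraints tangible, here is a minimal numpy sketch (illustrative, with hypothetical variable names) that evaluates $S_k(\alpha)$ for a list of precomputed kernel matrices; in the QCQP (4), γ must upper-bound every one of these values:

```python
import numpy as np

def S_values(kernels, alpha, y):
    """S_k(alpha) = 0.5 * sum_ij a_i a_j y_i y_j k_k(x_i, x_j) - sum_i a_i, per kernel."""
    v = alpha * y                                   # element-wise alpha_i * y_i
    return np.array([0.5 * v @ Kk @ v - alpha.sum() for Kk in kernels])

# Toy check with two random kernels; gamma must satisfy gamma >= S_values(...).max().
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)
kernels = [X @ X.T,
           np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))]
alpha = rng.uniform(0.0, 1.0, size=20)
print(S_values(kernels, alpha, y))
```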

34-37 In order to solve (4), one may solve the following saddle point problem (Lagrangian):

$$L := \gamma + \sum_{k=1}^{K}\beta_k\,\big(S_k(\alpha) - \gamma\big), \tag{5}$$

minimized w.r.t. $\alpha \in \mathbb{R}^{N}_{+}$, $\gamma \in \mathbb{R}$ (subject to $\alpha \le C\mathbf{1}$ and $\sum_i \alpha_i y_i = 0$) and maximized w.r.t. $\beta \in \mathbb{R}^{K}_{+}$. Setting the derivative of $L$ w.r.t. $\gamma$ to zero, one obtains the constraint $\sum_k \beta_k = 1$, and (5) simplifies to $L = S(\alpha, \beta) := \sum_{k=1}^{K}\beta_k S_k(\alpha)$, which leads to the min-max problem

$$\max_{\beta}\;\min_{\alpha}\;\sum_{k=1}^{K}\beta_k S_k(\alpha) \quad \text{s.t.}\quad 0 \le \alpha \le C\mathbf{1},\;\; \sum_i \alpha_i y_i = 0,\;\; \beta \ge 0,\;\; \sum_k \beta_k = 1. \tag{6}$$

This can equivalently be restated as the semi-infinite linear program

$$\max_{\theta \in \mathbb{R},\,\beta \in \mathbb{R}^{K}_{+}} \theta \quad \text{s.t.}\quad \sum_{k=1}^{K}\beta_k = 1 \quad\text{and}\quad \sum_{k=1}^{K}\beta_k S_k(\alpha) \ge \theta \tag{7}$$

for all $\alpha$ with $0 \le \alpha \le C\mathbf{1}$ and $\sum_i y_i \alpha_i = 0$. Note that this is a linear program, as $\theta$ and $\beta$ are only linearly constrained.

[sent-74, score-0.174] [sent-77, score-0.154] [sent-84, score-0.103] [sent-90, score-0.154]

38 However, there are infinitely many constraints: one for each $\alpha \in \mathbb{R}^{N}$ satisfying $0 \le \alpha \le C\mathbf{1}$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$. [sent-91, score-0.154]

39 We will discuss in Section 4 how to solve such semi-infinite linear programs. [sent-99, score-0.072]

40 3 Multiple Kernel Learning with General Cost Functions In this section we consider the more general class of MKL problems, where one is given an arbitrary strictly convex differentiable loss function, for which we derive its MKL SILP formulation. [sent-100, score-0.266]

41 We will then investigate this general MKL SILP using different loss functions, in particular the soft-margin loss, the ε-insensitive loss and the quadratic loss. [sent-101, score-0.245]

42-44 We define the MKL primal formulation for a strictly convex and differentiable loss function L as (for simplicity we omit a bias term):

$$\min_{w_k \in \mathbb{R}^{D_k}} \;\; \frac{1}{2}\Big(\sum_{k=1}^{K}\lVert w_k \rVert_2\Big)^{2} + \sum_{i=1}^{N} L\big(f(x_i), y_i\big) \quad \text{s.t.}\quad f(x_i) = \sum_{k=1}^{K} w_k^{\top}\Phi_k(x_i).$$

To derive the SILP formulation we follow the same recipe as in Section 2: deriving the Lagrangian leads to a max-min problem formulation that is eventually reformulated as a SILP:

$$\max_{\theta \in \mathbb{R},\,\beta \in \mathbb{R}^{K}_{+}} \theta \quad \text{s.t.}\quad \sum_{k=1}^{K}\beta_k = 1 \quad\text{and}\quad \sum_{k=1}^{K}\beta_k S_k(\alpha) \ge \theta,\;\; \forall \alpha \in \mathbb{R}^{N},$$

where

$$S_k(\alpha) = -\sum_{i=1}^{N} L\big(L'^{-1}(\alpha_i, y_i),\, y_i\big) + \sum_{i=1}^{N}\alpha_i\, L'^{-1}(\alpha_i, y_i) + \frac{1}{2}\Big\lVert \sum_{i=1}^{N}\alpha_i\Phi_k(x_i) \Big\rVert_2^{2}.$$

[sent-102, score-0.721] [sent-109, score-0.122] [sent-111, score-0.462]

45 We assumed that L(x, y) is strictly convex and differentiable in x. [sent-112, score-0.165]

46 Unfortunately, the soft margin and ε-insensitive losses do not have these properties. [sent-113, score-0.201]

47 Soft Margin Loss: We use the following loss in order to approximate the soft margin loss: $L_\sigma(x, y) = \frac{C}{\sigma}\log\big(1 + \exp((1 - xy)\sigma)\big)$. [sent-115, score-0.201]
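For reference, a numerically stable implementation of this smoothed hinge (a sketch, not the authors' code; np.logaddexp avoids overflow for large σ):

```python
import numpy as np

def smoothed_hinge(x, y, C=1.0, sigma=10.0):
    """L_sigma(x, y) = (C / sigma) * log(1 + exp(sigma * (1 - x * y))).

    As sigma grows this approaches the soft-margin loss C * max(0, 1 - x * y),
    while staying strictly convex and differentiable for finite sigma.
    """
    return (C / sigma) * np.logaddexp(0.0, sigma * (1.0 - x * y))

# Example: compare the smooth approximation with the hinge at a few margins.
m = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])      # values of x * y
print(smoothed_hinge(m, 1.0, sigma=50.0))      # close to max(0, 1 - m)
print(np.maximum(0.0, 1.0 - m))
```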

48 Moreover, Lσ is strictly convex and differentiable for σ < ∞. [sent-117, score-0.165]

49-52 Using this loss and assuming $y_i \in \{\pm 1\}$, we obtain:

$$S_k(\alpha) = -\sum_{i=1}^{N}\frac{C}{\sigma}\left[\log\!\Big(\frac{Cy_i}{\alpha_i + Cy_i}\Big) + \frac{\alpha_i}{Cy_i}\log\!\Big(\frac{-\alpha_i}{\alpha_i + Cy_i}\Big)\right] + \sum_{i=1}^{N}\alpha_i y_i + \frac{1}{2}\Big\lVert\sum_{i=1}^{N}\alpha_i\Phi_k(x_i)\Big\rVert_2^{2}.$$

If $\sigma \to \infty$, then the first two terms vanish provided that $-C \le \alpha_i \le 0$ if $y_i = 1$ and $0 \le \alpha_i \le C$ if $y_i = -1$. Substituting $\alpha_i = -\tilde{\alpha}_i y_i$, we then obtain

$$S_k(\tilde{\alpha}) = -\sum_{i=1}^{N}\tilde{\alpha}_i + \frac{1}{2}\Big\lVert\sum_{i=1}^{N}\tilde{\alpha}_i y_i\Phi_k(x_i)\Big\rVert_2^{2}, \quad\text{with } 0 \le \tilde{\alpha}_i \le C \;(i = 1, \dots, N),$$

which is very similar to (4): only the $\sum_i \tilde{\alpha}_i y_i = 0$ constraint is missing, since we omitted the bias.

[sent-118, score-0.409] [sent-119, score-0.308] [sent-120, score-0.308] [sent-123, score-0.228]

53 One-Class Soft Margin Loss The one-class SVM soft margin (e. [sent-124, score-0.1]

54 ε-insensitive Loss: Using the same technique for the ε-insensitive loss $L(x, y) = C\,(|x - y| - \epsilon)_+$, we obtain

$$S_k(\alpha, \alpha^*) = \frac{1}{2}\Big\lVert\sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\Phi_k(x_i)\Big\rVert_2^{2} + \sum_{i=1}^{N}(\alpha_i + \alpha_i^*)\,\epsilon - \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,y_i,$$

with $0 \le \alpha, \alpha^* \le C\mathbf{1}$. [sent-127, score-0.101]

55 When including a bias term, we additionally have the constraint $\sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,y_i = 0$. [sent-128, score-0.074]

56 It is straightforward to derive the dual problem for other loss functions such as the quadratic loss. [sent-129, score-0.234]

57 4 Algorithms to solve SILPs The SILPs considered in this work all have the following form: $\max_{\theta \in \mathbb{R},\,\beta \in \mathbb{R}^{M}_{+}} \theta$ s.t. $\sum_{k=1}^{M}\beta_k = 1$ and $\sum_{k=1}^{M}\beta_k S_k(\alpha) \ge \theta$ for all admissible $\alpha$ (10). [sent-131, score-0.101]

58 Using Theorem 5 in [12] one can show that the above SILP has a solution if the corresponding primal is feasible and bounded. [sent-134, score-0.093]

59 For all loss functions considered in this paper this holds true. [sent-139, score-0.13]

60 We propose to use a technique called Column Generation to solve (10). [sent-140, score-0.072]

61 Then a second algorithm generates a new constraint determined by α. [sent-143, score-0.074]

62-63 In the best case, the other algorithm finds the constraint that maximizes the constraint violation for the given intermediate solution $(\beta, \theta)$, i.e.

$$\alpha_\beta := \operatorname*{argmin}_{\alpha}\; \sum_{k=1}^{K}\beta_k S_k(\alpha). \tag{11}$$

If $\alpha_\beta$ satisfies the constraint $\sum_{k=1}^{K}\beta_k S_k(\alpha_\beta) \ge \theta$, then the solution is optimal.

[sent-144, score-0.254] [sent-147, score-0.102]

64 Otherwise, the constraint is added to the set of constraints. [sent-148, score-0.074]

65 Note that the problem is solved when all constraints are satisfied. [sent-155, score-0.065]

66-67 Hence, it is a natural choice to use the normalized maximal constraint violation as a convergence criterion, i.e.

$$\epsilon := \left|\,1 - \frac{\sum_{k=1}^{K}\beta_k^{t} S_k(\alpha^{t})}{\theta^{t}}\,\right|,$$

where $(\beta^{t}, \theta^{t})$ is the optimal solution at iteration $t-1$ and $\alpha^{t}$ corresponds to the newly found maximally violating constraint of the next iteration.

[sent-156, score-0.152] [sent-158, score-0.14]

68 Note that (11) is for all considered cases exactly the dual optimization problem of the single kernel case for fixed β. [sent-160, score-0.33]

69 For instance, for binary classification, (11) reduces to the standard SVM dual using the kernel $k(x_i, x_j) = \sum_k \beta_k k_k(x_i, x_j)$:

$$\min_{\alpha \in \mathbb{R}^{N}} \;\; \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j y_i y_j\, k(x_i, x_j) - \sum_{i=1}^{N}\alpha_i \quad\text{with } 0 \le \alpha \le C\mathbf{1} \text{ and } \sum_i \alpha_i y_i = 0.$$

[sent-161, score-0.952]
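In practice this means any standard SVM solver can be reused unchanged once the base kernels are mixed. A minimal sketch using scikit-learn's SVC with a precomputed kernel (variable names and the recovery of α from dual_coef_, which scikit-learn stores as y_i·α_i on the support vectors, are our own illustration, not the authors' implementation):

```python
import numpy as np
from sklearn.svm import SVC

def solve_single_kernel(K_list, beta, y, C=1.0):
    """Solve the standard SVM dual for the combined kernel sum_k beta_k * K_k.

    K_list: list of precomputed (N, N) base kernel matrices.
    y: array of labels in {-1, +1}.
    """
    K_comb = sum(b * Kk for b, Kk in zip(beta, K_list))
    clf = SVC(C=C, kernel="precomputed").fit(K_comb, y)
    alpha = np.zeros(len(y))
    # dual_coef_ holds y_i * alpha_i for the support vectors.
    alpha[clf.support_] = y[clf.support_] * clf.dual_coef_[0]
    return alpha          # alpha_i in [0, C], non-zero only on support vectors
```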

70 Since a large number of efficient algorithms exist for solving the single kernel problem for all sorts of cost functions, we have found an easy way to extend their applicability to the problem of Multiple Kernel Learning. [sent-163, score-0.311]

71 5 Experiments In this section we will discuss toy examples for binary classification and regression, demonstrating that MKL can recover information about the problem at hand, followed by a brief review of problems for which MKL has been successfully used. [sent-167, score-0.129]

72-73 5.1 Classification: In Figure 1 we consider a binary classification problem, where we used MKL-SVMs with five RBF kernels of different widths to distinguish the dark star-like shape from the light grey star inside it. (Footnote 2: It has been shown that solving semi-infinite problems like (7) using a method related to boosting (e.g. [8]) requires at most $T = O(\log(M)/\hat{\epsilon}^{2})$ iterations, where $\hat{\epsilon}$ is the remaining constraint violation and the constants may depend on the kernels and the number of examples N [11, 14].) [sent-169, score-0.16] [sent-171, score-0.281]

74 Algorithm 1 The column generation algorithm employs a linear programming solver to iteratively solve the semi-infinite linear optimization problem (10). [sent-174, score-0.2]

75 (Body of Algorithm 1, recovered:) for t = 1, 2, ... do: compute $S_k^{t} = S_k(\alpha^{t})$ for $k = 1, \dots, K$ by running the single kernel algorithm with the combined kernel $k = \sum_{k=1}^{K}\beta_k^{t} k_k$; if $\big|1 - \sum_{k=1}^{K}\beta_k^{t} S_k(\alpha^{t})/\theta^{t}\big| \le \epsilon$ then break; otherwise set $(\beta^{t+1}, \theta^{t+1}) = \operatorname{argmax}\,\theta$ w.r.t. $\beta, \theta$ subject to the constraints generated so far. [sent-183, score-0.179]
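The following self-contained Python sketch mirrors the structure of Algorithm 1 for the classification case, alternating between the single-kernel SVM subproblem (11) and a restricted LP master for (β, θ); the use of scipy's linprog and scikit-learn's SVC, the uniform initialization and the tolerance are illustrative assumptions rather than the authors' actual implementation:

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def mkl_silp(K_list, y, C=1.0, tol=1e-3, max_iter=100):
    """Column generation for the MKL SILP: alternate between the single-kernel
    SVM (which generates a violated constraint alpha^t) and an LP over (beta, theta)
    restricted to the constraints found so far."""
    K = len(K_list)
    N = len(y)
    beta = np.full(K, 1.0 / K)           # start from the uniform kernel weighting
    theta = -np.inf
    rows = []                             # one row of S_k(alpha^t) values per constraint

    for t in range(max_iter):
        # 1) Single-kernel subproblem (11): SVM with the combined kernel.
        K_comb = sum(b * Kk for b, Kk in zip(beta, K_list))
        clf = SVC(C=C, kernel="precomputed").fit(K_comb, y)
        alpha = np.zeros(N)
        alpha[clf.support_] = y[clf.support_] * clf.dual_coef_[0]

        # 2) Constraint values S_k(alpha) = 0.5 (a*y)^T K_k (a*y) - sum(a).
        v = alpha * y
        S = np.array([0.5 * v @ Kk @ v - alpha.sum() for Kk in K_list])

        # 3) Normalized constraint violation as stopping criterion.
        if np.isfinite(theta) and abs(1.0 - beta @ S / theta) <= tol:
            break
        rows.append(S)

        # 4) Restricted LP master: max theta  s.t.  sum_k beta_k = 1, beta >= 0,
        #    and sum_k beta_k S_k(alpha^s) >= theta for every stored alpha^s.
        c = np.r_[np.zeros(K), -1.0]                        # minimize -theta
        A_ub = np.hstack([-np.array(rows), np.ones((len(rows), 1))])
        b_ub = np.zeros(len(rows))
        A_eq = np.r_[np.ones(K), 0.0].reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * K + [(None, None)], method="highs")
        beta, theta = res.x[:K], res.x[K]

    return beta, alpha
```

At each iteration only the K numbers S_k(α^t) enter the LP, so the master problem stays tiny even when N is large, which is what makes recycling the single-kernel solver attractive.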

76 Shown are the obtained kernel weightings for the five kernels and the test error, which quickly drops to zero as the problem becomes separable. [sent-191, score-0.297]

77 Note that the RBF kernel with largest width was not appropriate and thus never chosen. [sent-192, score-0.31]

78 Also, with increasing distance between the stars, kernels with greater widths are used. [sent-193, score-0.214]

79 5.2 Regression: We applied the newly derived MKL support vector regression formulation to the task of learning a sine function using three RBF kernels with different widths. [sent-196, score-0.267]

80 We then increased the frequency of the sine wave. [sent-197, score-0.15]

81 As can be seen in Figure 2, MKL-SV regression abruptly switches to the width of the RBF-kernel fitting the regression problem best. [sent-198, score-0.301]

82 In another regression experiment, we combined a linear function with two sine waves, one of lower frequency and one of high frequency, i.e. $f(x) = c \cdot x + \sin(ax) + \sin(bx)$. [sent-199, score-0.22]

83 Using ten RBF kernels of different width (see Figure 3) we trained an MKL-SVR and displayed the learned weights (a column in the figure). [sent-202, score-0.131]

84 The largest selected width (100) models the linear component (since RBF with large widths are effectively linear) and the medium width (1) corresponds to the lower frequency sine. [sent-203, score-0.406]

85 We varied the frequency of the high frequency sine wave from low to high (left to right in the figure). [sent-204, score-0.237]

86 One observes that MKL determines an appropriate combination of kernels of low and high widths, while decreasing the RBF-kernel width with increased frequency. [sent-205, score-0.242] [sent-220, score-0.245]

Figure 1: A 2-class toy problem where the dark grey star-like shape is to be distinguished from the light grey star inside of the dark grey star.

87 Figure 2: MKL-Support Vector Regression for the task of learning a sine wave (please see text for details). [sent-219, score-0.212]

89 Figure 3: MKL support vector regression on a linear combination of three functions: f(x) = c · x + sin(ax) + sin(bx). [sent-237, score-0.184]

90 It was shown to improve classification performance on the task of ribosomal and membrane protein prediction, where a weighting over different kernels, each corresponding to a different feature set, was learned. [sent-242, score-0.117]

91 Moreover, on a splice site recognition task we used MKL as a tool for interpreting the SVM classifier [16], as is displayed in Figure 4. [sent-244, score-0.106]

92 Using specifically optimized string kernels, we were able to solve the classification MKL SILP for N = 1. [sent-245, score-0.072]

93 Figure 4: The figure shows an importance weighting for each position in a DNA sequence (around a so-called splice site); positions range from −50 to +50 relative to the exon start. [sent-259, score-0.107]

94 MKL was used to learn these weights, each corresponding to a sub-kernel which uses information at that position to discriminate true splice sites from fake ones. [sent-260, score-0.078]

95 6 Conclusion We have proposed a simple, yet efficient algorithm to solve the multiple kernel learning problem for a large class of loss functions. [sent-264, score-0.466]

96 The proposed method is able to exploit the existing single kernel algorithms, thereby extending their applicability. [sent-265, score-0.179]

97 In experiments we have illustrated that the MKL for classification and regression can be useful for automatic model selection and for obtaining comprehensible information about the learning problem at hand. [sent-266, score-0.159]

98 Multiple kernel learning, conic duality, and the SMO algorithm. [sent-277, score-0.224]

99 Sparse regression ensembles in infinite and finite hypothesis spaces. [sent-359, score-0.07]

100 Learning interpretable SVMs for biological sequence classification. [sent-367, score-0.089]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mkl', 0.598), ('silp', 0.312), ('sk', 0.294), ('kernel', 0.179), ('yi', 0.154), ('kk', 0.135), ('width', 0.131), ('svm', 0.111), ('tsch', 0.109), ('sonnenburg', 0.104), ('loss', 0.101), ('wk', 0.097), ('xi', 0.096), ('sine', 0.096), ('smo', 0.096), ('widths', 0.09), ('kernels', 0.088), ('convex', 0.079), ('cyi', 0.078), ('rdk', 0.078), ('splice', 0.078), ('violation', 0.078), ('rn', 0.078), ('sch', 0.076), ('constraint', 0.074), ('rk', 0.074), ('solve', 0.072), ('xj', 0.071), ('regression', 0.07), ('classi', 0.063), ('boosting', 0.062), ('dual', 0.06), ('svms', 0.056), ('multiple', 0.055), ('frequency', 0.054), ('christin', 0.052), ('kristin', 0.052), ('silps', 0.052), ('margin', 0.051), ('program', 0.051), ('germany', 0.05), ('soft', 0.049), ('differentiable', 0.049), ('sin', 0.046), ('formulation', 0.046), ('bx', 0.045), ('conic', 0.045), ('fer', 0.045), ('grey', 0.044), ('quadratic', 0.043), ('hundred', 0.043), ('examples', 0.041), ('dark', 0.04), ('chapelle', 0.038), ('kdd', 0.038), ('kekul', 0.038), ('newly', 0.038), ('potsdam', 0.038), ('berlin', 0.037), ('rbf', 0.037), ('strictly', 0.037), ('lanckriet', 0.036), ('solver', 0.036), ('stars', 0.036), ('solved', 0.035), ('details', 0.035), ('heterogeneous', 0.034), ('bach', 0.034), ('duality', 0.034), ('violated', 0.034), ('support', 0.034), ('smola', 0.034), ('nite', 0.034), ('feasible', 0.033), ('interpretable', 0.033), ('wave', 0.033), ('xy', 0.033), ('cation', 0.033), ('optimization', 0.032), ('argmin', 0.032), ('ax', 0.032), ('bioinformatics', 0.032), ('fraunhofer', 0.032), ('lp', 0.032), ('primal', 0.032), ('penalized', 0.031), ('ef', 0.03), ('generation', 0.03), ('automatic', 0.03), ('problem', 0.03), ('problems', 0.03), ('min', 0.029), ('considered', 0.029), ('learning', 0.029), ('weighting', 0.029), ('lagrangian', 0.029), ('binary', 0.028), ('solution', 0.028), ('site', 0.028), ('combination', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined. 1

2 0.15679139 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning

Author: Tong Zhang, Rie Kubota Ando

Abstract: We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach subsumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such methods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance. Experiments are used to illustrate the main consequences of our analysis.

3 0.15617795 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

Author: Jian Zhang, Zoubin Ghahramani, Yiming Yang

Abstract: We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both empirical Bayes method and point estimation. Our experimental results on two multi-label text classification data sets show that the proposed approach is promising.

4 0.13260141 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

Author: Yves Grandvalet, Johnny Mariethoz, Samy Bengio

Abstract: In this paper, we show that the hinge loss can be interpreted as the neg-log-likelihood of a semi-parametric model of posterior probabilities. From this point of view, SVMs represent the parametric component of a semi-parametric model fitted by a maximum a posteriori estimation procedure. This connection enables to derive a mapping from SVM scores to estimated posterior probabilities. Unlike previous proposals, the suggested mapping is interval-valued, providing a set of posterior probabilities compatible with each SVM score. This framework offers a new way to adapt the SVM optimization problem to unbalanced classification, when decisions result in unequal (asymmetric) losses. Experiments show improvements over state-of-the-art procedures. 1

5 0.12801148 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil

Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1

6 0.12196425 50 nips-2005-Convex Neural Networks

7 0.11940482 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

8 0.11935195 47 nips-2005-Consistency of one-class SVM and related algorithms

9 0.10582764 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

10 0.093518578 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification

11 0.092977688 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

12 0.091065384 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines

13 0.089566894 77 nips-2005-From Lasso regression to Feature vector machine

14 0.087082766 114 nips-2005-Learning Rankings via Convex Hull Separation

15 0.08477211 184 nips-2005-Structured Prediction via the Extragradient Method

16 0.081090309 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification

17 0.079861775 196 nips-2005-Two view learning: SVM-2K, Theory and Practice

18 0.079850078 126 nips-2005-Metric Learning by Collapsing Classes

19 0.079251081 105 nips-2005-Large-Scale Multiclass Transduction

20 0.078581966 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.237), (1, 0.13), (2, -0.092), (3, -0.099), (4, 0.075), (5, 0.088), (6, 0.119), (7, -0.005), (8, 0.102), (9, -0.057), (10, -0.003), (11, -0.166), (12, -0.098), (13, 0.051), (14, 0.016), (15, 0.056), (16, 0.085), (17, -0.014), (18, -0.013), (19, -0.044), (20, 0.012), (21, -0.089), (22, 0.078), (23, -0.141), (24, -0.008), (25, 0.028), (26, 0.014), (27, -0.071), (28, -0.006), (29, -0.032), (30, 0.042), (31, -0.04), (32, 0.028), (33, -0.072), (34, 0.07), (35, -0.047), (36, 0.02), (37, -0.025), (38, -0.026), (39, -0.003), (40, -0.01), (41, -0.06), (42, 0.104), (43, -0.022), (44, 0.038), (45, 0.068), (46, -0.037), (47, 0.022), (48, 0.044), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95189327 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined. 1

2 0.77017844 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

Author: Lacey Gunter, Ji Zhu

Abstract: In this paper we derive an algorithm that computes the entire solution path of the support vector regression, with essentially the same computational cost as fitting one SVR model. We also propose an unbiased estimate for the degrees of freedom of the SVR model, which allows convenient selection of the regularization parameter. 1

3 0.69153619 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

Author: Y. Altun, D. McAllester, M. Belkin

Abstract: Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points. 1

4 0.62531596 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil

Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1

5 0.61918998 77 nips-2005-From Lasso regression to Feature vector machine

Author: Fan Li, Yiming Yang, Eric P. Xing

Abstract: Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.

6 0.61767989 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

7 0.61434245 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

8 0.5868088 47 nips-2005-Consistency of one-class SVM and related algorithms

9 0.58266646 195 nips-2005-Transfer learning for text classification

10 0.57276487 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning

11 0.56787145 103 nips-2005-Kernels for gene regulatory regions

12 0.52962017 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

13 0.52288604 185 nips-2005-Subsequence Kernels for Relation Extraction

14 0.51551533 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification

15 0.513641 105 nips-2005-Large-Scale Multiclass Transduction

16 0.50095558 58 nips-2005-Divergences, surrogate loss functions and experimental design

17 0.49533844 50 nips-2005-Convex Neural Networks

18 0.49369738 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine

19 0.49238479 184 nips-2005-Structured Prediction via the Extragradient Method

20 0.48588613 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.048), (10, 0.023), (31, 0.017), (34, 0.586), (41, 0.023), (50, 0.017), (55, 0.023), (69, 0.041), (73, 0.029), (88, 0.073), (91, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99241763 2 nips-2005-A Bayes Rule for Density Matrices

Author: Manfred K. Warmuth

Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variances along any direction is determined by the covariance matrix. Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullbach-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1

2 0.98592108 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine

Author: François Laviolette, Mario Marchand, Mohak Shah

Abstract: We design a new learning algorithm for the Set Covering Machine from a PAC-Bayes perspective and propose a PAC-Bayes risk bound which is minimized for classifiers achieving a non trivial margin-sparsity trade-off. 1

same-paper 3 0.98152488 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined. 1

4 0.98010123 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil

Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1

5 0.95834166 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning

Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun

Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.

6 0.8263526 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

7 0.80955362 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification

8 0.79651958 114 nips-2005-Learning Rankings via Convex Hull Separation

9 0.78469181 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

10 0.78069609 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines

11 0.77506977 31 nips-2005-Asymptotics of Gaussian Regularized Least Squares

12 0.77469772 105 nips-2005-Large-Scale Multiclass Transduction

13 0.7703777 77 nips-2005-From Lasso regression to Feature vector machine

14 0.75689745 184 nips-2005-Structured Prediction via the Extragradient Method

15 0.7462337 50 nips-2005-Convex Neural Networks

16 0.74320167 126 nips-2005-Metric Learning by Collapsing Classes

17 0.73629093 58 nips-2005-Divergences, surrogate loss functions and experimental design

18 0.73380679 195 nips-2005-Transfer learning for text classification

19 0.73286569 138 nips-2005-Non-Local Manifold Parzen Windows

20 0.73138034 47 nips-2005-Consistency of one-class SVM and related algorithms