nips nips2008 nips2008-20 knowledge-graph by maker-knowledge-mining

20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning


Source: pdf

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. [sent-11, score-0.203]

2 In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. [sent-14, score-0.203]

3 Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. [sent-15, score-0.743]

4 In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. [sent-16, score-0.519]

5 The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. [sent-17, score-0.732]

6 Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. [sent-18, score-0.319]

7 This is due to the importance of kernel methods, in which kernel functions define a generalized similarity measure over the data. [sent-22, score-0.336]

8 A generic approach to learning a kernel function is known as multiple kernel learning (MKL) [5]: given a list of base kernel functions/matrices, MKL searches for the linear combination of base kernel functions which maximizes a generalized performance measure. [sent-23, score-0.819]

9 Previous studies [5, 14, 13, 4, 1] have shown that MKL is usually able to identify an appropriate combination of kernel functions and, as a result, to improve performance. [sent-24, score-0.168]

10 For instance, base kernels can be created by using different kernel functions; they can also be created by using a single kernel function but with different subsets of features. [sent-26, score-0.42]

11 As for the performance measures needed to find the optimal kernel function, several measures have been studied for multiple kernel learning, including maximum margin classification errors [5], kernel-target alignment [4], and Fisher discriminative analysis [13]. [sent-27, score-0.393]

12 The multiple kernel learning problem was first formulated as a semi-definite programming (SDP) problem by [5]. [sent-28, score-0.243]

13 SILP is an iterative algorithm that alternates between the optimization of kernel weights and the optimization of the SVM classifier. [sent-31, score-0.273]

14 In each step, given the current solution of kernel weights, it solves a classical SVM with the combined kernel; it then constructs a cutting plane model for the objective function and updates the kernel weights by solving a corresponding linear programming problem. [sent-32, score-1.059]
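
As a rough sketch of this alternation (not the authors' implementation), the loop below combines the base kernels with the current weights, calls a placeholder SVM dual solver, stores each dual solution as an affine cut in p, and re-optimizes the weights with a small linear program; `solve_svm_dual` is hypothetical, and the objective is assumed to take the standard SVM-dual form.

```python
import numpy as np
from scipy.optimize import linprog

def solve_svm_dual(K, y, C):
    """Hypothetical helper: solve the standard SVM dual for the fixed kernel K
    and return the dual variables alpha.  Any SVM solver that exposes its dual
    solution could be plugged in here."""
    raise NotImplementedError

def silp_style_mkl(kernels, y, C, n_iter=30):
    """One possible transcription of the alternation: SVM solve, add a cut, re-solve an LP."""
    m = len(kernels)
    p = np.full(m, 1.0 / m)          # start from uniform kernel weights
    cuts_a, cuts_b = [], []          # each f(p, alpha_j) = b_j + a_j @ p is affine in p
    for _ in range(n_iter):
        K = sum(pj * Kj for pj, Kj in zip(p, kernels))
        alpha = solve_svm_dual(K, y, C)                 # classical SVM with the combined kernel
        ay = alpha * y
        cuts_a.append([-0.5 * ay @ Kj @ ay for Kj in kernels])
        cuts_b.append(alpha.sum())
        # Cutting-plane LP over the simplex: minimize t s.t. t >= b_j + a_j @ p for all past j.
        A, b = np.asarray(cuts_a), np.asarray(cuts_b)
        c = np.r_[np.zeros(m), 1.0]                     # variables are (p, t)
        A_ub = np.c_[A, -np.ones(len(b))]               # a_j @ p - t <= -b_j
        A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)    # sum(p) = 1
        res = linprog(c, A_ub=A_ub, b_ub=-b, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * m + [(None, None)])
        p = res.x[:m]                                   # (check res.status in real code)
    return p
```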

15 One shortcoming of the SILP method is that it updates kernel weights solely based on the cutting plane model. [sent-34, score-0.762]

16 Given that a cutting plane model usually differs significantly from the original objective function when the solution is far away from the points where the cutting plane model is constructed, the optimal solution to the cutting plane model could be significantly off target. [sent-35, score-1.64]

17 To further improve the computational efficiency of MKL, we extended the level method [6], which was originally designed for optimizing non-smooth functions, to the optimization of convex-concave problems. [sent-38, score-0.387]

18 In the present work, similar to the SILP method, we construct in each iteration a cutting plane model for the target objective function using the solutions to the intermediate SVM problems. [sent-40, score-0.599]

19 A new solution for kernel weights is obtained by solving the cutting plane model. [sent-41, score-0.746]

20 We furthermore adjust the new solution via a projection to a level set. [sent-42, score-0.317]

21 This adjustment is critical in that it ensures on one hand the new solution is sufficiently close to the current solution, and on the other hand the new solution significantly reduces the objective function. [sent-43, score-0.192]

22 We show that the extended level method has a convergence rate of O(1/ε²) for an ε-accurate solution. [sent-44, score-0.311]

23 Although this is similar to that of the SD method, the extended level method is advantageous in that it utilizes all the gradients that have been computed so far. [sent-45, score-0.392]

24 Empirical results with eight UCI datasets show that the extended level method is able to greatly improve the efficiency of multiple kernel learning in comparison with the SILP method and the SD method. [sent-46, score-0.587]

25 In section 2, we review the efficient algorithms that have been designed for multiple kernel learning. [sent-48, score-0.226]

26 In section 3, we describe the details of the extended level method for MKL, including a study of its convergence rate. [sent-49, score-0.311]

27 In section 4, we present experimental results by comparing both the effectiveness and the efficiency of the extended level method with the corresponding measures of SILP and SD. [sent-50, score-0.288]

28 Here, e is a vector of all ones, C is the trade-off parameter in the SVM, {K_i}_{i=1}^m is a group of base kernel matrices, and ◦ denotes the element-wise product between two vectors. [sent-62, score-0.224]
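
These quantities are enough to evaluate f(p, α) and its gradient in the kernel weights, assuming the standard SVM-dual form f(p, α) = α⊤e − ½(α◦y)⊤(Σ_j p_j K_j)(α◦y); the helper names below are ours, not the paper's.

```python
import numpy as np

def mkl_objective(p, alpha, kernels, y):
    """f(p, alpha) under the assumed SVM-dual form:
    sum(alpha) - 0.5 * (alpha o y)^T (sum_j p_j K_j) (alpha o y)."""
    ay = alpha * y                                   # element-wise product (alpha o y)
    K = sum(pj * Kj for pj, Kj in zip(p, kernels))   # combined kernel
    return alpha.sum() - 0.5 * ay @ K @ ay

def mkl_gradient_p(alpha, kernels, y):
    """Gradient of f with respect to p; f is affine in p, so the j-th component
    is -0.5 * (alpha o y)^T K_j (alpha o y)."""
    ay = alpha * y
    return np.array([-0.5 * ay @ Kj @ ay for Kj in kernels])
```

Since f is affine in p for a fixed α, each f(p, α_j) contributes one linear cut, which is what makes both the cutting plane model and the level-set projection cheap to manipulate.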

29 They differ in the 4th step in Algorithm 1: the SILP method updates p by solving a cutting plane model, while the SD method updates p using the subgradient of the current solution. [sent-73, score-0.737]

30 ϕ_i^SILP(p) = max{ f(p, α_j) : j = 1, . . . , i }, (2) and ϕ_i^SD(p) = ½ ‖p − p_i‖_2^2 + γ_i (p − p_i)⊤ ∇_p f(p_i, α_i), (3) where γ_i is the step size that needs to be decided dynamically (e. [sent-77, score-0.316]
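
Under the same assumed objective, the SD update in (3) reduces to a gradient step on p followed by a Euclidean projection back onto the simplex of kernel weights; the sketch below is illustrative only, with a fixed step size γ standing in for whatever step-size rule is actually used.

```python
import numpy as np

def project_onto_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1} (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def sd_update(p, alpha, kernels, y, gamma):
    """One projected-subgradient step on the kernel weights p."""
    ay = alpha * y
    grad = np.array([-0.5 * ay @ Kj @ ay for Kj in kernels])  # gradient of f w.r.t. p
    return project_onto_simplex(p - gamma * grad)
```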

31 Comparing the two methods, we observe: • In SILP, the cutting plane model ϕ_i^SILP(p) utilizes all the {α_j}_{j=1}^i obtained in past iterations. [sent-84, score-0.581]

32 In contrast, SD only utilizes αi of the current solution pi . [sent-85, score-0.294]

33 • SILP updates the solution for p based on the cutting plane model ϕSILP (p). [sent-86, score-0.558]

34 Since the cutting plane model is usually inaccurate when p is far away from {p_j}_{j=1}^i, the updated solution p could be significantly off target [3]. [sent-87, score-0.595]

35 In contrast, a regularization term ‖p − p_i‖_2^2/2 is introduced in SD to prevent the new solution from being far from the current one, p_i. [sent-88, score-0.424]

36 The proposed level method combines the strengths of both methods. [sent-89, score-0.25]

37 Similar to SILP, it utilizes the gradient information of all the iterations; similar to SD, a regularization scheme is introduced to prevent the updated solution from being too far from the current solution. [sent-90, score-0.207]

38 3 Extended Level Method for MKL We first introduce the basic steps of the level method, followed by the extension of the level method to convex-concave problems and its application to MKL. [sent-91, score-0.435]

39 1 Introduction to the Level Method The level method [6] is from the family of bundle methods, which have recently been employed to efficiently solve regularized risk minimization problems [11]. [sent-93, score-0.301]

40 In the i-th iteration, the level method first constructs a lower bound for f(x) by a cutting plane model, denoted by g^i(x). [sent-96, score-0.709]

41 The optimal solution, denoted by x^i, that minimizes the cutting plane model g^i(x) is then computed. [sent-97, score-0.481]

42 Next, a level set for the cutting plane model g^i(x) is constructed, denoted by L^i = {x ∈ G : g^i(x) ≤ λ f̂^i + (1 − λ) f^i}, where λ ∈ (0, 1) is a tradeoff constant and f̂^i, f^i are the current estimates of the bounds on the optimal value. [sent-99, score-0.644]

43 Finally, a new solution x^{i+1} is computed by projecting x^i onto the level set L^i. [sent-100, score-0.287]

44 It is important to note that the projection step, serving a similar purpose to the regularization term in SD, prevents the new solution x^{i+1} from being too far away from the old one x^i. [sent-101, score-0.205]

45 The cutting plane model at x^0 is g^0(x) = 9 − 6(x + 3). [sent-104, score-0.459]

46 The level method alleviates this problem by projecting x^0 = −3 onto the level set L^0 = {x : g^0(x) ≤ 0. [sent-107, score-0.454]

47 2 Extension of the Level Method to MKL We now extend the level method, which was originally designed for optimizing non-smooth functions, to convex-concave optimization. [sent-114, score-0.248]

48 First, since f(p, α) is convex in p and concave in α, the von Neumann lemma guarantees a saddle point, and for any optimal solution (p*, α*) and any p ∈ P, α ∈ Q we have max_{α∈Q} f(p, α) ≥ f(p, α*) ≥ f(p*, α*) ≥ f(p*, α) ≥ min_{p∈P} f(p, α). [sent-115, score-0.167]

49 To apply the level method, we first construct the cutting plane model. [sent-117, score-0.666]

50 We construct a cutting plane model g^i(p) as follows: g^i(p) = max_{1≤j≤i} f(p, α_j). (5) [sent-120, score-0.516]

51 We have the following proposition for the cutting plane model g^i(p): Proposition 1. [sent-121, score-0.459]
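
With the dual solutions α_1, …, α_i stored as affine pieces of p (as in the earlier sketches), evaluating g^i(p) is a single maximum over those pieces; minimizing g^i over the simplex, which yields the lower bound used below, is exactly the small LP sketched after sentence 14. The helper below is a minimal illustration with our own naming.

```python
import numpy as np

def cutting_plane_value(p, cuts_a, cuts_b):
    """g^i(p) = max_{1<=j<=i} f(p, alpha_j), with each piece stored in affine form
    f(p, alpha_j) = b_j + a_j @ p (cuts_a: shape (i, m); cuts_b: shape (i,))."""
    return float(np.max(cuts_b + cuts_a @ p))
```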

52 Second, since f(p_j, α_j) = max_{α∈Q} f(p_j, α), we have f̄^i = min_{1≤j≤i} f(p_j, α_j) = min_{1≤j≤i} max_{α∈Q} f(p_j, α) ≥ min_{p∈P} max_{α∈Q} f(p, α) = f(p*, α*). [sent-136, score-0.198]

53 The following corollary indicates that the gap ∆^i can be used to measure the sub-optimality of the solution p_i and α_i. [sent-143, score-0.271]

54 In the third step, we construct the level set L^i using the estimated bounds f̄^i and f̲^i as follows: L^i = {p ∈ P : g^i(p) ≤ ℓ^i = λ f̄^i + (1 − λ) f̲^i}, (7) where λ ∈ (0, 1) is a predefined constant. [sent-152, score-0.229]

55 The new solution, denoted by p_{i+1}, is computed as the projection of p_i onto the level set L^i, which is equivalent to solving the following optimization problem: p_{i+1} = arg min_p { ‖p − p_i‖_2^2 : p ∈ P, f(p, α_j) ≤ ℓ^i, j = 1, . . . , i }. (8) [sent-153, score-0.683]

56 Although the projection is regarded as a quadratic programming problem, it can often be solved efficiently because its solution is likely to be the projection onto one of the hyperplanes of the polyhedron L^i. [sent-157, score-0.263]
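
Because each f(p, α_j) is affine in p under the assumed dual form, problem (8) is a small quadratic program: a squared-distance objective with linear constraints. A minimal sketch using SciPy's SLSQP solver follows; a dedicated QP solver would normally be preferable, and the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_level_set(p_curr, cuts_a, cuts_b, level):
    """Solve (8): min ||p - p_curr||_2^2  s.t.  p in the simplex and
    b_j + a_j @ p <= level for every stored cut j."""
    m = p_curr.size
    constraints = [
        {"type": "eq",   "fun": lambda p: p.sum() - 1.0},                  # sum(p) = 1
        {"type": "ineq", "fun": lambda p: level - (cuts_b + cuts_a @ p)},  # keep all cuts below the level
    ]
    res = minimize(lambda p: np.sum((p - p_curr) ** 2), p_curr,
                   method="SLSQP", bounds=[(0.0, 1.0)] * m,
                   constraints=constraints)
    return res.x  # in practice, check res.success and fall back to a dedicated QP solver
```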

57 As argued in the last subsection, by means of the projection we ensure, on the one hand, that p_{i+1} is not very far away from p_i and, on the other hand, that significant progress is made in terms of g^i(p) when the solution is updated from p_i to p_{i+1}. [sent-160, score-0.452]

58 Note that the projection step in the level method saves the effort of searching for the optimal step size in SD, which is computationally expensive as will be revealed later. [sent-161, score-0.371]

59 We summarize the steps of the extended level method in Algorithm 2. [sent-162, score-0.288]

60 (Algorithm 2, excerpt) 6: Compute the projection of p_i onto the level set L^i by solving the optimization problem in (8); 7: Update i = i + 1; 8: until ∆^i ≤ ε. Finally, we discuss the convergence behavior of the level method. [sent-164, score-0.702]
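
Putting the pieces together, Algorithm 2 can be sketched as the loop below. This is a schematic reconstruction rather than the authors' code: `svm_dual_solver`, `minimize_cutting_plane` (returning the minimum of g^i over the simplex, i.e. the lower bound), and `project_onto_level_set` are callables standing for the SVM solve, the LP, and the QP sketched earlier, and the placement of λ in the level ℓ^i is our assumption following (7).

```python
import numpy as np

def extended_level_method(kernels, y, C, svm_dual_solver,
                          minimize_cutting_plane, project_onto_level_set,
                          lam=0.9, eps=1e-4, max_iter=200):
    """Schematic version of Algorithm 2 for min_p max_alpha f(p, alpha)."""
    m = len(kernels)
    p = np.full(m, 1.0 / m)                        # initialize p uniformly
    cuts_a, cuts_b = [], []                        # affine pieces f(p, alpha_j) = b_j + a_j @ p
    upper = np.inf                                 # running upper bound \bar f^i
    for _ in range(max_iter):
        K = sum(pk * Kk for pk, Kk in zip(p, kernels))
        alpha = svm_dual_solver(K, y, C)           # solve the SVM with the combined kernel
        ay = alpha * y
        a = np.array([-0.5 * ay @ Kk @ ay for Kk in kernels])
        b = alpha.sum()
        cuts_a.append(a); cuts_b.append(b)
        upper = min(upper, b + a @ p)              # \bar f^i = min_j f(p_j, alpha_j)
        lower = minimize_cutting_plane(np.array(cuts_a), np.array(cuts_b))  # lower bound
        if upper - lower <= eps:                   # gap Delta^i = upper - lower
            break
        level = lam * upper + (1.0 - lam) * lower  # assumed placement of lambda, cf. (7)
        p = project_onto_level_set(p, np.array(cuts_a), np.array(cuts_b), level)
    return p
```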

61 The following theorem shows the convergence rate of the level method when applied to multiple kernel learning. [sent-166, score-0.493]

62 To obtain a solution with accuracy ε, i.e., |max_{α∈Q} f(p, α) − f(p*, α*)| ≤ ε, the maximum number of iterations N that the level method requires is bounded by a quantity of order 1/ε², with a constant involving max_{1≤i≤m} Λ_max(K_i), the largest eigenvalue among the base kernel matrices. [sent-170, score-0.318]

63 Theorem 3 tells us that the convergence rate of the level method is O(1/ε²). [sent-173, score-0.273]

64 It is important to note that, according to Information-Based Complexity (IBC) theory, given a function family F(L) with a fixed Lipschitz constant L, O(1/ε²) is almost the optimal convergence rate that can be achieved by any optimization method based on a black-box first-order oracle. [sent-174, score-0.146]

65 In other words, no matter which optimization method is used, there always exists a function f(·) ∈ F(L) such that the convergence rate is O(1/ε²), as long as the optimization method is based on a black-box first-order oracle. [sent-175, score-0.225]

66 1 Experimental Setup We follow the settings in [10] to construct the base kernel matrices, i. [sent-179, score-0.246]

67 Each base kernel matrix is normalized to unit trace. [sent-331, score-0.224]
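
Unit-trace normalization is a one-line transformation; the sketch assumes the base kernels are dense NumPy arrays.

```python
import numpy as np

def normalize_to_unit_trace(kernels):
    """Scale each base kernel matrix so that trace(K_i) = 1."""
    return [K / np.trace(K) for K in kernels]
```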

68 According to the above scheme of constructing base kernel matrices, we select a batch of UCI data sets, with the cardinality and dimension allowed by the memory limit of the PC, from the UCI repository for evaluation. [sent-334, score-0.224]

69 The stopping criterion is max_{1≤i≤m} (α◦y)⊤ K_i (α◦y) − (α◦y)⊤ (Σ_{j=1}^m p_j K_j) (α◦y), and we stop the algorithm when this criterion is less than 0. [sent-341, score-0.142]
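
Read as a formula, the criterion compares each base kernel's quadratic form against that of the current combined kernel; a direct NumPy transcription (with our own function name) is:

```python
import numpy as np

def duality_gap_criterion(p, alpha, kernels, y):
    """max_i (alpha o y)^T K_i (alpha o y) - (alpha o y)^T (sum_j p_j K_j) (alpha o y)."""
    ay = alpha * y
    quad = np.array([ay @ Ki @ ay for Ki in kernels])  # one quadratic form per base kernel
    return quad.max() - p @ quad                       # combined-kernel term = sum_j p_j * quad_j
```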

70 01 for all experiments, since a larger λ accelerates the projection when the solution is close to the optimal one. [sent-346, score-0.154]

71 The linear programs in the SILP method and the auxiliary subproblems in the level method are solved using the general optimization toolbox MOSEK (http://www. [sent-348, score-0.422]

72 The toolbox for the level method can be downloaded from http://www. [sent-351, score-0.281]

73 We also observe that the level method is the most efficient among the three methods in comparison. [sent-363, score-0.276]

74 To obtain a better picture of the computational efficiency of the proposed level method, we compute the time-saving ratio, as shown in Table 2. [sent-364, score-0.185]
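
The time-saving ratio in Table 2 is presumably the relative reduction in wall-clock time; a plausible one-line definition (our assumption of the exact formula) is:

```python
def time_saving_ratio(t_level, t_other):
    """Percentage of computational time saved by the level method relative to another method."""
    return 100.0 * (1.0 - t_level / t_other)
```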

75 In order to see more details of each optimization algorithm, we plot the base-10 logarithm of the MKL objective function against time in Figure 1. [sent-368, score-0.212]

76 It is interesting to find that the level method converges overwhelmingly faster than the other two methods. [sent-370, score-0.25]

77 The efficiency of the level method arises from two aspects: (a) the cutting plane model utilizes the computational results of all iterations and therefore boosts the search efficiency, and (b) the projection to the level sets ensures the stability of the new solution. [sent-371, score-1.071]

78 As an example, we found that for dataset “Iono”, although SD and the level method require similar numbers of iterations, SD calls the SVM solver 1231 times on average, while the level method only calls it 47 times. [sent-374, score-0.557]

79 This instability leads to very slow convergence when the solution is close to the optimal one, as indicated by the long tail of SILP in Figure 1. [sent-376, score-0.132]

80 The instability of SILP is further confirmed by the examination of kernel weights, as shown below. [sent-377, score-0.193]

81 To examine the kernel weights (i.e., p), we plot the evolution curves of the five largest kernel weights for datasets “Iono”, “Breast”, and “Pima” in Figure 2. [sent-380, score-0.357]

82 We observe that the values of p computed by the SILP method are the most unstable due to oscillation of the solutions to the cutting plane models. [sent-381, score-0.595]

83 In contrast, for the proposed level method, the values of p change smoothly through iterations. [sent-383, score-0.203]

84 We believe that the stability of the level method is mainly due to the accurate estimation of bounds as well as the regularization of the projection to the level sets. [sent-384, score-0.551]

85 This observation also sheds light on why the level method can be more efficient than the SILP and the SD methods. [sent-385, score-0.25]

86 Table 2: Time-saving ratio of the level method over the SILP and the SD method. [sent-386, score-0.315]

87 2 0 10 20 30 40 50 60 70 time (s) (c) Pima Figure 1: Evolution of objective values over time (seconds) for datasets “Iono”, “Breast”, and “Pima”. [sent-430, score-0.153]

88 Only parts of the evolution curves are plotted for SILP due to their long tails. [sent-432, score-0.125]

89 5 Conclusion and Future Work In this paper, we propose an extended level method to efficiently solve the multiple kernel learning problem. [sent-433, score-0.51]

90 In particular, the level method overcomes the drawbacks of both the SILP method and the SD method for MKL. [sent-434, score-0.422]

91 It is the employment of the projection step that guarantees finding an updated solution that, on the one hand, is close to the existing one and, on the other hand, significantly reduces the objective function. [sent-436, score-0.225]

92 Our experimental results have shown that the level method is able to greatly reduce the computational time of MKL over both the SD method and the SILP method. [sent-437, score-0.333]

93 For future work, we plan to find a scheme to adaptively set the value of λ in the level method and apply the level method to other tasks, such as one-class classification, multi-class classification, and regression. [sent-438, score-0.5]

94 Consistency of the group Lasso and multiple kernel learning. [sent-443, score-0.203]

95 Figure 2, panel titles: “Evolution of the kernel weight values in SD”, “Evolution of the kernel weight values in SILP”, “Evolution of the kernel weight values in Level method”. [sent-445, score-0.719]

96 Figure 2, panels (a) Iono/SD, (b) Iono/SILP, (c) Iono/Level (x-axis: iteration). [sent-472, score-0.769]

97 Figure 2, panels (d) Breast/SD, (e) Breast/SILP, (f) Breast/Level (x-axis: iteration). [sent-499, score-0.869]

98 Figure 2, panels (h) Pima/SILP, (i) Pima/Level. Figure 2 caption: The evolution curves of the five largest kernel weights for datasets “Iono”, “Breast” and “Pima” computed by the three MKL algorithms. [sent-526, score-0.457]

99 Multiple kernel learning, conic duality, and the SMO algorithm. [sent-534, score-0.168]

100 Discriminant kernel and regularization parameter learning via semidefinite programming. [sent-617, score-0.192]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('silp', 0.603), ('sd', 0.396), ('mkl', 0.313), ('cutting', 0.272), ('plane', 0.187), ('level', 0.185), ('kernel', 0.168), ('pi', 0.158), ('iono', 0.15), ('evolution', 0.125), ('pima', 0.094), ('pj', 0.087), ('breast', 0.079), ('utilizes', 0.074), ('projection', 0.07), ('objective', 0.068), ('method', 0.065), ('solution', 0.062), ('svm', 0.058), ('base', 0.056), ('ciency', 0.051), ('subgradient', 0.05), ('iteration', 0.05), ('sonar', 0.044), ('programming', 0.04), ('extended', 0.038), ('li', 0.038), ('updates', 0.037), ('optimization', 0.036), ('multiple', 0.035), ('max', 0.035), ('kong', 0.034), ('nemirovski', 0.033), ('iterations', 0.033), ('weights', 0.033), ('stopping', 0.033), ('gap', 0.033), ('bundle', 0.032), ('hong', 0.032), ('weight', 0.032), ('toolbox', 0.031), ('min', 0.031), ('datasets', 0.031), ('uci', 0.031), ('gradients', 0.03), ('chal', 0.029), ('lemar', 0.029), ('saves', 0.029), ('wpbc', 0.029), ('kj', 0.029), ('kernels', 0.028), ('verify', 0.027), ('oscillation', 0.027), ('away', 0.027), ('ki', 0.026), ('observe', 0.026), ('instability', 0.025), ('king', 0.025), ('lyu', 0.025), ('wdbc', 0.025), ('updated', 0.025), ('regularization', 0.024), ('ef', 0.024), ('solving', 0.024), ('vote', 0.024), ('saddle', 0.024), ('sdp', 0.024), ('designed', 0.023), ('convergence', 0.023), ('construct', 0.022), ('bounds', 0.022), ('optimal', 0.022), ('overcomes', 0.022), ('far', 0.022), ('past', 0.022), ('onto', 0.021), ('originally', 0.021), ('lanckriet', 0.021), ('criterion', 0.02), ('drawbacks', 0.02), ('calls', 0.02), ('cristianini', 0.02), ('pc', 0.02), ('projecting', 0.019), ('cantly', 0.019), ('optimizing', 0.019), ('solve', 0.019), ('corollary', 0.018), ('regularize', 0.018), ('values', 0.018), ('time', 0.018), ('initialize', 0.017), ('repeat', 0.017), ('heart', 0.017), ('theorem', 0.017), ('semide', 0.017), ('solver', 0.017), ('convex', 0.017), ('cycles', 0.016), ('logarithm', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

2 0.32411075 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

3 0.13197185 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

4 0.10596494 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

5 0.098560922 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

Author: Chunhua Shen, Alan Welsh, Lei Wang

Abstract: In this work, we consider the problem of learning a positive semidefinite matrix. The critical issue is how to preserve positive semidefiniteness during the course of learning. Our algorithm is mainly inspired by LPBoost [1] and the general greedy convex optimization framework of Zhang [2]. We demonstrate the essence of the algorithm, termed PSDBoost (positive semidefinite Boosting), by focusing on a few different applications in machine learning. The proposed PSDBoost algorithm extends traditional Boosting algorithms in that its parameter is a positive semidefinite matrix with trace being one instead of a classifier. PSDBoost is based on the observation that any trace-one positive semidefinite matrix can be decomposed into linear convex combinations of trace-one rank-one matrices, which serve as base learners of PSDBoost. Numerical experiments are presented. 1

6 0.091012798 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

7 0.062165227 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

8 0.061904084 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

9 0.061290216 113 nips-2008-Kernelized Sorting

10 0.061181381 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

11 0.058828559 196 nips-2008-Relative Margin Machines

12 0.055397373 238 nips-2008-Theory of matching pursuit

13 0.055080615 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF

14 0.049928427 214 nips-2008-Sparse Online Learning via Truncated Gradient

15 0.048305728 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

16 0.046610292 66 nips-2008-Dynamic visual attention: searching for coding length increments

17 0.045634035 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization

18 0.04533945 48 nips-2008-Clustering via LP-based Stabilities

19 0.044361159 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

20 0.043350086 204 nips-2008-Self-organization using synaptic plasticity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.154), (1, -0.048), (2, -0.064), (3, 0.062), (4, 0.059), (5, -0.008), (6, -0.017), (7, -0.054), (8, -0.02), (9, -0.037), (10, 0.219), (11, -0.069), (12, -0.0), (13, 0.066), (14, 0.174), (15, 0.016), (16, 0.058), (17, -0.158), (18, -0.074), (19, -0.06), (20, 0.046), (21, 0.176), (22, -0.022), (23, 0.015), (24, -0.084), (25, -0.022), (26, 0.092), (27, 0.043), (28, 0.043), (29, -0.155), (30, -0.079), (31, 0.073), (32, -0.188), (33, -0.142), (34, -0.07), (35, -0.106), (36, -0.042), (37, -0.09), (38, -0.041), (39, -0.078), (40, 0.01), (41, -0.173), (42, -0.007), (43, -0.024), (44, 0.03), (45, -0.118), (46, 0.012), (47, 0.059), (48, -0.094), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93886751 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

2 0.85423988 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

3 0.57899898 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

4 0.50868875 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

5 0.48542166 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

Author: Chunhua Shen, Alan Welsh, Lei Wang

Abstract: In this work, we consider the problem of learning a positive semidefinite matrix. The critical issue is how to preserve positive semidefiniteness during the course of learning. Our algorithm is mainly inspired by LPBoost [1] and the general greedy convex optimization framework of Zhang [2]. We demonstrate the essence of the algorithm, termed PSDBoost (positive semidefinite Boosting), by focusing on a few different applications in machine learning. The proposed PSDBoost algorithm extends traditional Boosting algorithms in that its parameter is a positive semidefinite matrix with trace being one instead of a classifier. PSDBoost is based on the observation that any trace-one positive semidefinite matrix can be decomposed into linear convex combinations of trace-one rank-one matrices, which serve as base learners of PSDBoost. Numerical experiments are presented. 1

6 0.40813982 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

7 0.4021697 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

8 0.38435295 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule

9 0.37992805 44 nips-2008-Characteristic Kernels on Groups and Semigroups

10 0.37282935 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

11 0.36892718 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

12 0.36782074 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

13 0.35320914 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

14 0.33169898 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets

15 0.32099262 113 nips-2008-Kernelized Sorting

16 0.30947053 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions

17 0.29987496 196 nips-2008-Relative Margin Machines

18 0.29303926 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

19 0.27709576 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels

20 0.27484855 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.08), (7, 0.09), (12, 0.031), (15, 0.018), (19, 0.284), (28, 0.144), (57, 0.061), (59, 0.011), (63, 0.041), (71, 0.029), (77, 0.035), (82, 0.011), (83, 0.048), (86, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8115235 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss

Author: Ingo Steinwart, Andreas Christmann

Abstract: In this paper lower and upper bounds for the number of support vectors are derived for support vector machines (SVMs) based on the -insensitive loss function. It turns out that these bounds are asymptotically tight under mild assumptions on the data generating distribution. Finally, we briefly discuss a trade-off in between sparsity and accuracy if the SVM is used to estimate the conditional median. 1

same-paper 2 0.72701705 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

3 0.71668726 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias

Author: Steven J. Phillips, Miroslav Dudík

Abstract: We apply robust Bayesian decision theory to improve both generative and discriminative learners under bias in class proportions in labeled training data, when the true class proportions are unknown. For the generative case, we derive an entropybased weighting that maximizes expected log likelihood under the worst-case true class proportions. For the discriminative case, we derive a multinomial logistic model that minimizes worst-case conditional log loss. We apply our theory to the modeling of species geographic distributions from presence data, an extreme case of labeling bias since there is no absence data. On a benchmark dataset, we find that entropy-based weighting offers an improvement over constant estimates of class proportions, consistently reducing log loss on unbiased test data. 1

4 0.59149384 62 nips-2008-Differentiable Sparse Coding

Author: J. A. Bagnell, David M. Bradley

Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1

5 0.5811004 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

6 0.58035946 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

7 0.58012533 75 nips-2008-Estimating vector fields using sparse basis field expansions

8 0.57784629 202 nips-2008-Robust Regression and Lasso

9 0.57781196 194 nips-2008-Regularized Learning with Networks of Features

10 0.57744038 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

11 0.57396454 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

12 0.57387412 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

13 0.57321423 226 nips-2008-Supervised Dictionary Learning

14 0.57275951 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

15 0.57223976 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

16 0.57187361 196 nips-2008-Relative Margin Machines

17 0.57154244 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform

18 0.57082599 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

19 0.57082081 137 nips-2008-Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex

20 0.56944001 149 nips-2008-Near-minimax recursive density estimation on the binary hypercube