nips nips2009 nips2009-80 knowledge-graph by maker-knowledge-mining

80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning


Source: pdf

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Efficient and Accurate ℓp-Norm Multiple Kernel Learning Marius Kloft University of California Berkeley, USA Pavel Laskov Universität Tübingen Tübingen, Germany Ulf Brefeld Yahoo! [sent-1, score-0.037]

2 Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. [sent-3, score-0.47]

3 To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. [sent-5, score-0.149]

4 We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. [sent-6, score-0.172]

5 Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. [sent-7, score-0.202]

6 Interpretability is one of the main reasons for the popularity of sparse methods in complex domains such as computational biology, and consequently building sparse models from data has received a significant amount of recent attention. [sent-12, score-0.134]

7 Unfortunately, sparse models do not always perform well in practice [7, 15]. [sent-13, score-0.067]

8 This holds particularly for learning sparse linear combinations of data sources [15], an abstraction of which is known as multiple kernel learning (MKL) [10]. [sent-14, score-0.262]

9 The data sources give rise to a set of (possibly correlated) kernel matrices K1, . . . , KM. [sent-15, score-0.177]

10 Previous MKL research aims at finding sparse mixtures to effectively simplify the underlying data representation. [sent-19, score-0.119]

11 For instance, [10] study semi-definite matrices K ⪰ 0, inducing sparseness by bounding the trace tr(K) ≤ c; unfortunately, the resulting semi-definite optimization problems are computationally too expensive for large-scale deployment. [sent-20, score-0.092]

12 Recent approaches to MKL promote sparse solutions either by Tikhonov regularization over the mixing coefficients [25] or by incorporating an additional constraint ||θ||_1 ≤ 1 [18, 27] requiring solutions on the standard simplex, known as Ivanov regularization. [sent-21, score-0.21]

13 Based on one or the other, efficient optimization strategies have been proposed for solving ℓ1-norm MKL using semi-infinite linear programming [21], second-order approaches [6], gradient-based optimization [19], and level-set methods [26]. [sent-22, score-0.206]

14 Previous approaches to MKL successfully identify sparse kernel mixtures; however, the solutions found frequently suffer from poor generalization performance. [sent-24, score-0.241]

15 Often, trivial baselines using unweighted-sum kernels K = Σ_m Km are observed to outperform the sparse mixture [7]. [sent-25, score-0.233]

16 One reason for the collapse of ℓ1-norm MKL is that kernels deployed in real-world tasks are usually highly sophisticated and effectively capture relevant aspects of the data. [sent-26, score-0.139]

17 In contrast, sparse approaches to MKL rely on the assumption that some kernels are irrelevant for solving the problem. [sent-27, score-0.231]

18 Enforcing sparse mixtures in these situations may lead to degenerate models. [sent-28, score-0.119]

19 As a remedy, we propose to sacrifice sparseness in these situations and deploy non-sparse mixtures instead. [sent-29, score-0.081]

20 Although non-sparse solutions are not as easy to interpret, they account for (even small) contributions of all available kernels to live up to practical applications. [sent-31, score-0.165]

21 Our theorem allows for a generalized view of recent strands of multiple kernel learning research. [sent-33, score-0.195]

22 Our approach can either be motivated by additionally regularizing over the mixing coefficients ||θ||_p, or equivalently by incorporating the constraint ||θ||_p ≤ 1. [sent-35, score-0.055]

23 We propose two alternative optimization strategies based on Newton descent and cutting planes, respectively. [sent-36, score-0.22]

24 Large-scale experiments on gene start detection show a significant improvement of predictive accuracy compared to ℓ1- and ℓ∞-norm MKL. [sent-38, score-0.051]

25 We present our main contributions in Section 2: the theoretical analysis of existing approaches to MKL, our ℓp-norm MKL generalization with two highly efficient optimization strategies, and relations to ℓ1-norm MKL. [sent-40, score-0.114]

26 to the loss V : R × Y → R, regularizer Ω : H → R, and trade-off parameter λ > 0. [sent-51, score-0.064]

27 We will later make use of kernel functions K(x, x′) = ⟨ψ(x), ψ(x′)⟩_H to compute inner products in H. [sent-53, score-0.149]

28 2 Learning with Multiple Kernels When learning with multiple kernels, we are given M different feature mappings ψm : X → Hm , m = 1, . [sent-55, score-0.046]

29 M , each giving rise to a reproducing kernel Km of Hm . [sent-58, score-0.177]

30 Approaches to multiple kernel learning consider linear kernel mixtures Kθ = Σ_m θm Km, θm ≥ 0. [sent-59, score-0.396]
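
As a quick illustration of this kernel-mixture view, the sketch below forms Kθ = Σ_m θm Km from a list of precomputed kernel matrices; the helper name and the uniform starting mixture are assumptions made for this example rather than anything taken from the paper.

```python
import numpy as np

def combine_kernels(kernel_mats, theta):
    """Weighted kernel mixture K_theta = sum_m theta_m * K_m."""
    theta = np.asarray(theta, dtype=float)
    if np.any(theta < 0):
        raise ValueError("mixing coefficients must be non-negative")
    # Stack the M kernel matrices into shape (M, n, n) and contract with theta.
    return np.tensordot(theta, np.stack(kernel_mats), axes=1)

# Usage (hypothetical): K_list = [K_1, ..., K_M] precomputed (n, n) Gram matrices.
# theta0 = np.ones(len(K_list)) / len(K_list)   # uniform, non-sparse mixture
# K_theta = combine_kernels(K_list, theta0)
```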

31 (1), the primal model for learning with multiple kernels is extended to fw,b,θ(x) = ⟨w̃, ψ̃θ(x)⟩ + b = Σ_{m=1}^M √θm ⟨w̃m, ψm(x)⟩ + b, (2) where the weight vector w̃ and the composite feature map ψ̃θ have a block structure w̃ = (w̃1, . [sent-61, score-0.423]

32 The idea in learning with multiple kernels is to minimize the loss on the training data w.r.t. [sent-68, score-0.238]

33 the optimal kernel mixture Σ_m θm Km, in addition to regularizing θ to avoid overfitting. [sent-71, score-0.176]

34 Hence, in terms of regularized risk minimization, the optimization problem becomes inf_{w̃,b,θ≥0} (1/n) Σ_{i=1}^n V(fw,b,θ(xi), yi) + (λ/2) Σ_{m=1}^M ||w̃m||_2^2 + µΩ[θ]. (3) [sent-72, score-0.295]

35 Previous approaches to multiple kernel learning employ regularizers of the form Ω(θ) = ||θ||_1 to promote sparse kernel mixtures. [sent-73, score-0.47]

36 The non-convexity of the resulting optimization problem is not inherent and can be resolved by substituting wm ← √θm w̃m. [sent-75, score-0.473]

37 We obtain the following convex optimization problem [5] that has also been considered by [25] for hinge loss and p = 1: inf_{w,b,θ≥0} C Σ_{i=1}^n V(Σ_{m=1}^M ⟨wm, ψm(xi)⟩ + b, yi) + (1/2) Σ_{m=1}^M ||wm||^2/θm + µ||θ||_p^p, (4) where we use the convention that t/0 = 0 if t = 0 and ∞ otherwise. [sent-77, score-0.567]

38 An alternative approach has been studied by [18, 27] (again using hinge loss and p = 1). [sent-78, score-0.067]

39 They upper bound the value of the regularizer, ||θ||_1 ≤ 1, and incorporate the latter as an additional constraint into the optimization problem. [sent-79, score-0.125]

40 For C > 0, they arrive at inf_{w,b,θ≥0} C Σ_{i=1}^n V(Σ_{m=1}^M ⟨wm, ψm(xi)⟩ + b, yi) + (1/2) Σ_{m=1}^M ||wm||^2/θm s.t. ||θ||_1 ≤ 1. (5) [sent-80, score-0.264]

41 Zien and Ong [27] showed that the MKL optimization problems by Bach et al. [sent-94, score-0.063]

42 As a main implication of Theorem 1, and by using the result of Zien and Ong, it follows that the optimization problem of Varma and Ray [25] and the ones from [3, 18, 21, 27] are all equivalent. [sent-97, score-0.063]

43 Furthermore, we will focus on binary classification problems with Y = {−1, +1}, equipped with the hinge loss V (f (x), y) = max{0, 1 − yf (x)}. [sent-105, score-0.067]

44 Note, however, that all our results can easily be transferred to regression and multi-class settings using appropriate convex loss functions and joint kernel extensions. [sent-106, score-0.176]

45 3 Non-Sparse Multiple Kernel Learning We now extend the existing MKL framework to allow for non-sparse kernel mixtures θ, see also [13]. [sent-108, score-0.201]

46 (5) by expanding the hinge loss into the slack variables as follows: min_{θ,w,b,ξ} (1/2) Σ_{m=1}^M ||wm||^2/θm + C Σ_i ξi (6) s.t. [sent-110, score-0.067]

47 ∀i : yi (Σ_{m=1}^M ⟨wm, ψm(xi)⟩ + b) ≥ 1 − ξi; ξ ≥ 0; θ ≥ 0; ||θ||_p^p ≤ 1. [sent-112, score-0.232]

48 (7), we arrive at the following optimization problem which solely depends on α. [sent-130, score-0.095]

49 In the limit p → ∞, the above problem reduces to the SVM dual (with Q = Σ_m Qm), while p → 1 gives rise to a QCQP ℓ1-MKL variant. [sent-134, score-0.055]

50 From an optimization standpoint, our work is most closely related to the SILP approach [21] and the simpleMKL method [19, 24]. [sent-140, score-0.063]

51 The two alternative approaches for ℓp-norm MKL proposed in this paper are largely inspired by these methods and extend them in two aspects: customization to arbitrary norms and a tight coupling with minor iterations of an SVM solver, respectively. [sent-142, score-0.063]

52 The optimization w.r.t. α is performed by chunking optimization with minor iterations. [sent-151, score-0.15]

53 Newton Descent: For a Newton descent on the mixing coefficients, we first compute the partial derivatives ∂L/∂θm = −(1/2) wm⊤wm/θm^2 + δ θm^(p−1) and ∂^2L/∂θm^2 = wm⊤wm/θm^3 + (p−1) δ θm^(p−2) =: hm of the original Lagrangian. [sent-155, score-0.671]
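
To make the update concrete, here is a minimal sketch of one damped Newton step on θ using exactly these first and second derivatives; the damping factor, the clipping that keeps θ strictly positive, and the variable names (delta for the Lagrange multiplier, hess for the diagonal Hessian) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def newton_step_theta(theta, w_sq_norms, delta, p, step=1.0, eps=1e-10):
    """One damped Newton update on the mixing coefficients theta (shape (M,), all > 0).

    w_sq_norms holds wm'wm = ||wm||^2 per kernel, delta is the Lagrange multiplier
    of the ||theta||_p^p <= 1 constraint, and p > 1 is the norm parameter.
    """
    grad = -0.5 * w_sq_norms / theta**2 + delta * theta**(p - 1)
    hess = w_sq_norms / theta**3 + (p - 1) * delta * theta**(p - 2)  # diagonal Hessian
    theta_new = theta - step * grad / hess
    return np.maximum(theta_new, eps)  # stay strictly inside the positive orthant
```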

54 Cutting Planes: In order to obtain an alternative optimization strategy, we fix θ and build the partial Lagrangian w.r.t. [sent-167, score-0.063]

55 The above optimization problem is a saddle point problem and can be solved by alternating α and θ optimization steps. [sent-175, score-0.126]

56 [21] optimize the above SIP for p ≥ 1 with interleaving cutting plane algorithms. [sent-183, score-0.123]

57 The solution of a quadratic program (here the regular SVM) generates the most strongly violated constraint for the current mixture θ. [sent-184, score-0.103]

58 The optimal mixture is then used for computing a new constraint and so on. [sent-186, score-0.052]
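
The alternation sketched in the last few sentences can be paraphrased as a simple wrapper-style loop: train an SVM on the current mixture, score each kernel, reweight, and repeat. The scikit-learn SVM call, the power-based reweighting of θ, and the ℓp renormalization below are stand-ins chosen for illustration; this is not the interleaved cutting-plane solver of the paper.

```python
import numpy as np
from sklearn.svm import SVC

def wrapper_mkl(K_list, y, p=2.0, C=1.0, n_iter=20, tol=1e-4):
    """Toy alternating optimization over the SVM and the kernel weights theta."""
    M, n = len(K_list), len(y)
    theta = np.ones(M) / M**(1.0 / p)                 # feasible start: ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, K_list))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        alpha_y = np.zeros(n)
        alpha_y[svm.support_] = svm.dual_coef_.ravel()            # entries alpha_i * y_i
        # Per-kernel importance: the quadratic form alpha_y' K_m alpha_y.
        scores = np.array([alpha_y @ Km @ alpha_y for Km in K_list])
        theta_new = np.maximum(scores, 1e-12) ** (1.0 / (p + 1))  # heuristic reweighting
        theta_new /= np.linalg.norm(theta_new, ord=p)             # back onto ||theta||_p = 1
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```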

59 Unfortunately, for p > 1, a non-linearity is introduced by requiring ||θ||_p^p ≤ 1, and such a constraint is unlikely to be found in standard optimization toolboxes, which often handle only linear and quadratic constraints. [sent-187, score-0.114]

60 Note that the quadratic term in the approximation is diagonal, and hence the subsequent quadratically constrained problem can be solved efficiently. [sent-194, score-0.051]

61 (left) Training using a fixed number of 50 kernels and varying training set size. [sent-198, score-0.165]

62 We apply the method of [21] for ℓ1-norm results, as it is contained as a special case of our cutting plane strategy. [sent-203, score-0.123]

63 We write ℓ∞-norm MKL for a regular SVM with the unweighted-sum kernel K = Σ_m Km. [sent-204, score-0.149]
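
For reference, this ℓ∞-norm (unweighted-sum kernel) baseline is just a standard SVM on the summed Gram matrix; a minimal sketch using scikit-learn's precomputed-kernel interface (an implementation choice of this example) follows.

```python
import numpy as np
from sklearn.svm import SVC

def sum_kernel_svm(K_list, y, C=1.0):
    """SVM with the unweighted-sum kernel K = sum_m K_m (the l_inf-norm MKL baseline)."""
    K = np.sum(np.stack(K_list), axis=0)
    return SVC(C=C, kernel="precomputed").fit(K, y)
```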

64 We compare our ℓp-norm MKL with two methods for ℓ1-norm MKL, simpleMKL [19] and SILP-based chunking [21], and to SVMs using the unweighted-sum kernel (ℓ∞-norm MKL) as an additional baseline. [sent-209, score-0.198]

65 Figure 1 (left) displays the results for varying sample sizes and 50 precomputed Gaussian kernels with different bandwidths. [sent-214, score-0.169]
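
Precomputing such a bank of Gaussian kernels over a grid of bandwidths could look like the sketch below; the log-spaced bandwidth grid is an arbitrary choice for illustration, not the grid used in the experiments.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kernel_bank(X, n_kernels=50, sigma_min=1e-2, sigma_max=1e2):
    """Return a list of RBF kernel matrices exp(-||x - x'||^2 / (2 sigma^2)) for log-spaced sigmas."""
    sq_dists = cdist(X, X, metric="sqeuclidean")
    sigmas = np.logspace(np.log10(sigma_min), np.log10(sigma_max), n_kernels)
    return [np.exp(-sq_dists / (2.0 * s**2)) for s in sigmas]
```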

66 Unsurprisingly, the SVM with the unweighted-sum kernel is the fastest method. [sent-216, score-0.149]

67 The Newton-based strategy (Section 2.1) is slightly faster than the cutting plane variant (Section 2.2). [sent-219, score-0.123]

68 Figure 1 (right) shows the results for varying the number of precomputed RBF kernels for a fixed sample size of 500. [sent-223, score-0.169]

69 The SVM with the unweighted-sum kernel is hardly affected by this setup and its runtime remains essentially constant. [sent-224, score-0.178]

70 The ℓ1-norm MKL by [21] handles the increasing number of kernels best and is the fastest MKL method. [sent-225, score-0.139]

71 Overall, our proposed Newton and cutting plane based optimization strategies achieve a speedup of often more than one order of magnitude. [sent-228, score-0.241]

72 Protein Subcellular Localization: The prediction of the subcellular localization of proteins is one of the rare empirical success stories of ℓ1-norm-regularized MKL [17, 27]: after defining 69 kernels that capture diverse aspects of [sent-230, score-0.25]

73 protein sequences, ℓ1-norm MKL could raise the predictive accuracy significantly above that of the unweighted sum of kernels (thereby also improving on established prediction systems for this problem). [sent-241, score-0.21]

74 Gene Start Recognition: This experiment aims at detecting transcription start sites (TSS) of RNA Polymerase II binding genes in genomic DNA sequences. [sent-252, score-0.124]

75 Accurate detection of the transcription start site is crucial to identify genes and their promoter regions and can be regarded as a first step in deciphering the key regulatory elements in the promoter region that determine transcription. [sent-253, score-0.28]

76 These are translated into positive training instances by extracting windows of size [−1000, +1000] around the TSS. [sent-255, score-0.053]

77 Similar to [4], 85,042 negative instances are generated from the interior of the gene using the same window size. [sent-256, score-0.053]

78 Following [22], we employ five different kernels representing the TSS signal (weighted degree with shift), the promoter (spectrum), the 1st exon (spectrum), angles (linear), and energies (linear). [sent-257, score-0.23]
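
To give a feel for the spectrum kernels mentioned here (used for the promoter and 1st exon), the sketch below counts k-mers and takes inner products of the count vectors; the k-mer length and the dense count matrix are simplifications chosen for this example, and this does not reproduce the weighted-degree-with-shift kernel.

```python
from collections import Counter
from itertools import product
import numpy as np

def spectrum_kernel(seqs, k=4, alphabet="ACGT"):
    """Spectrum kernel: K[i, j] = inner product of the k-mer count vectors of sequences i and j."""
    index = {"".join(t): i for i, t in enumerate(product(alphabet, repeat=k))}
    counts = np.zeros((len(seqs), len(index)))
    for i, s in enumerate(seqs):
        for kmer, c in Counter(s[j:j + k] for j in range(len(s) - k + 1)).items():
            if kmer in index:               # skip k-mers containing ambiguous symbols
                counts[i, index[kmer]] = c
    return counts @ counts.T
```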

79 Optimal kernel parameters are determined by model selection in [22]. [sent-258, score-0.149]

80 Every kernel is normalized such that all points have unit length in feature space. [sent-259, score-0.149]
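
Unit-length normalization in feature space amounts to K[i, j] ← K[i, j] / sqrt(K[i, i] K[j, j]); a short sketch:

```python
import numpy as np

def normalize_kernel(K, eps=1e-12):
    """Cosine-normalize a Gram matrix so that every point has unit norm in feature space."""
    d = np.sqrt(np.clip(np.diag(K), eps, None))
    return K / np.outer(d, d)
```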

81 We reserve 13,000 and 20,000 randomly drawn instances for holdout and test sets, respectively, and use the remaining 60,000 as the training pool. [sent-260, score-0.053]

82 Figure 2 shows test errors for varying training set sizes drawn from the pool; training sets of the same size are disjoint. [sent-261, score-0.052]

83 Error bars indicate standard errors of repetitions for small training set sizes. [sent-262, score-0.05]

84 Conclusion and Discussion: We presented an efficient and accurate approach to non-sparse multiple kernel learning and showed that our ℓp-norm MKL can be motivated as Tikhonov and Ivanov regularization of the mixing coefficients, respectively. [sent-268, score-0.254]

85 Furthermore, we devised two efficient approaches to non-sparse multiple kernel learning for arbitrary ℓp-norms, p > 1. [sent-270, score-0.22]

86 For p = 1, consistent sparse solutions are obtained, while the optimal p = 2 distributes weights over the weighted degree and the two spectrum kernels, in good agreement with [22]. [sent-288, score-0.232]

87 Our optimization strategies are based on semi-infinite programming and Newton descent, both interleaved with chunking-based SVM training. [sent-289, score-0.172]

88 Execution times moreover revealed that our interleaved optimization vastly outperforms commonly used wrapper approaches. [sent-290, score-0.147]

89 We would like to note that there is a certain preference/obsession for sparse models in the scientific community, for various reasons. [sent-291, score-0.067]

90 Rather, on the contrary: non-sparse models may improve quite impressively over sparse ones. [sent-293, score-0.067]

91 We remark nevertheless that some interesting asymptotic results exist that show model selection consistency of sparse MKL (or the closely related group lasso) [2, 14]; in other words, in the limit n → ∞, MKL is guaranteed to find the correct subset of kernels. [sent-295, score-0.067]

92 However, the rate of convergence to the true estimator also needs to be considered; thus we conjecture that the rate slower than √n, which is common to sparse estimators [11], may be one of the reasons for finding excellent (non-asymptotic) results in non-sparse MKL. [sent-296, score-0.095]

93 Cross-validation-based model building that includes the choice of p will, however, inevitably tell us which classes should be treated sparse and which non-sparse. [sent-305, score-0.067]

94 Large-scale experiments on TSS recognition even raised the bar for ℓ1-norm MKL: non-sparse MKL proved consistently better than its sparse counterparts, which were outperformed by an unweighted-sum kernel. [sent-306, score-0.067]

95 MK, UB, SS, and AZ each contributed substantially to both mathematical modelling, design and implementation of algorithms, conception and execution of experiments, and writing of the manuscript. [sent-309, score-0.075]

96 Consistency of the group lasso and multiple kernel learning. [sent-323, score-0.219]

97 Multiple kernel learning, conic duality, and the SMO algorithm. [sent-337, score-0.149]

98 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-429, score-0.067]

99 dbTSS: Database of human transcriptional start sites and full-length cDNAs. [sent-475, score-0.051]

100 An extended level method for efficient multiple kernel learning. [sent-493, score-0.195]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mkl', 0.793), ('wm', 0.205), ('kernel', 0.149), ('kernels', 0.139), ('newton', 0.135), ('simplemkl', 0.124), ('tss', 0.101), ('promoter', 0.091), ('qm', 0.088), ('km', 0.083), ('subcellular', 0.081), ('cutting', 0.076), ('sparse', 0.067), ('berlin', 0.065), ('svm', 0.065), ('optimization', 0.063), ('ivanov', 0.06), ('mcc', 0.06), ('sip', 0.06), ('zien', 0.057), ('sonnenburg', 0.057), ('strategies', 0.055), ('interleaved', 0.054), ('lagrangian', 0.052), ('mixtures', 0.052), ('tikhonov', 0.049), ('chunking', 0.049), ('plane', 0.047), ('multiple', 0.046), ('protein', 0.041), ('cpa', 0.04), ('dbtss', 0.04), ('qcqp', 0.04), ('remp', 0.04), ('sqclp', 0.04), ('suzuki', 0.04), ('execution', 0.04), ('germany', 0.04), ('hinge', 0.04), ('ong', 0.039), ('norm', 0.039), ('bach', 0.039), ('minor', 0.038), ('transcription', 0.038), ('regularizer', 0.037), ('universit', 0.037), ('bingen', 0.037), ('hm', 0.035), ('contributed', 0.035), ('genes', 0.035), ('ub', 0.035), ('promote', 0.034), ('auc', 0.033), ('primal', 0.033), ('duality', 0.033), ('technische', 0.032), ('rakotomamonjy', 0.032), ('arrive', 0.032), ('mixing', 0.03), ('lanckriet', 0.03), ('unweighted', 0.03), ('canu', 0.03), ('precomputed', 0.03), ('wrapper', 0.03), ('localization', 0.03), ('sparseness', 0.029), ('regularization', 0.029), ('hardly', 0.029), ('rise', 0.028), ('estimators', 0.028), ('planes', 0.027), ('varma', 0.027), ('mixture', 0.027), ('instances', 0.027), ('dual', 0.027), ('loss', 0.027), ('yi', 0.027), ('bioinformatics', 0.026), ('spectrum', 0.026), ('remedy', 0.026), ('sites', 0.026), ('descent', 0.026), ('gene', 0.026), ('contributions', 0.026), ('training', 0.026), ('quadratic', 0.026), ('taylor', 0.025), ('quadratically', 0.025), ('coef', 0.025), ('program', 0.025), ('approaches', 0.025), ('biology', 0.025), ('constraint', 0.025), ('start', 0.025), ('cients', 0.025), ('remarkable', 0.024), ('tiny', 0.024), ('strategy', 0.024), ('lasso', 0.024), ('repetitions', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999905 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

2 0.56215072 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation

Author: Saketha N. Jagarlapudi, Dinesh G, Raman S, Chiranjib Bhattacharyya, Aharon Ben-tal, Ramakrishnan K.r.

Abstract: Motivated from real world problems, like object categorization, we study a particular mixed-norm regularization for Multiple Kernel Learning (MKL). It is assumed that the given set of kernels are grouped into distinct components where each component is crucial for the learning task at hand. The formulation hence employs l∞ regularization for promoting combinations at the component level and l1 regularization for promoting sparsity among kernels in each component. While previous attempts have formulated this as a non-convex problem, the formulation given here is an instance of non-smooth convex optimization problem which admits an efficient Mirror-Descent (MD) based procedure. The MD procedure optimizes over product of simplexes, which is not a well-studied case in literature. Results on real-world datasets show that the new MKL formulation is well-suited for object categorization tasks and that the MD based algorithm outperforms state-of-the-art MKL solvers like simpleMKL in terms of computational effort.

3 0.1360084 128 nips-2009-Learning Non-Linear Combinations of Kernels

Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.

4 0.10367044 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

Author: Sylvain Arlot, Francis R. Bach

Abstract: This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the variance of the noise, based upon the concept of minimal penalty which was previously introduced in the context of model selection. Then, plugging our variance estimate in Mallows’ CL penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly existing calibration procedures such as 10-fold cross-validation or generalized cross-validation. 1

5 0.10236232 119 nips-2009-Kernel Methods for Deep Learning

Author: Youngmin Cho, Lawrence K. Saul

Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1

6 0.096560851 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

7 0.084193997 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs

8 0.084060274 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors

9 0.08237087 95 nips-2009-Fast subtree kernels on graphs

10 0.080909893 33 nips-2009-Analysis of SVM with Indefinite Kernels

11 0.077796616 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

12 0.070553958 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

13 0.061067201 72 nips-2009-Distribution Matching for Transduction

14 0.060542323 236 nips-2009-Structured output regression for detection with partial truncation

15 0.059696283 246 nips-2009-Time-Varying Dynamic Bayesian Networks

16 0.056772716 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

17 0.056602471 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test

18 0.053495016 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

19 0.052279957 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

20 0.052224066 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.186), (1, 0.12), (2, -0.073), (3, 0.124), (4, -0.085), (5, -0.049), (6, 0.025), (7, 0.159), (8, -0.037), (9, 0.036), (10, 0.236), (11, -0.283), (12, -0.017), (13, 0.064), (14, -0.207), (15, 0.145), (16, -0.152), (17, -0.006), (18, -0.086), (19, 0.092), (20, 0.074), (21, -0.238), (22, -0.05), (23, 0.082), (24, 0.109), (25, -0.145), (26, -0.032), (27, 0.114), (28, 0.044), (29, -0.022), (30, 0.118), (31, 0.223), (32, 0.104), (33, 0.034), (34, 0.083), (35, -0.121), (36, 0.039), (37, -0.152), (38, -0.131), (39, -0.089), (40, 0.046), (41, 0.03), (42, 0.052), (43, 0.087), (44, -0.091), (45, -0.1), (46, 0.016), (47, -0.027), (48, 0.001), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92383617 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

2 0.91126209 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation

Author: Saketha N. Jagarlapudi, Dinesh G, Raman S, Chiranjib Bhattacharyya, Aharon Ben-tal, Ramakrishnan K.r.

Abstract: Motivated from real world problems, like object categorization, we study a particular mixed-norm regularization for Multiple Kernel Learning (MKL). It is assumed that the given set of kernels are grouped into distinct components where each component is crucial for the learning task at hand. The formulation hence employs l∞ regularization for promoting combinations at the component level and l1 regularization for promoting sparsity among kernels in each component. While previous attempts have formulated this as a non-convex problem, the formulation given here is an instance of non-smooth convex optimization problem which admits an efficient Mirror-Descent (MD) based procedure. The MD procedure optimizes over product of simplexes, which is not a well-studied case in literature. Results on real-world datasets show that the new MKL formulation is well-suited for object categorization tasks and that the MD based algorithm outperforms state-of-the-art MKL solvers like simpleMKL in terms of computational effort.

3 0.47836575 128 nips-2009-Learning Non-Linear Combinations of Kernels

Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.

4 0.4697521 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors

Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye

Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the (2, 1)-norm minimization, which is an extension of the well-known 1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly much more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals the close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.

5 0.41111451 33 nips-2009-Analysis of SVM with Indefinite Kernels

Author: Yiming Ying, Colin Campbell, Mark Girolami

Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.

6 0.36693028 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

7 0.35157353 119 nips-2009-Kernel Methods for Deep Learning

8 0.31479165 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming

9 0.31029826 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

10 0.30684376 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

11 0.26955128 95 nips-2009-Fast subtree kernels on graphs

12 0.26730993 76 nips-2009-Efficient Learning using Forward-Backward Splitting

13 0.26414126 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

14 0.25758806 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

15 0.24976276 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

16 0.24176487 72 nips-2009-Distribution Matching for Transduction

17 0.23228867 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine

18 0.23043115 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

19 0.22690041 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

20 0.22365843 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.283), (21, 0.021), (24, 0.033), (25, 0.058), (35, 0.05), (36, 0.14), (39, 0.041), (58, 0.099), (61, 0.023), (71, 0.051), (81, 0.013), (86, 0.093)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9721033 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

Author: Zhi Yang, Qi Zhao, Edward Keefer, Wentai Liu

Abstract: Studying signal and noise properties of recorded neural data is critical in developing more efficient algorithms to recover the encoded information. Important issues exist in this research including the variant spectrum spans of neural spikes that make it difficult to choose a globally optimal bandpass filter. Also, multiple sources produce aggregated noise that deviates from the conventional white Gaussian noise. In this work, the spectrum variability of spikes is addressed, based on which the concept of adaptive bandpass filter that fits the spectrum of individual spikes is proposed. Multiple noise sources have been studied through analytical models as well as empirical measurements. The dominant noise source is identified as neuron noise followed by interface noise of the electrode. This suggests that major efforts to reduce noise from electronics are not well spent. The measured noise from in vivo experiments shows a family of 1/f^x spectrum that can be reduced using noise shaping techniques. In summary, the methods of adaptive bandpass filtering and noise shaping together result in several dB signal-to-noise ratio (SNR) enhancement.

2 0.88018918 95 nips-2009-Fast subtree kernels on graphs

Author: Nino Shervashidze, Karsten M. Borgwardt

Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n^2 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.

3 0.87440377 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

Author: Arthur Gretton, Peter Spirtes, Robert E. Tillman

Abstract: The recently proposed additive noise model has advantages over previous directed structure learning approaches since it (i) does not assume linearity or Gaussianity and (ii) can discover a unique DAG rather than its Markov equivalence class. However, for certain distributions, e.g. linear Gaussians, the additive noise model is invertible and thus not useful for structure learning, and it was originally proposed for the two variable case with a multivariate extension which requires enumerating all possible DAGs. We introduce weakly additive noise models, which extends this framework to cases where the additive noise model is invertible and when additive noise is not present. We then provide an algorithm that learns an equivalence class for such models from data, by combining a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the Markov equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. 1

4 0.86344159 69 nips-2009-Discrete MDL Predicts in Total Variation

Author: Marcus Hutter

Abstract: The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed. 1

same-paper 5 0.83335435 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

6 0.74845475 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

7 0.63196403 237 nips-2009-Subject independent EEG-based BCI decoding

8 0.62968075 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

9 0.62623161 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

10 0.62268788 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

11 0.6187712 226 nips-2009-Spatial Normalized Gamma Processes

12 0.61537427 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

13 0.6149528 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

14 0.61093551 184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification

15 0.60887212 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

16 0.6087603 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

17 0.60855067 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

18 0.60466474 3 nips-2009-AUC optimization and the two-sample problem

19 0.60336864 104 nips-2009-Group Sparse Coding

20 0.60288632 128 nips-2009-Learning Non-Linear Combinations of Kernels