nips nips2008 nips2008-143 knowledge-graph by maker-knowledge-mining

143 nips-2008-Multi-label Multiple Kernel Learning


Source: pdf

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. [sent-9, score-0.456]

2 We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. [sent-10, score-0.502]

3 The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). [sent-11, score-0.318]

4 We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. [sent-12, score-0.504]

5 In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. [sent-13, score-0.408]

6 We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms. [sent-14, score-0.634]

7 In this paradigm, a weighted graph is constructed for the data set, where the nodes represent the data points and the edge weights characterize the relationships between vertices. [sent-16, score-0.111]

8 The structural and spectral properties of the graph can then be exploited to perform the learning task. [sent-17, score-0.06]

9 Hypergraphs [1, 2] generalize traditional graphs by allowing edges, called hyperedges, to connect more than two vertices, thereby being able to capture the relationships among multiple vertices. [sent-19, score-0.101]

10 In this paper, we propose to use a hypergraph to capture the correlation information for multi-label learning [3]. [sent-20, score-0.403]

11 In particular, we propose to construct a hypergraph for multi-label data in which all data points annotated with a common label are included in a hyperedge, thereby capturing the similarity among data points with a common label. [sent-21, score-0.495]

12 By exploiting the spectral properties of the constructed hypergraph, we propose to embed the multi-label data into a lower-dimensional space in which data points with a common label tend to be close to each other. [sent-22, score-0.204]

13 We formulate the multi-label learning problem in the kernel-induced feature space, and show that the well-known kernel canonical correlation analysis (KCCA) [4] is a special case of the proposed framework. [sent-23, score-0.286]

14 As the kernel plays an essential role in the formulation, we propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the multiple kernel learning (MKL) framework. [sent-24, score-0.878]

15 The resulting formulation involves a non-smooth min-max problem, and we show that it can be cast into a semi-infinite linear program (SILP). [sent-25, score-0.284]

16 To further improve the efficiency and reduce the non-smoothness effect of the SILP formulation, we propose an approximate formulation by introducing a smoothing term into the original problem. [sent-26, score-0.386]

17 In addition, the objective function of the approximate formulation is shown to be differentiable with Lipschitz continuous gradient. [sent-28, score-0.377]

18 We can thus employ Nesterov’s method [5, 6], which solves smooth convex problems with the optimal convergence rate, to compute the solution efficiently. [sent-29, score-0.135]

19 We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, which document the spatial and temporal dynamics of gene expression during Drosophila embryogenesis [7]. [sent-30, score-0.817]

20 To facilitate pattern comparison and searching, groups of images are annotated with a variable number of labels by human curators in the Berkeley Drosophila Genome Project (BDGP) high-throughput study [7]. [sent-32, score-0.202]

21 Since the labels are associated with groups of a variable number of images, we propose to extract invariant features from each image and construct kernels between groups of images by employing the vocabulary-guided pyramid match algorithm [9]. [sent-35, score-0.397]

22 By applying various local descriptors, we obtain multiple kernel matrices, and the proposed multi-label MKL formulation is then applied to obtain an optimal kernel matrix for the low-dimensional embedding. [sent-36, score-0.755]

23 Experimental results demonstrate the effectiveness of the kernel matrices obtained by the proposed formulation. [sent-37, score-0.263]

24 Moreover, the approximate formulation is shown to yield similar results to the original formulation, while it is much more efficient. [sent-38, score-0.287]

25 We propose to capture such information through a hypergraph as described below. [sent-40, score-0.361]

26 1 Hypergraph Spectral Learning: Hypergraphs generalize traditional graphs by allowing hyperedges to connect more than two vertices, thus capturing the joint relationships among multiple vertices. [sent-42, score-0.155]

27 We propose to construct a hypergraph for multi-label data in which each data point is represented as a vertex. [sent-43, score-0.392]

28 To document the joint similarity among data points annotated with a common label, we propose to construct a hyperedge for each label and include all data points annotated with a common label into one hyperedge. [sent-44, score-0.354]
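To make this construction concrete, the following Python sketch (not taken from the paper) builds the label-induced hypergraph incidence matrix from a binary label matrix Y and forms a normalized Laplacian via the clique-expansion approximation mentioned later in the experiments; the unit hyperedge weights and the w_e/δ(e) scaling are assumptions of this sketch.

```python
import numpy as np

def hypergraph_laplacian(Y):
    """Build a hypergraph from a binary label matrix Y (n samples x k labels):
    each label defines one hyperedge containing all points annotated with it.
    Returns a normalized Laplacian via a clique-expansion approximation."""
    H = (Y > 0).astype(float)          # incidence matrix, n x k (vertex i in hyperedge j)
    d_e = H.sum(axis=0)                # hyperedge degrees (points per label)
    w = np.ones(H.shape[1])            # unit hyperedge weights (an assumption)
    # Clique expansion: connect every pair of vertices sharing a hyperedge,
    # weighting each hyperedge by w_e / d_e so large labels are not over-counted.
    A = H @ np.diag(w / np.maximum(d_e, 1.0)) @ H.T
    np.fill_diagonal(A, 0.0)
    d_v = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d_v, 1e-12)))
    L = np.eye(H.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    return L
```

Points that share many labels end up strongly connected in A, so the spectral embedding derived from L places them close together, which is the behavior described in the surrounding sentences.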

29 In this formulation, the instance-label correlations are encoded into L through the hypergraph, and data points sharing a common label tend to be close to each other in the embedded space. [sent-46, score-0.08]

30 Eq. (1) can be reformulated as max_B tr(B^T(KCK)B), (2) subject to B^T(K^2 + λK)B = I, where K = φ(X)^T φ(X) is the kernel matrix. [sent-49, score-0.387]
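For a fixed kernel K, the trace problem in Eq. (2) can be solved as a generalized symmetric eigenvalue problem. The sketch below is a hypothetical helper, not the authors' code; the choice of C (e.g. derived from the hypergraph Laplacian) and the small diagonal jitter are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(K, C, lam, dim):
    """Solve max_B tr(B^T K C K B) s.t. B^T (K^2 + lam*K) B = I as a
    generalized symmetric eigenproblem (a sketch under stated assumptions)."""
    A = K @ C @ K
    M = K @ K + lam * K
    M = M + 1e-8 * np.eye(K.shape[0])          # jitter keeps M positive definite
    A = (A + A.T) / 2.0                        # symmetrize against round-off
    M = (M + M.T) / 2.0
    vals, vecs = eigh(A, M)                    # eigenvalues in ascending order
    return vecs[:, ::-1][:, :dim]              # top-`dim` generalized eigenvectors
```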

31 Kernel canonical correlation analysis (KCCA) [4] is a widely-used method for dimensionality reduction. [sent-50, score-0.081]

32 2 A Semi-infinite Linear Program Formulation: It follows from the theory of kernel methods [11] that the kernel K in Eq. [sent-55, score-0.342]

33 Thus, kernel selection (learning) is one of the central issues in kernel methods. [sent-57, score-0.342]

34 (3) that all the candidate kernel matrices are normalized to have a unit trace value. [sent-60, score-0.229]
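As a small illustration of this combination step (a sketch under the stated unit-trace assumption, not the paper's implementation), the helper below normalizes each candidate kernel to unit trace and forms the convex combination K = Σ_j θj Kj:

```python
import numpy as np

def normalize_and_combine(kernels, theta):
    """Normalize candidate kernel matrices to unit trace and form the convex
    combination K = sum_j theta_j K_j.  The simplex projection of theta is a
    convenience for this sketch, not part of the formulation."""
    theta = np.asarray(theta, dtype=float)
    theta = np.maximum(theta, 0.0)
    theta = theta / theta.sum()
    K = np.zeros_like(kernels[0], dtype=float)
    for t, Kj in zip(theta, kernels):
        K += t * (Kj / np.trace(Kj))           # unit-trace normalization
    return K
```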

35 It has been shown [8] that the optimal weights maximizing the objective function in Eq. [sent-61, score-0.128]

36 (2) can be obtained by solving a semi-infinite linear program (SILP) [13] in which a linear objective is optimized subject to an infinite number of linear constraints, as summarized in the following theorem: Theorem 2.1. [sent-62, score-0.161]

37 Given a set of p kernel matrices {Kj}_{j=1}^p, the optimal kernel matrix in K that maximizes the objective function in Eq. [sent-64, score-0.528]

38 3 The Approximate Formulation: The multi-label kernel learning formulation proposed in Theorem 2.1 [sent-71, score-0.422]

39 involves optimizing a linear objective subject to an infinite number of constraints. [sent-72, score-0.096]

40 The column generation technique used to solve this problem adds constraints to the problem successively until all the constraints are satisfied. [sent-73, score-0.062]

41 In this section, we propose an approximate formulation by introducing a smoothing term into the original problem. [sent-75, score-0.386]

42 This results in an unconstrained and smooth convex problem. [sent-76, score-0.167]

43 We propose to employ existing methods to solve the smooth convex optimization problem efficiently in the next section. [sent-77, score-0.199]

44 Rewriting the problem in Theorem 2.1 as max_{θ: θ^T e = 1, θ ≥ 0} min_Z Σ_{j=1}^p θj Sj(Z) and exchanging the minimization and maximization, the SILP formulation can be expressed as min_Z f(Z), (7) where f(Z) is defined as f(Z) = max_{θ: θ^T e = 1, θ ≥ 0} Σ_{j=1}^p θj Sj(Z). [sent-79, score-0.323]

45 (8) with respect to θ leads to a non-smooth objective function for f (Z). [sent-81, score-0.058]

46 To reduce this effect, we introduce a smoothing term and modify the objective to fµ(Z), defined as fµ(Z) = max_{θ: θ^T e = 1, θ ≥ 0} ( Σ_{j=1}^p θj Sj(Z) − µ Σ_{j=1}^p θj log θj ), (9) where µ is a positive constant controlling the approximation. [sent-82, score-0.202]

47 Eq. (9) can be solved analytically, and the optimal value can be expressed as fµ(Z) = µ log( Σ_{j=1}^p exp(Sj(Z)/µ) ). [sent-87, score-0.066]

48 By removing {αj}_{j=1}^p and substituting θj into the objective function in Eq. [sent-94, score-0.058]

49 The above discussion shows that we can approximate the original non-smooth constrained min-max problem in Eq. [sent-99, score-0.07]

50 (7) by the following smooth unconstrained minimization problem: min_Z fµ(Z), (13) where fµ(Z) is defined in Eq. [sent-100, score-0.108]
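Assuming the closed form fµ(Z) = µ log( Σ_{j=1}^p exp(Sj(Z)/µ) ) given above, the smoothed objective and the maximizing simplex weights can be evaluated with a numerically stable log-sum-exp. The sketch below takes the values Sj(Z) as given and is purely illustrative; computing Sj(Z) itself requires the kernels and the hypergraph quantities, which are omitted here.

```python
import numpy as np

def f_mu_and_theta(S, mu):
    """Given S = [S_1(Z), ..., S_p(Z)] at the current Z, evaluate the smoothed
    objective f_mu(Z) = mu * log(sum_j exp(S_j(Z)/mu)) and the maximizing
    weights theta (a softmax over the p kernels)."""
    S = np.asarray(S, dtype=float)
    a = S / mu
    m = a.max()                                # shift for numerical stability
    lse = m + np.log(np.exp(a - m).sum())      # stable log-sum-exp
    f_mu = mu * lse
    theta = np.exp(a - lse)                    # softmax weights, sum to 1
    return f_mu, theta
```

Small µ makes the softmax sharper (closer to the non-smooth max), while larger µ gives a smoother but looser approximation, which is exactly the trade-off quantified by the µ log p bound below.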

51 We show in the following two lemmas that the approximate formulation in Eq. [sent-102, score-0.287]

52 (13) is convex and has a guaranteed approximation bound controlled by µ. [sent-103, score-0.059]

53 Eq. (13) can be expressed equivalently as min_{Z, {uj}_{j=1}^p, {vj}_{j=1}^p} µ log( Σ_{j=1}^p exp(uj + vj) ) − Σ_{i=1}^k zi^T hi, (14) subject to µ uj ≥ (1/4) Σ_{i=1}^k zi^T zi and µ vj ≥ (1/(4λ)) Σ_{i=1}^k zi^T Kj zi, j = 1, · · · , p. [sent-110, score-0.378]

54 i=1 Since the log-exponential-sum function is a convex function and the two constraints are second-order cone constraints, the problem in Eq. [sent-111, score-0.09]

55 Thus, we have −Σ_{j=1}^p θj log θj ≤ log p and |fµ(Z) − f(Z)| = −µ Σ_{j=1}^p θj log θj ≤ µ log p. [sent-123, score-0.22]

56 4 Solving the Approximate Formulation Using Nesterov’s Method: Nesterov’s method (known as “the optimal method” in [5]) is an algorithm for solving smooth convex problems with the optimal rate of convergence. [sent-125, score-0.2]

57 In this method, the objective function needs to be differentiable with Lipschitz continuous gradient. [sent-126, score-0.09]

58 In order to apply this method to solve the proposed approximate formulation, we first compute the Lipschitz constant for the gradient of function fµ (Z), as summarized in the following lemma: Lemma 4. [sent-127, score-0.104]

59 Then the Lipschitz constant L of the gradient of fµ(Z) can be bounded from above as L ≤ Lµ, (15) where Lµ is defined as Lµ = 1/2 + (1/(2λ)) max_{1≤j≤p} λmax(Kj) + (tr(Z^T Z)/(8µλ^2)) max_{1≤i,j≤p} λmax((Ki − Kj)(Ki − Kj)^T), (16) and λmax(·) denotes the maximum eigenvalue. [sent-131, score-0.106]

60 Moreover, the distance from the origin to the optimal set of Z can be bounded as tr(Z^T Z) ≤ Rµ^2, where Rµ^2 is defined as Rµ^2 = Σ_{i=1}^k ||[Cj]_i||^2 + 4µ log p + tr( Cj^T (I + (1/λ)Kj) Cj ), (17) with Cj = 2(I + (1/λ)Kj)^{-1} H, and [Cj]_i denotes the ith column of Cj. [sent-132, score-0.211]

61 Then we have L ≤ 1/2 + (1/(2λ)) max_{1≤j≤p} λmax(Kj) + (1/(8µλ^2)) max_{1≤i,j≤p} tr(Z^T (Ki − Kj)(Ki − Kj)^T Z) ≤ Lµ. [sent-135, score-0.106]

62 To this end, we first rewrite Sj(Z) as Sj(Z) = (1/4) tr( (Z − Cj)^T (I + (1/λ)Kj) (Z − Cj) ) − (1/4) tr( Cj^T (I + (1/λ)Kj) Cj ). [sent-139, score-0.25]

63 Since min_Z fµ(Z) ≤ fµ(0) = µ log p, and fµ(Z) ≥ Sj(Z), we have Sj(Z) ≤ µ log p for j = 1, · · · , p. [sent-140, score-0.11]

64 It follows that (1/4) tr( (Z − Cj)^T (Z − Cj) ) ≤ µ log p + (1/4) tr( Cj^T (I + (1/λ)Kj) Cj ). [sent-141, score-0.305]

65 Nesterov’s method for solving the proposed approximate formulation is presented in Table 1. [sent-144, score-0.355]
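Table 1 itself is not reproduced in this extract. As a generic reference point (not the paper's exact update rules), a standard sketch of Nesterov's accelerated gradient method for a smooth convex objective with an L-Lipschitz gradient looks as follows; the gradient of fµ is assumed to be supplied as a function.

```python
import numpy as np

def nesterov_minimize(grad, Z0, L, n_iter=200):
    """Generic sketch of Nesterov's accelerated gradient method for minimizing
    a smooth convex function whose gradient is L-Lipschitz continuous."""
    Z = Z0.copy()          # current iterate
    Y = Z0.copy()          # extrapolation point
    t = 1.0
    for _ in range(n_iter):
        Z_new = Y - grad(Y) / L                        # gradient step from Y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = Z_new + ((t - 1.0) / t_new) * (Z_new - Z)  # momentum / extrapolation
        Z, t = Z_new, t_new
    return Z
```

The key design point is that the gradient is evaluated at the extrapolated point Y rather than at Z, which is what yields the O(1/N^2) rate quoted in Eq. (20).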

66 After the optimal Z is obtained from Nesterov’s method, the optimal weights {θj}_{j=1}^p can be computed from Eq. [sent-145, score-0.062]

67 It follows from the convergence proof in [5] that after N iterations, as long as fµ(X^i) ≤ fµ(X^0) for i = 1, · · · , N, we have fµ(Z^{N+1}) − fµ(Z^*) ≤ 4 Lµ Rµ^2 / (N + 1)^2. (20) Table 1: Nesterov’s method for solving the proposed multi-label MKL formulation. [sent-147, score-0.068]

68 Furthermore, since fµ(Z^{N+1}) ≥ f(Z^{N+1}) and fµ(Z^*) ≤ f(Z^*) + µ log p, we have f(Z^{N+1}) − f(Z^*) ≤ µ log p + 4 Lµ Rµ^2 / (N + 1)^2. [sent-149, score-0.11]
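One way to read this bound (a simple consequence worked out here for illustration; the even split of the target accuracy ε is a choice, not taken from the paper):

```latex
% Take \mu = \epsilon / (2 \log p), so that the smoothing error \mu \log p = \epsilon / 2,
% and run enough iterations that the optimization error is also at most \epsilon / 2:
\[
  \frac{4 L_\mu R_\mu^2}{(N+1)^2} \le \frac{\epsilon}{2}
  \quad\Longleftrightarrow\quad
  N + 1 \ge \sqrt{\frac{8 L_\mu R_\mu^2}{\epsilon}},
  \qquad\text{giving}\qquad
  f(Z^{N+1}) - f(Z^*) \le \epsilon .
\]
```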

69 5 Experiments: In this section, we evaluate the proposed formulation on the automated annotation of gene expression pattern images. [sent-153, score-0.634]

70 The performance of the approximate formulation is also validated. [sent-154, score-0.287]

71 Experimental Setup: The experiments use a collection of gene expression pattern images retrieved from the FlyExpress database (http://www. [sent-155, score-0.275]

72 We apply nine local descriptors (SIFT, shape context, PCA-SIFT, spin image, steerable filters, differential invariants, complex filters, moment invariants, and cross correlation) on regular grids of 16 and 32 pixels in radius and spacing on each image. [sent-158, score-0.07]

73 These local descriptors are commonly used in computer vision problems [15]. [sent-159, score-0.07]

74 After generating the features, we apply the vocabulary-guided pyramid match algorithm [9] to construct kernels between the image sets. [sent-162, score-0.228]

75 A total of 23 kernel matrices (2 grid sizes × 9 local descriptors + 2 Gabor + 3 pixel) are constructed. [sent-163, score-0.299]

76 Then the proposed MKL formulation is employed to obtain the optimal integrated kernel matrix based on which the low-dimensional embedding is computed. [sent-164, score-0.576]

77 We use the expansion-based approach (star and clique) to construct the hypergraph Laplacian, since it has been shown [1] that the Laplacians constructed in this way are similar to those obtained directly from a hypergraph. [sent-165, score-0.364]

78 The performance of kernel matrices (either single or integrated) is evaluated by applying the support vector machine (SVM) for each term using the one-against-rest scheme. [sent-166, score-0.229]
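As an illustration of this evaluation protocol (a sketch with placeholder parameters and split, not the authors' exact setup), one-against-rest SVMs on a precomputed kernel with macro and micro F1 can be written as:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

def evaluate_kernel(K, Y, train_idx, test_idx, C=1.0):
    """Evaluate a (single or integrated) kernel matrix K with one-vs-rest SVMs,
    one binary classifier per label, reporting macro / micro F1.  Y is a binary
    label-indicator matrix; C and the train/test split are placeholders."""
    K_train = K[np.ix_(train_idx, train_idx)]
    K_test = K[np.ix_(test_idx, train_idx)]           # test-vs-train kernel block
    clf = OneVsRestClassifier(SVC(kernel="precomputed", C=C))
    clf.fit(K_train, Y[train_idx])
    pred = clf.predict(K_test)
    return (f1_score(Y[test_idx], pred, average="macro"),
            f1_score(Y[test_idx], pred, average="micro"))
```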

79 Performance Evaluation: It can be observed from Tables 2 and 3 that in terms of both macro and micro F1 scores, the kernels integrated by either star or clique expansions achieve the highest performance on almost all of the data sets. [sent-170, score-0.407]

80 In particular, the integrated kernels outperform the best individual kernel significantly on all data sets. [sent-171, score-0.331]

81 This shows that the proposed formulation is effective. Table 2: Performance of integrated kernels and the best individual kernel (denoted as BIK) in terms of macro F1 score. [sent-172, score-0.636]

82 “SILP”, “APP”, “SVM1”, and “Uniform” denote the performance of kernels combined with the SILP formulation, the approximate formulation, the 1-norm SVM formulation proposed in [12] applied for each label separately, and the case where all kernels are given the same weight, respectively. [sent-174, score-0.585]

83 It is effective in combining multiple kernels and exploiting the complementary information contained in different kernels constructed from various features. [sent-340, score-0.287]

84 Moreover, the proposed formulation based on a hypergraph outperforms the classical KCCA consistently. [sent-341, score-0.549]

85 SILP versus the Approximate Formulation: In terms of classification performance, we can observe from Tables 2 and 3 that the SILP and the approximate formulations are similar. [sent-342, score-0.123]

86 More precisely, the approximate formulations perform slightly better than SILP in almost all cases. [sent-343, score-0.123]

87 Figure 1 compares the computation time and the kernel weights of SILPstar and APPstar . [sent-345, score-0.21]

88 It can be observed that in general the approximate formulation is significantly faster than SILP, especially when the number of labels and the number of image sets are large, while both yield very similar kernel weights. [sent-346, score-0.564]

89 6 Conclusions and Future Work We present a multi-label learning formulation that incorporates instance-label correlations by a hypergraph. [sent-347, score-0.251]

90 We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix in the MKL framework. [sent-348, score-0.273]

91 The resulting formulation leads to a non-smooth min-max problem, and it can be cast as an SILP. [sent-349, score-0.253]

92 We propose an approximate formulation by introducing a smoothing term and show that the resulting formulation is an unconstrained convex problem that can be solved by Nesterov’s method. [sent-350, score-0.76]

93 We demonstrate the effectiveness and efficiency of the method on the task of automated annotation of gene expression pattern images. [sent-351, score-0.383]

94 (Figure 1 axis labels: “Computation time (in seconds)” for the left panel, “Weight for kernels” for the right panel.) [sent-354, score-0.109]

95 The left panel plots the computation time of two formulations on one partition of the data set as the number of labels and image sets increase gradually, and the right panel plots the weights assigned to each of the 23 kernels by SILPstar and APPstar on a data set of 40 labels and 1000 image sets. [sent-361, score-0.413]

96 The experiments in this paper focus on the annotation of gene expression pattern images. [sent-362, score-0.317]

97 The proposed formulation can also be applied to the task of multiple object recognition in computer vision. [sent-363, score-0.285]

98 Experimental results indicate that the best individual kernel may not be assigned a large weight by the proposed MKL formulation. [sent-365, score-0.205]

99 Systematic determination of patterns of gene expression during Drosophila embryogenesis. [sent-408, score-0.183]

100 Automated annotation of Drosophila gene expression patterns using a controlled vocabulary. [sent-416, score-0.278]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kj', 0.347), ('silp', 0.304), ('hypergraph', 0.298), ('nesterov', 0.244), ('sj', 0.219), ('formulation', 0.217), ('silpstar', 0.189), ('cj', 0.185), ('mkl', 0.182), ('kernel', 0.171), ('vec', 0.146), ('appstar', 0.135), ('kcca', 0.13), ('tr', 0.125), ('gene', 0.124), ('drosophila', 0.122), ('kernels', 0.109), ('annotation', 0.095), ('star', 0.091), ('lipschitz', 0.085), ('ki', 0.083), ('arizona', 0.081), ('hypergraphs', 0.081), ('tempe', 0.081), ('gj', 0.07), ('descriptors', 0.07), ('approximate', 0.07), ('automated', 0.066), ('bik', 0.065), ('app', 0.065), ('propose', 0.063), ('unconstrained', 0.063), ('spectral', 0.06), ('convex', 0.059), ('expression', 0.059), ('objective', 0.058), ('matrices', 0.058), ('az', 0.058), ('annotated', 0.057), ('log', 0.055), ('clique', 0.055), ('appclique', 0.054), ('appkcca', 0.054), ('hyperedge', 0.054), ('hyperedges', 0.054), ('macro', 0.054), ('silpclique', 0.054), ('silpkcca', 0.054), ('image', 0.053), ('labels', 0.053), ('images', 0.053), ('formulations', 0.053), ('max', 0.053), ('integrated', 0.051), ('zi', 0.05), ('lagrangian', 0.049), ('micro', 0.047), ('label', 0.046), ('uj', 0.045), ('smooth', 0.045), ('invariants', 0.043), ('correlation', 0.042), ('lemma', 0.042), ('jin', 0.041), ('qi', 0.04), ('vj', 0.04), ('weights', 0.039), ('pattern', 0.039), ('canonical', 0.039), ('matrix', 0.039), ('subject', 0.038), ('relationships', 0.037), ('smoothing', 0.036), ('ti', 0.036), ('lters', 0.036), ('laplacian', 0.036), ('cast', 0.036), ('li', 0.035), ('sun', 0.035), ('pyramid', 0.035), ('constructed', 0.035), ('solved', 0.035), ('proposed', 0.034), ('correlations', 0.034), ('gabor', 0.034), ('zhou', 0.034), ('multiple', 0.034), ('solving', 0.034), ('embedding', 0.033), ('ji', 0.033), ('optimization', 0.032), ('sch', 0.032), ('differentiable', 0.032), ('dk', 0.032), ('program', 0.031), ('optimal', 0.031), ('construct', 0.031), ('constraints', 0.031), ('genome', 0.031), ('connect', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

2 0.32411075 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

3 0.15539837 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

4 0.14977659 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets

Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris

Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1

5 0.14380032 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

6 0.11428012 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

7 0.10678696 113 nips-2008-Kernelized Sorting

8 0.099005707 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints

9 0.098388255 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

10 0.09724503 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

11 0.096814036 44 nips-2008-Characteristic Kernels on Groups and Semigroups

12 0.089542709 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

13 0.08452186 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

14 0.082496375 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

15 0.078229114 193 nips-2008-Regularized Co-Clustering with Dual Supervision

16 0.075650968 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

17 0.075271487 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

18 0.071430981 202 nips-2008-Robust Regression and Lasso

19 0.070433743 117 nips-2008-Learning Taxonomies by Dependence Maximization

20 0.070270687 75 nips-2008-Estimating vector fields using sparse basis field expansions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.231), (1, -0.093), (2, -0.058), (3, 0.054), (4, 0.099), (5, -0.035), (6, -0.017), (7, -0.053), (8, -0.008), (9, -0.05), (10, 0.248), (11, -0.115), (12, 0.011), (13, 0.027), (14, 0.207), (15, 0.01), (16, 0.028), (17, -0.1), (18, -0.066), (19, -0.071), (20, 0.035), (21, 0.215), (22, -0.032), (23, 0.017), (24, -0.097), (25, -0.022), (26, 0.047), (27, 0.158), (28, 0.133), (29, -0.206), (30, -0.018), (31, 0.091), (32, -0.149), (33, -0.103), (34, -0.008), (35, -0.056), (36, -0.028), (37, -0.104), (38, -0.125), (39, -0.047), (40, 0.062), (41, -0.136), (42, 0.002), (43, -0.034), (44, 0.029), (45, -0.167), (46, -0.007), (47, 0.008), (48, -0.051), (49, -0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94199568 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

2 0.87391627 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

3 0.62521178 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

4 0.54413706 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets

Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris

Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1

5 0.53392214 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

6 0.47971851 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

7 0.44676575 44 nips-2008-Characteristic Kernels on Groups and Semigroups

8 0.43530747 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

9 0.41967842 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

10 0.41827735 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

11 0.40505543 113 nips-2008-Kernelized Sorting

12 0.39714953 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

13 0.35848498 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

14 0.35180739 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints

15 0.34582871 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

16 0.33808887 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching

17 0.32669821 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels

18 0.31632981 225 nips-2008-Supervised Bipartite Graph Inference

19 0.31339344 193 nips-2008-Regularized Co-Clustering with Dual Supervision

20 0.30998015 200 nips-2008-Robust Kernel Principal Component Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.109), (7, 0.098), (12, 0.054), (15, 0.017), (28, 0.138), (57, 0.063), (59, 0.019), (63, 0.028), (71, 0.023), (77, 0.04), (82, 0.25), (83, 0.069)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79560435 8 nips-2008-A general framework for investigating how far the decoding process in the brain can be simplified

Author: Masafumi Oizumi, Toshiyuki Ishii, Kazuya Ishibashi, Toshihiko Hosoya, Masato Okada

Abstract: “How is information decoded in the brain?” is one of the most difficult and important questions in neuroscience. Whether neural correlation is important or not in decoding neural activities is of special interest. We have developed a general framework for investigating how far the decoding process in the brain can be simplified. First, we hierarchically construct simplified probabilistic models of neural responses that ignore more than Kth-order correlations by using a maximum entropy principle. Then, we compute how much information is lost when information is decoded using the simplified models, i.e., “mismatched decoders”. We introduce an information theoretically correct quantity for evaluating the information obtained by mismatched decoders. We applied our proposed framework to spike data for vertebrate retina. We used 100-ms natural movies as stimuli and computed the information contained in neural activities about these movies. We found that the information loss is negligibly small in population activities of ganglion cells even if all orders of correlation are ignored in decoding. We also found that if we assume stationarity for long durations in the information analysis of dynamically changing stimuli like natural movies, pseudo correlations seem to carry a large portion of the information. 1

same-paper 2 0.78140378 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

3 0.66026455 62 nips-2008-Differentiable Sparse Coding

Author: J. A. Bagnell, David M. Bradley

Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1

4 0.65622747 75 nips-2008-Estimating vector fields using sparse basis field expansions

Author: Stefan Haufe, Vadim V. Nikulin, Andreas Ziehe, Klaus-Robert Müller, Guido Nolte

Abstract: We introduce a novel framework for estimating vector fields using sparse basis field expansions (S-FLEX). The notion of basis fields, which are an extension of scalar basis functions, arises naturally in our framework from a rotational invariance requirement. We consider a regression setting as well as inverse problems. All variants discussed lead to second-order cone programming formulations. While our framework is generally applicable to any type of vector field, we focus in this paper on applying it to solving the EEG/MEG inverse problem. It is shown that significantly more precise and neurophysiologically more plausible location and shape estimates of cerebral current sources from EEG/MEG measurements become possible with our method when comparing to the state-of-the-art. 1

5 0.6524089 202 nips-2008-Robust Regression and Lasso

Author: Huan Xu, Constantine Caramanis, Shie Mannor

Abstract: We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to 1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective. 1

6 0.65112376 194 nips-2008-Regularized Learning with Networks of Features

7 0.65019733 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

8 0.64735478 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

9 0.64347053 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

10 0.64207804 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias

11 0.63925564 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

12 0.63798702 226 nips-2008-Supervised Dictionary Learning

13 0.63727689 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

14 0.63558894 182 nips-2008-Posterior Consistency of the Silverman g-prior in Bayesian Model Choice

15 0.63532287 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

16 0.6352672 99 nips-2008-High-dimensional support union recovery in multivariate regression

17 0.63437706 149 nips-2008-Near-minimax recursive density estimation on the binary hypercube

18 0.63406444 196 nips-2008-Relative Margin Machines

19 0.63399863 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform

20 0.63385636 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints