nips nips2008 nips2008-63 knowledge-graph by maker-knowledge-mining

63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations


Source: pdf

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. [sent-7, score-0.456]

2 Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. [sent-9, score-0.147]

3 We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). [sent-10, score-0.509]

4 While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. [sent-11, score-0.173]

5 It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations. [sent-12, score-0.569]

6 1 Introduction The fact that most visual learning problems deal with high dimensional data has made dimensionality reduction an inherent part of the current research. [sent-13, score-0.313]

7 However, despite their great applicability, the existing dimensionality reduction methods suffer from two main restrictions. [sent-17, score-0.222]

8 Second, there seems to be no systematic way of integrating multiple image features for dimensionality reduction. [sent-20, score-0.355]

9 When addressing applications in which no single descriptor can appropriately depict the whole dataset, this shortcoming becomes even more evident. [sent-21, score-0.148]

10 Alas, this is usually the case when addressing complex visual learning tasks [4]. [sent-22, score-0.126]

11 Aiming to relax the two above-mentioned restrictions, we introduce an approach called MKL-DR that incorporates multiple kernel learning (MKL) into the training process of dimensionality reduction (DR) algorithms. [sent-23, score-0.509]

12 A closely related work is [8], in which learning an optimal kernel over a given convex set of kernels is coupled with kernel Fisher discriminant analysis (KFDA); however, their method only considers binary-class data. [sent-25, score-0.634]

13 First, it works with multiple base kernels, each of which is created based on a specific kind of visual feature, and combines these features in the domain of kernel matrices. [sent-27, score-0.592]

14 Second, the formulation is illustrated with the framework of graph embedding [19], which presents a unified view for a large family of DR methods. [sent-28, score-0.256]

15 2 Related work This section describes some of the key concepts used in the establishment of the proposed approach, including graph embedding and multiple kernel learning. [sent-31, score-0.513]

16 1 Graph embedding Many dimensionality reduction methods focus on modeling the pairwise relationships among data, and utilize graph-based structures. [sent-33, score-0.358]

17 In particular, the framework of graph embedding [19] provides a unified formulation for a set of DR algorithms. [sent-34, score-0.256]

18 A DR scheme accounted for by graph embedding involves a complete graph G whose vertices are over Ω. [sent-36, score-0.316]

19 Then the optimal linear embedding v∗ ∈ R^d can be obtained by solving v∗ = arg min_{v⊤XDX⊤v=1, or v⊤XL′X⊤v=1} v⊤XLX⊤v, (1) where X = [x_1 x_2 · · · x_N] is the data matrix, and L = diag(W · 1) − W is the graph Laplacian of G. [sent-38, score-0.286]
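
As a rough illustration of (1), the following numpy/scipy sketch solves the graph-embedding objective as a generalized eigenvalue problem; the function and variable names (`graph_embedding_projection`, `X`, `W`, `W_prime`) and the small jitter term are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding_projection(X, W, W_prime):
    """Solve (1): minimize v^T X L X^T v subject to v^T X L' X^T v = 1.

    X        : d x N data matrix.
    W, W_prime : N x N affinity matrices of the graphs G and G'.
    Returns the leading 1-D projection v (length d).
    """
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian of G
    L_prime = np.diag(W_prime.sum(axis=1)) - W_prime    # graph Laplacian of G'
    S = X @ L @ X.T                                     # objective matrix
    S_prime = X @ L_prime @ X.T                         # constraint matrix
    # Generalized eigenproblem S v = lambda S' v; the eigenvector of the
    # smallest eigenvalue minimizes the objective under the constraint.
    eigvals, eigvecs = eigh(S, S_prime + 1e-8 * np.eye(S.shape[0]))
    return eigvecs[:, 0]
```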

20 Otherwise another complete graph G′ over Ω is required for the second constraint, where L′ and W′ = [w′_ij] ∈ R^{N×N} are respectively the graph Laplacian and affinity matrix of G′. [sent-41, score-0.18]

21 The meaning of (1) can be better understood with the following equivalent problem: min_v Σ_{i,j=1}^N ||v⊤x_i − v⊤x_j||^2 w_ij (2) subject to Σ_{i=1}^N ||v⊤x_i||^2 d_ii = 1, (3) or Σ_{i,j=1}^N ||v⊤x_i − v⊤x_j||^2 w′_ij = 1. (4) [sent-42, score-0.884]

22 The constrained optimization problem (2) implies that pairwise distances or distances to the origin of projected data (in the form of v⊤x) are modeled by one or two graphs in the framework. [sent-43, score-0.187]

23 Yan et al. [19] show that a set of dimensionality reduction methods, such as PCA, LPP [7], LDA, and MFA [19], can be expressed by (1). [sent-45, score-0.222]

24 2 Multiple kernel learning MKL refers to the process of learning a kernel machine with multiple kernel functions or kernel matrices. [sent-47, score-0.944]

25 Previous studies, e.g., [9, 14, 16], have shown that learning SVMs with multiple kernels not only increases the accuracy but also enhances the interpretability of the resulting classifier. [sent-50, score-0.173]

26 Suppose we have a set of base kernel functions {k_m}_{m=1}^M (or base kernel matrices {K_m}_{m=1}^M). [sent-52, score-0.842]

27 An ensemble kernel function k (or an ensemble kernel matrix K) is then defined by k(x_i, x_j) = Σ_{m=1}^M β_m k_m(x_i, x_j), β_m ≥ 0, (5) and K = Σ_{m=1}^M β_m K_m, β_m ≥ 0. (6) [sent-53, score-0.905]
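
A minimal sketch of the combination in (6), assuming the base kernel matrices are already available as a list of N x N numpy arrays; the helper name `ensemble_kernel` is an assumption.

```python
import numpy as np

def ensemble_kernel(base_kernels, beta):
    """Combine base kernel matrices as in (6): K = sum_m beta_m * K_m, beta_m >= 0.

    base_kernels : list of M arrays, each N x N.
    beta         : length-M vector of non-negative kernel weights.
    """
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0), "kernel weights must be non-negative"
    return sum(b * K_m for b, K_m in zip(beta, base_kernels))
```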

28 Consequently, the learned model from binary-class data {(x_i, y_i ∈ ±1)} will be of the form: f(x) = Σ_{i=1}^N α_i y_i k(x_i, x) + b = Σ_{i=1}^N α_i y_i Σ_{m=1}^M β_m k_m(x_i, x) + b, where {α_i}_{i=1}^N and {β_m}_{m=1}^M are the coefficients to be learned. [sent-54, score-0.354]
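
A hedged sketch of evaluating the decision function above for a single test point; the array layout and the name `mkl_decision_value` are assumptions made for illustration only.

```python
import numpy as np

def mkl_decision_value(k_test, alpha, y, beta, b):
    """Evaluate f(x) = sum_i alpha_i y_i sum_m beta_m k_m(x_i, x) + b for one test point x.

    k_test : M x N array, row m holds k_m(x_i, x) for all training samples x_i.
    alpha  : length-N dual coefficients.
    y      : length-N labels in {+1, -1}.
    beta   : length-M non-negative kernel weights.
    b      : bias term.
    """
    k_ensemble = np.asarray(beta) @ k_test       # length-N vector of k(x_i, x)
    return float((alpha * y) @ k_ensemble + b)
```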

29 Our approach leverages such an MKL optimization to yield more flexible dimensionality reduction schemes for data in different feature representations. [sent-56, score-0.357]

30 3 The MKL-DR framework To establish the proposed method, we first discuss the construction of a set of base kernels from multiple features, and then explain how to integrate these kernels for dimensionality reduction. [sent-57, score-0.572]

31 Finally, we design an optimization procedure to learn the projection for dimensionality reduction. [sent-58, score-0.246]

32 1 Kernel as a unified feature representation Consider a dataset Ω of N samples, and M kinds of descriptors to characterize each sample. [sent-60, score-0.384]

33 Let Ω = {x_i}_{i=1}^N, x_i = {x_{i,m} ∈ X_m}_{m=1}^M, and d_m : X_m × X_m → {0} ∪ R^+ be the distance function for data representation under the mth descriptor. [sent-61, score-0.152]

34 To eliminate this variety in representations, we represent the data under each descriptor as a kernel matrix. [sent-65, score-0.332]

35 There are several ways to accomplish this goal, such as using an RBF kernel for data in vector form, or the pyramid match kernel [6] for data in bag-of-features form. [sent-66, score-0.505]

36 We may also convert pairwise distances between data samples to a kernel matrix [18, 20]. [sent-67, score-0.255]

37 By coupling each representation and its corresponding distance function, we obtain a set of M dissimilarity-based kernel matrices {K_m}_{m=1}^M with K_m(i, j) = k_m(x_i, x_j) = exp(−d_m^2(x_{i,m}, x_{j,m}) / σ_m^2), (8) where σ_m is a positive constant. [sent-68, score-0.598]
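
A short sketch of building one dissimilarity-based kernel matrix via (8) from a precomputed pairwise-distance matrix; the names `D_m` and `sigma_m` are illustrative assumptions.

```python
import numpy as np

def dissimilarity_kernel(D_m, sigma_m):
    """Build K_m from pairwise distances via (8): K_m(i, j) = exp(-d_m(i, j)^2 / sigma_m^2).

    D_m     : N x N matrix of pairwise distances under the m-th descriptor.
    sigma_m : positive bandwidth constant.
    """
    return np.exp(-(D_m ** 2) / (sigma_m ** 2))
```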

38 As several well-designed descriptors and their associated distance functions have been introduced over the years, the use of dissimilarity-based kernels is convenient in solving visual learning tasks. [sent-69, score-0.558]

39 Determining the kernel weights {β_1, . . . , β_M} can be interpreted as finding appropriate weights for best fusing the M feature representations. [sent-76, score-0.121]

40 2 The MKL-DR algorithm Instead of designing a specific dimensionality reduction algorithm, we choose to describe MKL-DR upon graph embedding. [sent-78, score-0.312]

41 This way we can derive a general framework: If a dimensionality reduction scheme is explained by graph embedding, then it will also be extendible by MKL-DR to handle data in multiple feature representations. [sent-79, score-0.433]

42 In graph embedding (2), there are two possible types of constraints. [sent-80, score-0.226]

43 It has been shown that a set of linear dimensionality reduction methods can be kernelized to nonlinear ones via the kernel trick. [sent-83, score-0.441]

44 The procedure of kernelization in MKL-DR is mostly accomplished in a similar way, but with the key difference in using multiple kernels {Km }M . [sent-84, score-0.22]

45 Suppose the ensemble kernel K in MKL-DR is generated by linearly combining the base kernels {K_m}_{m=1}^M as in (6). [sent-85, score-0.569]

46 To show that the underlying algorithm can be reformulated in terms of inner products and accomplished in the new feature space F, we observe that, when plugged into (2), each mapped sample φ(x_i) and the projection v appear exclusively in the form v⊤φ(x_i). [sent-94, score-0.198]

47 Hence, it suffices to show that in MKL-DR, v⊤φ(x_i) can be evaluated via the kernel trick, with the sample coefficients and kernel weights collected as α = [α_1 · · · α_N]⊤ and β = [β_1 · · · β_M]⊤. [sent-95, score-0.219]

48 Specifically, v⊤φ(x_i) = Σ_{n=1}^N Σ_{m=1}^M α_n β_m k_m(x_n, x_i) = α⊤K^{(i)}β, (11) where K^{(i)} ∈ R^{N×M} is the matrix whose (n, m) entry is K_m(n, i), i.e., its rows run from [K_1(1, i) · · · K_M(1, i)] to [K_1(N, i) · · · K_M(N, i)]. [sent-101, score-0.327]

49 With (2) and (11), we define the constrained optimization problem for 1-D MKL-DR as follows: min_{α,β} Σ_{i,j=1}^N ||α⊤K^{(i)}β − α⊤K^{(j)}β||^2 w_ij (12) subject to Σ_{i,j=1}^N ||α⊤K^{(i)}β − α⊤K^{(j)}β||^2 w′_ij = 1, (13) and β_m ≥ 0, m = 1, 2, . . . , M. (14) [sent-111, score-0.618]
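
The following sketch assembles K^{(i)} from the base kernel matrices and evaluates the 1-D projection α⊤K^{(i)}β of (11); the helper names `K_i` and `project_1d` are assumptions used here for illustration.

```python
import numpy as np

def K_i(base_kernels, i):
    """Assemble the N x M matrix K^(i) whose (n, m) entry is K_m(n, i), as used in (11).

    base_kernels : list of M arrays, each N x N.
    """
    return np.stack([K_m[:, i] for K_m in base_kernels], axis=1)

def project_1d(base_kernels, alpha, beta, i):
    """Evaluate v^T phi(x_i) = alpha^T K^(i) beta for training sample i."""
    return float(alpha @ K_i(base_kernels, i) @ beta)
```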

50 The additional constraints in (14) are included to ensure that the resulting kernel K in MKL-DR is a non-negative combination of base kernels. [sent-115, score-0.385]

51 Figure 1: Four kinds of spaces in MKL-DR: the input space of each feature representation, the RKHS induced by each base kernel, the RKHS by the ensemble kernel, and the projected space. [sent-117, score-0.413]

52 3 Optimization Observe from (11) that the one-dimensional projection v of MKL-DR is specified by a sample coefficient vector α and a kernel weight vector β. [sent-119, score-0.317]

53 The two vectors respectively account for the relative importance among the samples and the base kernels. [sent-120, score-0.166]

54 With A and β, each 1-D projection v_i is determined by a specific sample coefficient vector α_i and the (shared) kernel weight vector β. [sent-122, score-0.317]

55 Analogous to the 1-D case, a projected sample x_i can be written as V⊤φ(x_i) = A⊤K^{(i)}β ∈ R^P. (16) [sent-124, score-0.179]

56 The optimization problem (12) can now be extended to accommodate the multi-dimensional projection: min_{A,β} Σ_{i,j=1}^N ||A⊤K^{(i)}β − A⊤K^{(j)}β||^2 w_ij subject to Σ_{i,j=1}^N ||A⊤K^{(i)}β − A⊤K^{(j)}β||^2 w′_ij = 1, (17) and β_m ≥ 0, m = 1, 2, . . . , M. [sent-125, score-0.618]

57 In Figure 1, we give an illustration of the four kinds of spaces related to MKL-DR, including the input space of each feature representation, the RKHS induced by each base kernel and the ensemble kernel, and the projected Euclidean space. [sent-129, score-0.632]

58 On optimizing A: By fixing β, the optimization problem (17) is reduced to min_A trace(A⊤S_W^β A) subject to trace(A⊤S_{W′}^β A) = 1, (18) where S_W^β = Σ_{i,j=1}^N w_ij (K^{(i)} − K^{(j)}) ββ⊤ (K^{(i)} − K^{(j)})⊤ and S_{W′}^β = Σ_{i,j=1}^N w′_ij (K^{(i)} − K^{(j)}) ββ⊤ (K^{(i)} − K^{(j)})⊤. [sent-133, score-0.618]
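
A hedged sketch of this A-step: it forms S_W^β and S_{W′}^β as defined above and then, under the assumption that the trace-constrained problem (18) is handled via a generalized eigendecomposition, returns the P smallest generalized eigenvectors. All function names, the jitter term, and the eigendecomposition shortcut are assumptions, not the paper's prescribed solver.

```python
import numpy as np
from scipy.linalg import eigh

def scatter_beta(base_kernels, W, beta):
    """S_W^beta = sum_{i,j} w_ij (K^(i) - K^(j)) beta beta^T (K^(i) - K^(j))^T, an N x N matrix."""
    N = W.shape[0]
    # K^(i) is the N x M matrix whose (n, m) entry is K_m(n, i).
    Ks = [np.stack([K_m[:, i] for K_m in base_kernels], axis=1) for i in range(N)]
    S = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if W[i, j] == 0.0:
                continue
            d = (Ks[i] - Ks[j]) @ beta            # (K^(i) - K^(j)) beta, length N
            S += W[i, j] * np.outer(d, d)
    return S

def optimize_A(base_kernels, W, W_prime, beta, P):
    """Fix beta and approximately solve (18) for the N x P coefficient matrix A."""
    S_W = scatter_beta(base_kernels, W, beta)
    S_Wp = scatter_beta(base_kernels, W_prime, beta)
    eigvals, eigvecs = eigh(S_W, S_Wp + 1e-8 * np.eye(S_W.shape[0]))
    return eigvecs[:, :P]                          # columns play the role of the alpha vectors
```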

59 A pair of affinity matrices (cf. (2)) and the various visual features expressed by base kernels {K_m}_{m=1}^M serve as the inputs to the MKL-DR algorithm. [sent-142, score-0.41]

60 With A fixed, the matrices S_W^A and S_{W′}^A are defined analogously as w_ij-weighted sums over i, j of terms involving K^{(i)} − K^{(j)} and A; the additional constraints β ≥ 0 mean that the resulting optimization problem (22) can no longer be formulated as a generalized eigenvalue problem. [sent-151, score-0.333]

61 We instead consider solving its convex relaxation by adding an auxiliary variable B of size M × M: min_{β,B} trace(S_W^A B) (25) subject to trace(S_{W′}^A B) = 1, (26) and e_m⊤β ≥ 0, m = 1, 2, . . . , M. (27) [sent-153, score-0.135]

62 The optimization problem (25) is a semidefinite programming (SDP) relaxation of the nonconvex QCQP problem (22), and can be solved efficiently with standard SDP solvers. [sent-157, score-0.138]
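
One way such a relaxation could be posed with an off-the-shelf modeling tool is sketched below with cvxpy, where B stands in for ββ⊤ inside a PSD block matrix Z = [[1, β⊤], [β, B]]. The choice of cvxpy, the function name, and the argument layout are assumptions; the paper does not prescribe a particular solver.

```python
import numpy as np
import cvxpy as cp

def optimize_beta_sdp(S_W_A, S_Wp_A):
    """Solve an SDP relaxation of the beta subproblem (cf. (25)-(27)).

    S_W_A, S_Wp_A : M x M matrices playing the roles of S_W^A and S_W'^A.
    Returns the relaxed kernel weights beta (length M).
    """
    M = S_W_A.shape[0]
    Z = cp.Variable((M + 1, M + 1), PSD=True)    # Z = [[1, beta^T], [beta, B]] is PSD
    beta, B = Z[0, 1:], Z[1:, 1:]
    constraints = [Z[0, 0] == 1,
                   beta >= 0,                    # non-negative kernel weights (27)
                   cp.trace(S_Wp_A @ B) == 1]    # constraint (26)
    cp.Problem(cp.Minimize(cp.trace(S_W_A @ B)), constraints).solve()
    return np.asarray(Z.value)[0, 1:]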

63 We have tried two possibilities: 1) β is initialized by setting all of its elements to 1 so that each base kernel is weighted equally; 2) A is initialized by assuming AA⊤ = I. [sent-167, score-0.166]

64 Given a testing sample z, it is projected to the learned space of lower dimension by z → A⊤K^{(z)}β, (29) where K^{(z)} ∈ R^{N×M} and K^{(z)}(n, m) = k_m(x_n, z). [sent-172, score-0.338]
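
A minimal sketch of the test-time projection in (29); the argument names are assumptions for illustration.

```python
import numpy as np

def project_test_sample(k_z, A, beta):
    """Project a test sample z via (29): z -> A^T K^(z) beta.

    k_z  : N x M matrix with k_z[n, m] = k_m(x_n, z).
    A    : N x P matrix of sample coefficient vectors.
    beta : length-M kernel weights.
    """
    return A.T @ k_z @ beta                      # point in the learned P-dimensional space
```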

65 4 Experimental results To evaluate the effectiveness of MKL-DR, we test the technique with the supervised visual learning task of object category recognition. [sent-173, score-0.188]

66 In the application, two (base) DR methods and a set of descriptors are properly chosen to serve as the input to MKL-DR. [sent-174, score-0.185]

67 1 Dataset The Caltech-101 image dataset [4] consists of 101 object categories and one additional class of background images. [sent-176, score-0.193]

68 Still, the dataset provides a good test bed to demonstrate the advantage of using multiple image descriptors for complex recognition tasks. [sent-179, score-0.477]

69 To implement MKL-DR for recognition, we need to select a proper graph-based DR method to generalize and a set of image descriptors, and then derive (in our case) a pair of affinity matrices and a set of base kernels. [sent-181, score-0.32]

70 2 Image descriptors For the Caltech-101 dataset, we consider seven kinds of image descriptors that result in the seven base kernels (denoted below in bold and in abbreviation): GB-1/GB-2: From a given image, we randomly sample 300 edge pixels, and apply the geometric blur descriptor [1] to them. [sent-184, score-1.09]

71 With these image features, we adopt the distance function suggested in equation (2) of the work by Zhang et al. [sent-185, score-0.175]

72 SIFT-Dist: The base kernel is analogously constructed as in GB-2, except now the SIFT descriptor [11] is used to extract features. [sent-187, score-0.547]

73 SIFT-Grid: We apply SIFT with three different scales to an evenly sampled grid of each image, and use k-means clustering to generate visual words from the resulting local features of all images. [sent-188, score-0.139]

74 Each image can then be represented by a histogram over the visual words. [sent-189, score-0.173]

75 The χ2 distance is used to derive this base kernel via (8). [sent-190, score-0.416]

76 An RBF kernel is constructed for each of the two kinds of C2 features. [sent-194, score-0.271]

77 PHOG: We adopt the PHOG descriptor [2] to capture image features, limiting the pyramid level to 2. [sent-195, score-0.291]

78 Together with the χ2 distance, the base kernel is established. [sent-196, score-0.385]

79 3 Dimensionality reduction methods We consider two supervised DR schemes, namely, linear discriminant analysis (LDA) and local discriminant embedding (LDE) [3], and show how MKL-DR can generalize them. [sent-198, score-0.412]

80 Both LDA and LDE perform discriminant learning on a fully labeled dataset Ω = {(xi , yi )}N , but make different i=1 assumptions about data distribution: LDA assumes data of each class can be modeled by a Gaussian, while LDE assumes they spread as a submanifold. [sent-199, score-0.181]

81 Each of the two methods can be specified by a pair of affinity matrices to fit the formulation of graph embedding (2), and the resulting MKL dimensionality reduction schemes are termed MKL-LDA and MKL-LDE, respectively. [sent-200, score-0.58]

82 Affinity matrices for LDA: The two affinity matrices W = [w_ij] and W′ = [w′_ij] are defined as w_ij = 1/n_{y_i} if y_i = y_j and 0 otherwise, and w′_ij = 1/N, where n_{y_i} is the number of data points with label y_i. [sent-201, score-0.743]
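
A short sketch of building these LDA affinity matrices from a label vector; the function name `lda_affinities` is an assumption.

```python
import numpy as np

def lda_affinities(y):
    """Build the LDA affinity matrices W and W' from labels y (length N)."""
    y = np.asarray(y)
    N = len(y)
    counts = {c: int(np.sum(y == c)) for c in np.unique(y)}   # class sizes n_{y_i}
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if y[i] == y[j]:
                W[i, j] = 1.0 / counts[y[i]]                  # w_ij = 1 / n_{y_i}
    W_prime = np.full((N, N), 1.0 / N)                        # w'_ij = 1 / N
    return W, W_prime
```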

83 Here i ∈ N_k(j) means that sample x_i is one of the k nearest neighbors of sample x_j. [sent-284, score-0.21]

84 However, since there are now multiple image descriptors, we need to construct an affinity matrix for data under each descriptor, and average the resulting affinity matrices from all the descriptors. [sent-286, score-0.222]

85 Via MKL-DR, the data are projected to the learned space, and the recognition task is accomplished there by enforcing the nearest-neighbor rule. [sent-295, score-0.199]

86 Coupling the seven base kernels with the affinity matrices of LDA and LDE, we can respectively derive MKL-LDA and MKL-LDE using Algorithm 1. [sent-296, score-0.428]

87 Since KFD considers only one base kernel at a time, we implement two strategies to take account of the classification outcomes from the seven resulting KFD classifiers. [sent-298, score-0.47]

88 6% over the best recognition rate by the seven KFD classifiers. [sent-307, score-0.174]

89 On the other hand, while KFD-Voting and KFD-SAMME try to combine the separately trained KFD classifiers, MKL-LDA jointly integrates the seven kernels into the learning process. [sent-308, score-0.19]

90 The quantitative results show that MKL-LDA can make the most of fusing various feature descriptors, and improves the recognition rates from 68. [sent-309, score-0.21]

91 In [6], Grauman and Darrell report a 50% recognition rate based on the pyramid matching kernel over data in bag-of-features representation. [sent-317, score-0.375]

92 Our related work [10] that performs adaptive feature fusing via locally combining kernel matrices has a recognition rate 59. [sent-324, score-0.501]

93 Multiple kernel learning is also used in Varma and Ray [18], and it can yield a top recognition rate of 87. [sent-326, score-0.308]

94 5 Conclusions and discussions The proposed MKL-DR technique is useful in that it learns a unified low-dimensional space for data in multiple feature representations. [sent-328, score-0.121]

95 Such flexibilities allow one to make use of more prior knowledge for effectively analyzing a given dataset, including choosing a proper set of visual features to better characterize the data, and adopting a graph-based DR method to appropriately model the relationship among the data points. [sent-330, score-0.219]

96 On the other hand, by integrating with a suitable DR scheme, MKL-DR can extend the multiple kernel learning framework to address not just supervised learning problems but also unsupervised and semisupervised ones. [sent-331, score-0.348]

97 Shape matching and object recognition using low distortion correspondences. [sent-338, score-0.147]

98 Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. [sent-359, score-0.149]

99 Graph embedding and extensions: A general framework for dimensionality reduction. [sent-456, score-0.264]

100 Svm-knn: Discriminative nearest neighbor classification for visual category recognition. [sent-463, score-0.159]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sw', 0.416), ('wij', 0.248), ('km', 0.243), ('kfd', 0.234), ('kernel', 0.219), ('lde', 0.208), ('dr', 0.206), ('descriptors', 0.185), ('base', 0.166), ('nity', 0.152), ('mkl', 0.136), ('embedding', 0.136), ('af', 0.132), ('dimensionality', 0.128), ('descriptor', 0.113), ('kernels', 0.105), ('reduction', 0.094), ('discriminant', 0.091), ('visual', 0.091), ('graph', 0.09), ('recognition', 0.089), ('seven', 0.085), ('xi', 0.084), ('image', 0.082), ('trace', 0.081), ('ensemble', 0.079), ('nk', 0.078), ('phog', 0.078), ('taiwan', 0.078), ('semide', 0.078), ('lda', 0.078), ('matrices', 0.072), ('cvpr', 0.071), ('fusing', 0.068), ('multiple', 0.068), ('pyramid', 0.067), ('projection', 0.066), ('zhang', 0.063), ('projected', 0.063), ('object', 0.058), ('sdp', 0.055), ('dataset', 0.053), ('feature', 0.053), ('nonconvex', 0.053), ('kinds', 0.052), ('klde', 0.052), ('taipei', 0.052), ('varma', 0.052), ('optimization', 0.052), ('berg', 0.05), ('uni', 0.05), ('analogously', 0.049), ('features', 0.048), ('accomplished', 0.047), ('aa', 0.047), ('xm', 0.046), ('mina', 0.045), ('mutch', 0.045), ('constraint', 0.044), ('subject', 0.042), ('qcqp', 0.042), ('frome', 0.042), ('yan', 0.042), ('characterize', 0.041), ('rkhs', 0.04), ('category', 0.039), ('adopting', 0.039), ('serre', 0.039), ('grauman', 0.039), ('vt', 0.039), ('adopted', 0.038), ('yi', 0.037), ('fisher', 0.037), ('mth', 0.037), ('distances', 0.036), ('classi', 0.036), ('addressing', 0.035), ('coef', 0.035), ('jmlr', 0.034), ('voting', 0.034), ('rn', 0.034), ('eigenvalue', 0.033), ('et', 0.033), ('xj', 0.033), ('relaxation', 0.033), ('guess', 0.032), ('semisupervised', 0.032), ('solving', 0.032), ('sample', 0.032), ('tsch', 0.031), ('sift', 0.031), ('distance', 0.031), ('schemes', 0.03), ('formulation', 0.03), ('integrating', 0.029), ('nearest', 0.029), ('adopt', 0.029), ('yj', 0.029), ('xn', 0.028), ('min', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

2 0.15658732 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

3 0.15539837 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

4 0.14722182 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

Author: Jihun Hamm, Daniel D. Lee

Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1

5 0.1384861 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

Author: Kai Yu, Wei Xu, Yihong Gong

Abstract: In this paper we aim to train deep neural networks for rapid visual recognition. The task is highly challenging, largely due to the lack of a meaningful regularizer on the functions realized by the networks. We propose a novel regularization method that takes advantage of kernel methods, where an oracle kernel function represents prior knowledge about the recognition task of interest. We derive an efficient algorithm using stochastic gradient descent, and demonstrate encouraging results on a wide range of recognition tasks, in terms of both accuracy and speed. 1

6 0.13197185 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

7 0.12326555 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

8 0.11137439 194 nips-2008-Regularized Learning with Networks of Features

9 0.10778491 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

10 0.10659084 113 nips-2008-Kernelized Sorting

11 0.10187382 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

12 0.092205584 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

13 0.091399923 61 nips-2008-Diffeomorphic Dimensionality Reduction

14 0.084854513 44 nips-2008-Characteristic Kernels on Groups and Semigroups

15 0.082076497 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

16 0.080993012 226 nips-2008-Supervised Dictionary Learning

17 0.080402784 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

18 0.080329269 118 nips-2008-Learning Transformational Invariants from Natural Movies

19 0.080292746 200 nips-2008-Robust Kernel Principal Component Analysis

20 0.079482868 138 nips-2008-Modeling human function learning with Gaussian processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.257), (1, -0.13), (2, -0.008), (3, 0.001), (4, 0.075), (5, -0.034), (6, -0.008), (7, -0.092), (8, 0.017), (9, -0.083), (10, 0.262), (11, -0.117), (12, -0.07), (13, 0.058), (14, 0.109), (15, -0.055), (16, 0.086), (17, -0.074), (18, -0.11), (19, -0.025), (20, -0.001), (21, -0.009), (22, 0.058), (23, -0.003), (24, -0.042), (25, -0.043), (26, 0.059), (27, 0.032), (28, 0.041), (29, -0.031), (30, -0.051), (31, 0.018), (32, -0.047), (33, -0.019), (34, 0.0), (35, 0.006), (36, -0.03), (37, -0.028), (38, -0.057), (39, -0.087), (40, 0.078), (41, -0.027), (42, 0.007), (43, -0.058), (44, -0.006), (45, 0.046), (46, 0.047), (47, 0.064), (48, -0.017), (49, 0.103)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96151912 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

2 0.83138794 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning

Author: Jihun Hamm, Daniel D. Lee

Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1

3 0.81603831 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Author: Francis R. Bach

Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

4 0.78899926 143 nips-2008-Multi-label Multiple Kernel Learning

Author: Shuiwang Ji, Liang Sun, Rong Jin, Jieping Ye

Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

5 0.75063437 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

6 0.71815759 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

7 0.68310493 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

8 0.60502887 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching

9 0.58492839 44 nips-2008-Characteristic Kernels on Groups and Semigroups

10 0.56928456 200 nips-2008-Robust Kernel Principal Component Analysis

11 0.56214148 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

12 0.54606634 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels

13 0.53537101 61 nips-2008-Diffeomorphic Dimensionality Reduction

14 0.53085881 113 nips-2008-Kernelized Sorting

15 0.52232075 225 nips-2008-Supervised Bipartite Graph Inference

16 0.50514948 196 nips-2008-Relative Margin Machines

17 0.49683756 126 nips-2008-Localized Sliced Inverse Regression

18 0.47843194 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

19 0.46101707 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel

20 0.46081254 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.061), (7, 0.121), (12, 0.04), (15, 0.019), (17, 0.228), (28, 0.148), (57, 0.112), (59, 0.018), (63, 0.024), (71, 0.017), (77, 0.046), (78, 0.012), (83, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8147698 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

2 0.69918483 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

3 0.69787472 62 nips-2008-Differentiable Sparse Coding

Author: J. A. Bagnell, David M. Bradley

Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1

4 0.69527328 200 nips-2008-Robust Kernel Principal Component Analysis

Author: Minh H. Nguyen, Fernando Torre

Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1

5 0.69466311 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

Author: Siwei Lyu, Eero P. Simoncelli

Abstract: We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric. Such a source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of nearby bandpass filter responses, for both sounds and images, are closer to being elliptically symmetric than linearly transformed factorial sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.

6 0.69421542 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

7 0.69178391 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

8 0.69156641 194 nips-2008-Regularized Learning with Networks of Features

9 0.69028199 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

10 0.68995565 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

11 0.68879575 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

12 0.68873549 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

13 0.68798506 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

14 0.68635505 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

15 0.68503797 248 nips-2008-Using matrices to model symbolic relationship

16 0.68495142 95 nips-2008-Grouping Contours Via a Related Image

17 0.68393201 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

18 0.6834169 118 nips-2008-Learning Transformational Invariants from Natural Movies

19 0.68250954 138 nips-2008-Modeling human function learning with Gaussian processes

20 0.68180954 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction