nips nips2008 nips2008-80 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. [sent-6, score-0.206]
2 To handle such data structures, Grassmann kernels have been proposed and used previously. [sent-7, score-0.317]
3 In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. [sent-8, score-0.389]
4 Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. [sent-9, score-0.475]
5 Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel to the set of affine as well as scaled subspaces. [sent-10, score-0.304]
6 We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. [sent-11, score-0.44]
7 In this paper we focus on the domain where each data sample is a linear subspace of vectors, rather than a single vector, of a Euclidean space. [sent-14, score-0.155]
8 Low-dimensional subspace structures are commonly encountered in computer vision problems. [sent-15, score-0.132]
9 For example, the variation of images due to the change of pose, illumination, etc., is well-approximated by the subspace spanned by a few “eigenfaces”. [sent-16, score-0.159]
10 More recent examples include the dynamical system models of video sequences from human actions or time-varying textures, represented by the linear span of the observability matrices [1, 14, 13]. [sent-17, score-0.106]
11 Subspace-based learning is an approach to handle the data as a collection of subspaces instead of the usual vectors. [sent-18, score-0.183]
12 The appropriate data space for subspace-based learning is the Grassmann manifold G(m, D), which is defined as the set of m-dimensional linear subspaces in R^D. [sent-19, score-0.302]
13 In particular, we can define positive definite kernels on the Grassmann manifold, which allows us to treat the space as if it were a Euclidean space. [sent-20, score-0.317]
14 Previously, the Binet-Cauchy kernel [17, 15] and the Projection kernel [16, 6] have been proposed and have demonstrated their potential for subspace-based learning problems. [sent-21, score-0.336]
15 Furthermore, the Bhattacharyya affinity is indeed a positive definite kernel function on the space of distributions and has nice closed-form expressions for the exponential family [7]. [sent-27, score-0.168]
16 1 By distance we mean any nonnegative measure of dissimilarity and not necessarily a metric. [sent-28, score-0.101]
17 In this paper, we investigate the relationship between the Grassmann kernels and the probabilistic distances. [sent-29, score-0.363]
18 The link is provided by the probabilistic generalization of subspaces with a Factor Analyzer which is a Gaussian ‘blob’ that has nonzero volume along all dimensions. [sent-30, score-0.229]
19 Firstly, we show that the KL distance yields the Projection kernel on the Grassmann manifold in the limit of zero noise, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. [sent-31, score-0.571]
20 Secondly, based on our analysis of the KL distance, we propose an extension of the Projection kernel, which is originally confined to the set of linear subspaces, to the set of affine as well as scaled subspaces. [sent-32, score-0.28]
21 We will demonstrate the extended kernels with the Support Vector Machines and the Kernel Discriminant Analysis using synthetic and real image databases. [sent-33, score-0.419]
22 The proposed kernels show better performance than the previously used kernels such as the Binet-Cauchy and the Bhattacharyya kernels. [sent-34, score-0.634]
23 2 Probabilistic subspace distances and kernels In this section we will consider the two well-known probabilistic distances, the KL distance and the Bhattacharyya distance, and establish their relationships to the Grassmann kernels. [sent-35, score-0.615]
24 Although these probabilistic distances are not restricted to specific distributions, we will model the data distribution as the Mixture of Factor Analyzers (MFA) [4]. [sent-36, score-0.091]
25 samples from the i-th Factor Analyzer: x ∼ pi(x) = N(ui, Ci), Ci = Yi Yi^T + σ^2 I_D, (1) where ui ∈ R^D is the mean, Yi is a full-rank D × m matrix (D > m), and σ is the ambient noise level. [sent-43, score-0.116]
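As a concrete illustration of the FA model in (1), here is a minimal numpy sketch for drawing samples from one Factor Analyzer; the helper name and the example dimensions are our own illustrative assumptions, not taken from the paper.

import numpy as np

def sample_fa(u, Y, sigma, n, rng=None):
    # Draw n samples x = Y v + u + noise with v ~ N(0, I_m) and
    # noise ~ N(0, sigma^2 I_D), so that Cov(x) = Y Y^T + sigma^2 I_D as in (1).
    rng = rng or np.random.default_rng(0)
    D, m = Y.shape
    v = rng.standard_normal((n, m))
    noise = sigma * rng.standard_normal((n, D))
    return v @ Y.T + u + noise

# example: a homogeneous, zero-mean FA 'blob' around a 3-dim subspace of R^10
# Y = np.linalg.qr(np.random.default_rng(1).standard_normal((10, 3)))[0]
# X = sample_fa(np.zeros(10), Y, sigma=0.1, n=500)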
26 More importantly, we use the FA distribution to provide the link between the Grassmann manifold and the space of probabilistic distributions. [sent-46, score-0.142]
27 In fact, a linear subspace can be considered as the ‘flattened’ (σ → 0) limit of a zero-mean (ui = 0), homogeneous (Yi^T Yi = Im) FA distribution. [sent-47, score-0.243]
28 We will look at the limits of the KL distance and the Bhattacharyya kernel under this condition. [sent-48, score-0.243]
29 2.1 KL distance in the limit. The (symmetrized) KL distance is defined as JKL(p1, p2) = ∫ [p1(x) − p2(x)] log (p1(x)/p2(x)) dx. (2) [sent-50, score-0.182]
30 Under the subspace condition (σ → 0, ui = 0, Yi^T Yi = Im, i = 1, …, N), the KL distance simplifies to JKL(p1, p2) = (1/2)(2m σ^−2 − 2m/(σ^2+1) − (2σ^−2/(σ^2+1)) tr(Y1^T Y2 Y2^T Y1)) = (1/(2σ^2(σ^2+1))) (2m − 2 tr(Y1^T Y2 Y2^T Y1)). [sent-53, score-0.203]
31 We can ignore the multiplying factors which do not depend on Y1 or Y2, and rewrite the distance as JKL(p1, p2) = 2m − 2 tr(Y1^T Y2 Y2^T Y1). [sent-56, score-0.15]
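A quick numerical check of this simplification (our own sketch, under the stated subspace condition; not code from the paper):

import numpy as np

def kl_sym_gauss(C1, C2):
    # Symmetrized KL divergence between N(0, C1) and N(0, C2):
    # 0.5 * (tr(C2^{-1} C1) + tr(C1^{-1} C2) - 2D); the log-det terms cancel.
    D = C1.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(C2, C1)) +
                  np.trace(np.linalg.solve(C1, C2)) - 2 * D)

rng = np.random.default_rng(0)
D, m, sigma = 20, 4, 0.05
Y1 = np.linalg.qr(rng.standard_normal((D, m)))[0]
Y2 = np.linalg.qr(rng.standard_normal((D, m)))[0]
C1 = Y1 @ Y1.T + sigma**2 * np.eye(D)
C2 = Y2 @ Y2.T + sigma**2 * np.eye(D)
closed_form = (2*m - 2*np.trace(Y1.T @ Y2 @ Y2.T @ Y1)) / (2*sigma**2*(sigma**2 + 1))
print(kl_sym_gauss(C1, C2), closed_form)   # the two numbers agree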
32 We immediately realize that the distance JKL(p1, p2) coincides with the definition of the squared Projection distance [2, 16, 6], with the corresponding Projection kernel kProj(Y1, Y2) = tr(Y1^T Y2 Y2^T Y1). (4) [sent-57, score-0.318]
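For reference, the Projection kernel reduces to a Frobenius norm of the m × m matrix Y1^T Y2; a one-line sketch (the helper name is ours):

import numpy as np

def k_proj(Y1, Y2):
    # kProj(Y1, Y2) = tr(Y1^T Y2 Y2^T Y1) = ||Y1^T Y2||_F^2,
    # for D x m matrices Y1, Y2 with orthonormal columns.
    return np.linalg.norm(Y1.T @ Y2, 'fro') ** 2
# note: k_proj(Y, Y) = m for any orthonormal basis Y.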
33 2.2 Bhattacharyya kernel in the limit. Jebara and Kondor [7, 8] proposed the Probability Product kernel kProb(p1, p2) = ∫ [p1(x) p2(x)]^α dx (α > 0), (5) which includes the Bhattacharyya kernel (α = 1/2) as a special case. [sent-59, score-0.536]
34 Under the subspace condition (σ → 0, ui = 0, Yi^T Yi = Im, i = 1, …, N), [sent-60, score-0.203]
35 the kernel kProb becomes kProb(p1, p2) = π^((1−2α)D/2) 2^(−αD) α^(−D/2) σ^(2α(m−D)+D) (σ^2+1)^(−αm) det(I_2m − Y^T Y/(2σ^2+2))^(−1/2) (6), which, with Y = [Y1 Y2], is proportional to det(Im − (1/(2σ^2+1)^2) Y1^T Y2 Y2^T Y1)^(−1/2). (7) [sent-63, score-0.216]
36 Suppose the two subspaces span(Y1) and span(Y2) intersect only at the origin, that is, the singular values of Y1^T Y2 are strictly less than 1. [sent-64, score-0.183]
37 This implies that after normalizing the kernel by the diagonal terms, the resulting kernel becomes trivial: the normalized kernel k̃Prob(Yi, Yj) tends to 1 if span(Yi) = span(Yj) and to 0 otherwise, as σ → 0. [sent-67, score-0.504]
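This degeneracy is easy to observe numerically; the sketch below (our own code) evaluates the Bhattacharyya affinity between two flattened FA models, which is already normalized since k(pi, pi) = 1 for zero-mean Gaussians, and shrinks toward 0 as σ decreases:

import numpy as np

def k_bhat(C1, C2):
    # Bhattacharyya affinity between N(0, C1) and N(0, C2):
    # |C1|^{1/4} |C2|^{1/4} / |(C1 + C2)/2|^{1/2}
    _, ld1 = np.linalg.slogdet(C1)
    _, ld2 = np.linalg.slogdet(C2)
    _, ldm = np.linalg.slogdet(0.5 * (C1 + C2))
    return np.exp(0.25 * ld1 + 0.25 * ld2 - 0.5 * ldm)

rng = np.random.default_rng(0)
D, m = 20, 4
Y1 = np.linalg.qr(rng.standard_normal((D, m)))[0]
Y2 = np.linalg.qr(rng.standard_normal((D, m)))[0]
for sigma in [1.0, 0.1, 0.01, 0.001]:
    C1 = Y1 @ Y1.T + sigma**2 * np.eye(D)
    C2 = Y2 @ Y2.T + sigma**2 * np.eye(D)
    print(sigma, k_bhat(C1, C2))   # tends to 0 for distinct subspaces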
38 3 Extended Projection Kernel Based on the analysis of the previous section, we will extend the Projection kernel (4) to more general spaces than the Grassmann manifold in this section. [sent-70, score-0.288]
39 We will examine the two directions of extension: from linear to affine, and from homogeneous to scaled subspaces. [sent-71, score-0.144]
40 3.1 Extension to affine subspaces. An affine subspace in R^D is a linear subspace with an ‘offset’. [sent-73, score-0.47]
41 In that sense a linear subspace is simply an affine subspace with a zero offset. [sent-74, score-0.287]
42 Analogously to the (linear) Grassmann manifold, we can define an affine Grassmann manifold as the set of all m-dimensional affine subspaces in R^D (see footnote 2). [sent-75, score-0.279]
43 The affine span is defined from the orthonormal basis Y ∈ R^(D×m) and an offset u ∈ R^D by aspan(Y, u) = {x | x = Yv + u, ∀v ∈ R^m}. [sent-76, score-0.129]
44 Similarly to the definition of Grassmann kernels [6], we can now formally define the affine Grassmann kernel as follows. [sent-79, score-0.485]
45 2 The Grassmann manifold is defined as a quotient space O(D)/(O(m) × O(D − m)), where O is the orthogonal group. [sent-81, score-0.096]
46 The affine Grassmann manifold is similarly defined as E(D)/(E(m) × O(D − m)), where E is the Euclidean group. [sent-82, score-0.096]
47 With this definition we check if the KL distance in the limit suggests an affine Grassmann kernel. [sent-86, score-0.107]
48 The KL distance with the homogeneity condition only (Y1^T Y1 = Y2^T Y2 = Im) becomes, in the limit σ → 0, (1/(2σ^2)) [2m − 2 tr(Y1^T Y2 Y2^T Y1) + (u1 − u2)^T (2I_D − Y1 Y1^T − Y2 Y2^T)(u1 − u2)]. [sent-87, score-0.098]
49 Combined with the linear term kLin, this defines the new ‘affine’ kernel kAff(Y1, u1, Y2, u2) = tr(Y1 Y1^T Y2 Y2^T) + u1^T (I − Y1 Y1^T)(I − Y2 Y2^T) u2. (13) [sent-91, score-0.191]
50 As we can see, the KL distance with only the homogeneity condition has two terms related to the subspace Y and the offset u. [sent-92, score-0.256]
51 If we have two separate positive kernels for subspaces and for offsets, we can add or multiply them together to construct new kernels [10]. [sent-94, score-0.817]
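A minimal sketch of the affine kernel (13) as written above (the function and variable names are ours):

import numpy as np

def k_aff(Y1, u1, Y2, u2):
    # kAff = tr(Y1 Y1^T Y2 Y2^T) + u1^T (I - Y1 Y1^T)(I - Y2 Y2^T) u2,
    # with orthonormal D x m bases Y1, Y2 and offsets u1, u2 in R^D.
    subspace_term = np.linalg.norm(Y1.T @ Y2, 'fro') ** 2   # equals tr(Y1 Y1^T Y2 Y2^T)
    r1 = u1 - Y1 @ (Y1.T @ u1)                               # (I - Y1 Y1^T) u1
    r2 = u2 - Y2 @ (Y2.T @ u2)                               # (I - Y2 Y2^T) u2
    return subspace_term + r1 @ r2

The two terms are themselves kernels on subspaces and on offsets, so, as noted above, they can also be summed or multiplied to construct further kernels.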
52 3.2 Extension to scaled subspaces. We have assumed homogeneous subspaces so far. [sent-96, score-0.436]
53 However, if the subspaces are computed from the PCA of real data, the eigenvalues in general will have non-homogeneous values. [sent-97, score-0.183]
54 To incorporate these scales for affine subspaces, we now allow the Y to be non-orthonormal and check if the resultant kernel is still valid. [sent-98, score-0.189]
55 This is a positive definite kernel which can be shown from the definition. [sent-105, score-0.168]
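A sketch of one plausible realization of this scaled variant (an assumption on our part — the exact equation for the scaled kernel is not reproduced above): take the principal directions weighted by the square roots of their eigenvalues and evaluate the same trace.

import numpy as np

def scaled_basis(X, m):
    # principal directions of X scaled by their singular values (one possible choice)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc.T / np.sqrt(len(X)), full_matrices=False)
    return U[:, :m] * s[:m]                  # D x m, non-orthonormal columns

def k_lin_scaled(Y1, Y2):
    # same trace form as the Projection kernel, now on scaled (non-orthonormal) bases;
    # it stays positive definite: it is the Frobenius inner product <Y1 Y1^T, Y2 Y2^T>.
    return np.linalg.norm(Y1.T @ Y2, 'fro') ** 2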
56 Summary of the extended Projection kernels: the proposed kernels are summarized below. [sent-106, score-0.705]
57 For the linear kernels, the bases Y (and their scaled counterparts) are computed from the data assuming u = 0, whereas for the affine kernels they are computed after removing the estimated mean u from the data. [sent-111, score-0.657]
58 Extension to nonlinear subspaces. A systematic way of extending the Projection kernel from linear/affine subspaces to nonlinear spaces is to use an implicit map via a kernel function, where the latter kernel is to be distinguished from the former kernels. [sent-113, score-0.934]
59 Note that the proposed kernels (15) can be computed only from the inner products of the column vectors of Y ’s and u’s including the orthonormalization procedure. [sent-114, score-0.361]
60 If we replace the inner products of those vectors yi^T yj by a positive definite function f(yi, yj) on Euclidean spaces, this implicitly defines a nonlinear feature space. [sent-115, score-0.298]
61 This ‘doubly kernel’ approach has already been proposed for the Binet-Cauchy kernel [17, 8] and for probabilistic distances in general [18]. [sent-116, score-0.259]
62 We can adopt the trick for the extended Projection kernels as well to extend the kernels to operate on ‘nonlinear subspaces’3 . [sent-117, score-0.705]
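A sketch of one way to realize this double kernelization for the Projection kernel (our construction under the stated idea; the paper's exact recipe may differ): build the orthonormalization coefficients of each sample set in feature space from its Gram matrix, then evaluate tr(Y1^T Y2 Y2^T Y1) through the cross-Gram matrix.

import numpy as np

def rbf(A, B, gamma=1.0):
    # base kernel f(yi, yj); an RBF is one common choice
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def feature_coeffs(X, m, f=rbf):
    # coefficients A such that Y = Phi(X) A has orthonormal columns in feature space:
    # K = V diag(w) V^T  =>  A = V_m diag(w_m)^{-1/2}
    w, V = np.linalg.eigh(f(X, X))
    idx = np.argsort(w)[::-1][:m]
    return V[:, idx] / np.sqrt(w[idx])

def k_proj_kernelized(X1, X2, m, f=rbf):
    # tr(Y1^T Y2 Y2^T Y1) = ||A1^T K12 A2||_F^2 with K12[a, b] = f(x1_a, x2_b)
    A1, A2 = feature_coeffs(X1, m, f), feature_coeffs(X2, m, f)
    return np.linalg.norm(A1.T @ f(X1, X2) @ A2, 'fro') ** 2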
63 4 Experiments with synthetic data In this section we demonstrate the application of the extended Projection kernels to two-class classification problems with Support Vector Machines (SVMs). [sent-118, score-0.419]
64 4.1 Synthetic data. The extended kernels are defined under different assumptions of data distribution. [sent-120, score-0.388]
65 To test the kernels we generate three types of data – ‘easy’, ‘intermediate’ and ‘difficult’ – from MFA distribution, which cover the different ranges of data distribution. [sent-121, score-0.317]
66 The distances are measured by the KL distance JKL . [sent-130, score-0.12]
67 3 the preimage corresponding to the linear subspaces in the RKHS via the feature map 5 4. [sent-131, score-0.206]
68 4.2 Algorithms and results. We compare the performance of the Euclidean SVM with linear/polynomial/RBF kernels and the performance of the SVM with Grassmann kernels. [sent-132, score-0.317]
69 The polynomial kernel used is k(x1, x2) = (⟨x1, x2⟩ + 1)^3. [sent-135, score-0.168]
70 To test the Grassmann SVM, we first estimated the mean ui and the basis Yi from n = 50 points of each FA distribution pi (x) used for the original SVM. [sent-136, score-0.116]
71 We evaluate the algorithms with leave-one-out test by holding out one subspace and training with the other N − 1 subspaces. [sent-139, score-0.154]
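A sketch of this evaluation protocol with scikit-learn's precomputed-kernel interface (our own scaffolding, not the authors' code):

import numpy as np
from sklearn.svm import SVC

def loo_accuracy(K, labels, C=1.0):
    # K: N x N Grassmann kernel matrix between the subspaces; labels: length-N array.
    N = len(labels)
    correct = 0
    for i in range(N):
        train = np.setdiff1d(np.arange(N), [i])
        clf = SVC(kernel='precomputed', C=C)
        clf.fit(K[np.ix_(train, train)], labels[train])
        correct += int(clf.predict(K[np.ix_([i], train)])[0] == labels[i])
    return correct / N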
72 The results show that the best rates are obtained with the extended kernels, and the Euclidean kernels lag behind for all three types of data. [sent-167, score-0.388]
73 Interestingly, the polynomial kernels often perform worse than the linear kernels, and the RBF kernel performed even worse, which we do not report. [sent-168, score-0.508]
74 As expected, the linear kernel is inappropriate for data with nonzero offsets (‘easy’ and ‘intermediate’), whereas the affine kernel performs well regardless of the offsets. [sent-170, score-0.398]
75 However, there is no significant difference between the homogeneous and the scaled kernels. [sent-171, score-0.121]
76 We conclude that under certain conditions the extended kernels have clear advantages over the original linear kernels and the Euclidean kernels for the subspace-based classification problem. [sent-173, score-1.045]
77 5 Experiments with real-world data In this section we demonstrate the application of the extended Projection kernels to recognition problems with the kernel Fisher Discriminant Analysis [10]. [sent-174, score-0.577]
78 5.1 Databases. The Yale face database and the Extended Yale face database [3] together consist of pictures of 38 subjects with 9 different poses and 45 different lighting conditions. [sent-176, score-0.285]
79 The ETH-80 [9] is an object database designed for object categorization test under varying poses. [sent-177, score-0.125]
80 The database consists of pictures of 8 object categories and 10 object instances for each category, recorded under 41 different poses. [sent-178, score-0.127]
81 For the ETH-80 database, a set consists of images of all possible poses of an object from a category. [sent-182, score-0.08]
82 The images are resized to dimensions of D = 504 and 1024 respectively, and subspaces of up to m = 9 dimensions are used to compute the kernels. [sent-185, score-0.21]
83 The subspace parameters Yi , ui and σ are estimated from the probabilistic PCA [12]. [sent-186, score-0.249]
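A compact sketch of this estimation step using the standard closed-form maximum-likelihood solution of probabilistic PCA (our code; variable names are ours):

import numpy as np

def ppca_params(X, m):
    # mean u, noise sigma (average of the D - m discarded eigenvalues),
    # and a D x m factor matrix Y with Cov(X) ~ Y Y^T + sigma^2 I_D
    u = X.mean(axis=0)
    Xc = X - u
    S = Xc.T @ Xc / len(X)                  # D x D sample covariance
    w, U = np.linalg.eigh(S)
    w, U = w[::-1], U[:, ::-1]              # eigenvalues in descending order
    sigma2 = w[m:].mean()
    Y = U[:, :m] * np.sqrt(np.maximum(w[:m] - sigma2, 0.0))
    return Y, u, np.sqrt(sigma2)

# U[:, :m] is an orthonormal basis for the estimated subspace; Y additionally carries the scales.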
84 Since these databases are highly multiclass (31 and 8 classes) relative to the total number of samples, we use the kernel Discriminant Analysis to reduce dimensionality and extract features, in conjunction with a 1-NN classifier. [sent-189, score-0.228]
85 The six different Grassmann kernels are compared: the extended Projection (Lin/LinSc/Aff/Affsc) kernels, the Binet-Cauchy kernel, and the Bhattacharyya kernel. [sent-190, score-0.388]
86 The baseline algorithm (Eucl) is the Linear Discriminant Analysis applied to the original images in the data from which the subspaces are computed. [sent-191, score-0.21]
87 The results show that the best rates are achieved by the extended kernels: the linear scaled kernel on Yale Face and the affine kernel on ETH-80. [sent-193, score-0.495]
88 However, the differences within the extended kernels are small. [sent-194, score-0.388]
89 The performance of the extended kernels remains relatively unaffected by the subspace dimensionality, which is a convenient property in practice since we do not know the true dimensionality a priori. [sent-195, score-0.54]
90 However, the Binet-Cauchy and the Bhattacharyya kernels do not perform as well, and degrade quickly as the subspace dimension increases. [sent-196, score-0.473]
91 6 Conclusion In this paper we analyzed the relationship between probabilistic distances and the geometric Grassmann kernels, especially the KL distance and the Projection kernel. [sent-198, score-0.166]
92 This analysis helps us to understand the limitations of the Bhattacharyya kernel for subspace-based problems, and also suggests extensions of the Projection kernel. [sent-199, score-0.168]
93 With synthetic and real data we demonstrated that the extended kernels can outperform the original Projection kernel, as well as the previously used Bhattacharyya and the Binet-Cauchy kernels for subspace-based classification problems. [sent-200, score-0.736]
94 The relationship between other probabilistic distances and the Grassmann kernels is yet to be fully explored, and we expect to see more results from a follow-up study. [sent-201, score-0.408]
95 From few to many: Illumination cone models for face recognition under variable lighting and pose. [sent-220, score-0.116]
96 [Figure 1 residue: two panels, ‘Yale Face’ and ‘ETH-80’, plotting rate (%) against subspace dimension (m) for Eucl, Lin, LinSc, Aff, AffSc, BC, and Bhat.]
97 Figure 1: Comparison of Grassmann kernels for face recognition/object categorization tasks with kernel discriminant analysis. [sent-246, score-0.799]
98 The extended Projection kernels (Lin/LinSc/Aff/ AffSc) outperform the baseline method (Eucl) and the Binet-Cauchy (BC) and the Bhattacharyya (Bhat) kernels. [sent-247, score-0.388]
99 Subspace distance analysis with application to adaptive bayesian algorithm for face recognition. [sent-297, score-0.144]
100 From sample similarity to ensemble similarity: Probabilistic distance measures in Reproducing Kernel Hilbert Space. [sent-308, score-0.101]
wordName wordTfidf (topN-words)
[('grassmann', 0.628), ('kernels', 0.317), ('bhattacharyya', 0.291), ('subspaces', 0.183), ('af', 0.18), ('kernel', 0.168), ('subspace', 0.132), ('aspan', 0.129), ('jkl', 0.129), ('kprob', 0.129), ('yi', 0.129), ('kl', 0.117), ('projection', 0.109), ('yale', 0.097), ('manifold', 0.096), ('bhat', 0.092), ('tr', 0.092), ('fa', 0.089), ('span', 0.083), ('distance', 0.075), ('eucl', 0.074), ('euclidean', 0.072), ('ui', 0.071), ('extended', 0.071), ('face', 0.069), ('scaled', 0.065), ('aff', 0.065), ('im', 0.061), ('discriminant', 0.057), ('homogeneous', 0.056), ('affsc', 0.055), ('jihun', 0.055), ('klin', 0.055), ('svms', 0.053), ('bc', 0.051), ('rama', 0.048), ('det', 0.048), ('probabilistic', 0.046), ('pi', 0.045), ('distances', 0.045), ('orthonormalization', 0.044), ('illumination', 0.042), ('databases', 0.04), ('offsets', 0.039), ('rd', 0.039), ('analyzer', 0.037), ('ashok', 0.037), ('hamm', 0.037), ('imre', 0.037), ('kaff', 0.037), ('kaffsc', 0.037), ('linsc', 0.037), ('risi', 0.037), ('veeraraghavan', 0.037), ('database', 0.037), ('ne', 0.035), ('lin', 0.033), ('pennsylvania', 0.032), ('grasp', 0.032), ('tony', 0.032), ('object', 0.032), ('limit', 0.032), ('synthetic', 0.031), ('ku', 0.03), ('kondor', 0.03), ('ci', 0.029), ('mfa', 0.028), ('images', 0.027), ('similarity', 0.026), ('offset', 0.026), ('pictures', 0.026), ('lighting', 0.026), ('jebara', 0.026), ('intermediate', 0.025), ('extension', 0.024), ('categorization', 0.024), ('invariant', 0.024), ('spaces', 0.024), ('degrade', 0.024), ('sc', 0.024), ('daniel', 0.023), ('homogeneity', 0.023), ('linear', 0.023), ('machines', 0.023), ('pca', 0.022), ('firstly', 0.022), ('holding', 0.022), ('thesis', 0.021), ('scales', 0.021), ('recognition', 0.021), ('poses', 0.021), ('yj', 0.02), ('alexander', 0.02), ('valued', 0.02), ('dimensionality', 0.02), ('philadelphia', 0.02), ('orthonormal', 0.02), ('nonlinear', 0.02), ('overlapping', 0.019), ('treating', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
2 0.1831255 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
3 0.14722182 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.
4 0.11952396 44 nips-2008-Characteristic Kernels on Groups and Semigroups
Author: Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn . In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn . + 1
5 0.10285579 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
Author: Zhengdong Lu, Jeffrey Kaye, Todd K. Leen
Abstract: We develop new techniques for time series classification based on hierarchical Bayesian generative models (called mixed-effect models) and the Fisher kernel derived from them. A key advantage of the new formulation is that one can compute the Fisher information matrix despite varying sequence lengths and varying sampling intervals. This avoids the commonly-used ad hoc replacement of the Fisher information matrix with the identity which destroys the geometric invariance of the kernel. Our construction retains the geometric invariance, resulting in a kernel that is properly invariant under change of coordinates in the model parameter space. Experiments on detecting cognitive decline show that classifiers based on the proposed kernel out-perform those based on generative models and other feature extraction routines, and on Fisher kernels that use the identity in place of the Fisher information.
6 0.099155545 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
7 0.096165031 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
8 0.089542709 143 nips-2008-Multi-label Multiple Kernel Learning
9 0.087408789 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
10 0.074356444 238 nips-2008-Theory of matching pursuit
11 0.073937401 62 nips-2008-Differentiable Sparse Coding
12 0.072922796 200 nips-2008-Robust Kernel Principal Component Analysis
13 0.071547061 113 nips-2008-Kernelized Sorting
14 0.069790021 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
15 0.069058917 151 nips-2008-Non-parametric Regression Between Manifolds
16 0.068315409 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
17 0.067505769 138 nips-2008-Modeling human function learning with Gaussian processes
18 0.067022391 78 nips-2008-Exact Convex Confidence-Weighted Learning
19 0.06642554 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
20 0.061904084 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning
topicId topicWeight
[(0, -0.165), (1, -0.084), (2, -0.041), (3, 0.049), (4, 0.069), (5, -0.032), (6, 0.02), (7, -0.053), (8, 0.043), (9, -0.039), (10, 0.3), (11, -0.12), (12, -0.048), (13, 0.073), (14, 0.117), (15, 0.0), (16, 0.09), (17, -0.059), (18, -0.035), (19, 0.028), (20, -0.012), (21, -0.027), (22, 0.068), (23, -0.045), (24, -0.055), (25, 0.06), (26, 0.063), (27, -0.032), (28, 0.08), (29, 0.055), (30, -0.089), (31, 0.031), (32, 0.062), (33, 0.022), (34, -0.013), (35, 0.01), (36, -0.044), (37, 0.045), (38, 0.055), (39, -0.049), (40, 0.067), (41, 0.017), (42, -0.111), (43, 0.048), (44, -0.054), (45, 0.172), (46, 0.087), (47, 0.049), (48, 0.086), (49, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.96637911 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
2 0.78595853 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
3 0.74450672 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.
4 0.67244798 44 nips-2008-Characteristic Kernels on Groups and Semigroups
Author: Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn . In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn . + 1
5 0.66868263 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
6 0.62642682 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
7 0.60971653 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
8 0.6014924 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
9 0.47977301 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning
10 0.4718608 143 nips-2008-Multi-label Multiple Kernel Learning
11 0.46842936 126 nips-2008-Localized Sliced Inverse Regression
12 0.45687941 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
13 0.4553633 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
14 0.4449307 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
15 0.43781331 196 nips-2008-Relative Margin Machines
16 0.43582436 151 nips-2008-Non-parametric Regression Between Manifolds
17 0.41424358 178 nips-2008-Performance analysis for L\ 2 kernel classification
18 0.41155094 113 nips-2008-Kernelized Sorting
19 0.40673745 61 nips-2008-Diffeomorphic Dimensionality Reduction
20 0.40380943 32 nips-2008-Bayesian Kernel Shaping for Learning Control
topicId topicWeight
[(6, 0.03), (7, 0.093), (12, 0.026), (28, 0.104), (57, 0.526), (59, 0.014), (63, 0.017), (77, 0.036), (78, 0.011), (83, 0.034)]
simIndex simValue paperId paperTitle
1 0.94039756 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille
Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1
2 0.92856586 233 nips-2008-The Gaussian Process Density Sampler
Author: Iain Murray, David MacKay, Ryan P. Adams
Abstract: We present the Gaussian Process Density Sampler (GPDS), an exchangeable generative model for use in nonparametric Bayesian density estimation. Samples drawn from the GPDS are consistent with exact, independent samples from a fixed density function that is a transformation of a function drawn from a Gaussian process prior. Our formulation allows us to infer an unknown density from data using Markov chain Monte Carlo, which gives samples from the posterior distribution over density functions and from the predictive distribution on data space. We can also infer the hyperparameters of the Gaussian process. We compare this density modeling technique to several existing techniques on a toy problem and a skullreconstruction task. 1
3 0.92198402 148 nips-2008-Natural Image Denoising with Convolutional Networks
Author: Viren Jain, Sebastian Seung
Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters. 1 Background Low-level image processing tasks include edge detection, interpolation, and deconvolution. These tasks are useful both in themselves, and as a front-end for high-level visual tasks like object recognition. This paper focuses on the task of denoising, defined as the recovery of an underlying image from an observation that has been subjected to Gaussian noise. One approach to image denoising is to transform an image from pixel intensities into another representation where statistical regularities are more easily captured. For example, the Gaussian scale mixture (GSM) model introduced by Portilla and colleagues is based on a multiscale wavelet decomposition that provides an effective description of local image statistics [1, 2]. Another approach is to try and capture statistical regularities of pixel intensities directly using Markov random fields (MRFs) to define a prior over the image space. Initial work used handdesigned settings of the parameters, but recently there has been increasing success in learning the parameters of such models from databases of natural images [3, 4, 5, 6, 7, 8]. Prior models can be used for tasks such as image denoising by augmenting the prior with a noise model. Alternatively, an MRF can be used to model the probability distribution of the clean image conditioned on the noisy image. This conditional random field (CRF) approach is said to be discriminative, in contrast to the generative MRF approach. Several researchers have shown that the CRF approach can outperform generative learning on various image restoration and labeling tasks [9, 10]. CRFs have recently been applied to the problem of image denoising as well [5]. 1 The present work is most closely related to the CRF approach. Indeed, certain special cases of convolutional networks can be seen as performing maximum likelihood inference on a CRF [11]. The advantage of the convolutional network approach is that it avoids a general difficulty with applying MRF-based methods to image analysis: the computational expense associated with both parameter estimation and inference in probabilistic models. 
For example, naive methods of learning MRFbased models involve calculation of the partition function, a normalization factor that is generally intractable for realistic models and image dimensions. As a result, a great deal of research has been devoted to approximate MRF learning and inference techniques that meliorate computational difficulties, generally at the cost of either representational power or theoretical guarantees [12, 13]. Convolutional networks largely avoid these difficulties by posing the computational task within the statistical framework of regression rather than density estimation. Regression is a more tractable computation and therefore permits models with greater representational power than methods based on density estimation. This claim will be argued for with empirical results on the denoising problem, as well as mathematical connections between MRF and convolutional network approaches. 2 Convolutional Networks Convolutional networks have been extensively applied to visual object recognition using architectures that accept an image as input and, through alternating layers of convolution and subsampling, produce one or more output values that are thresholded to yield binary predictions regarding object identity [14, 15]. In contrast, we study networks that accept an image as input and produce an entire image as output. Previous work has used such architectures to produce images with binary targets in image restoration problems for specialized microscopy data [11, 16]. Here we show that similar architectures can also be used to produce images with the analog fluctuations found in the intensity distributions of natural images. Network Dynamics and Architecture A convolutional network is an alternating sequence of linear filtering and nonlinear transformation operations. The input and output layers include one or more images, while intermediate layers contain “hidden
4 0.90800077 236 nips-2008-The Mondrian Process
Author: Daniel M. Roy, Yee W. Teh
Abstract: We describe a novel class of distributions, called Mondrian processes, which can be interpreted as probability distributions over kd-tree data structures. Mondrian processes are multidimensional generalizations of Poisson processes and this connection allows us to construct multidimensional generalizations of the stickbreaking process described by Sethuraman (1994), recovering the Dirichlet process in one dimension. After introducing the Aldous-Hoover representation for jointly and separately exchangeable arrays, we show how the process can be used as a nonparametric prior distribution in Bayesian models of relational data. 1
same-paper 5 0.90386599 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
Author: Jihun Hamm, Daniel D. Lee
Abstract: Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases. 1
6 0.80070788 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
7 0.77683347 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
8 0.72581357 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
9 0.68038279 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
10 0.64299285 234 nips-2008-The Infinite Factorial Hidden Markov Model
11 0.62629199 35 nips-2008-Bayesian Synchronous Grammar Induction
12 0.61985648 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
13 0.61627752 200 nips-2008-Robust Kernel Principal Component Analysis
14 0.60497296 66 nips-2008-Dynamic visual attention: searching for coding length increments
15 0.60189533 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction
16 0.60077953 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
17 0.59301591 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
18 0.58646774 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
19 0.58192241 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
20 0.57817358 248 nips-2008-Using matrices to model symbolic relationship