nips nips2004 nips2004-30 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. [sent-9, score-0.391]
2 This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. [sent-10, score-1.052]
3 Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. [sent-11, score-0.452]
4 As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. [sent-12, score-0.397]
5 1 Introduction Recent years have seen a combinatorial explosion of results on kernels for structured and semi-structured data, including trees, strings, graphs, transducers and dynamical systems [6, 8, 15, 13]. [sent-13, score-0.447]
6 The fact that these kernels are very specific to the type of discrete data under consideration is a major cause of confusion to the practitioner. [sent-14, score-0.311]
7 What is required is a) a unified view of the field and b) a recipe to design new kernels easily. [sent-15, score-0.29]
8 The present paper takes a step in this direction by unifying these diverse kernels by means of the Binet-Cauchy theorem. [sent-16, score-0.314]
9 Our point of departure is the work of Wolf and Shashua [17], or more specifically, their proof that det A⊤ B is a kernel on matrices A, B ∈ Rm×n . [sent-17, score-0.314]
10 Wolf and Shashua only exploit the Binet-Cauchy theorem for one particular choice of parameters. [sent-21, score-0.12]
11 It turns out that the continuum of these values corresponds to a large class of kernels, some of which are well known and others of which are novel. [sent-22, score-0.315]
12 This points to a close connection with rational kernels [3]. [sent-25, score-0.315]
13 Outline of the paper: Section 2 contains the main result of the present paper: the definition of Binet-Cauchy kernels and their efficient computation. [sent-26, score-0.29]
14 Subsequently, section 3 discusses a number of special cases, which allows us to recover well known kernel functions. [sent-27, score-0.165]
15 1 The General Composition Formula We begin by defining compound matrices. [sent-31, score-0.172]
16 Then the compound matrix of order q is defined as [Cq(A)]i,j := det(A(ik, jl))_{k,l=1}^q, where i ∈ Iq^m and j ∈ Iq^n, i.e., i and j range over ordered q-tuples of row and column indices of A, respectively. [sent-40, score-0.174]
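To make the definition concrete, here is a minimal sketch (Python/NumPy, not from the paper) that builds the compound matrix of a small dense matrix directly from the definition and checks the Binet-Cauchy identity Cq(AB) = Cq(A)Cq(B); the helper name `compound_matrix` and the row/column index convention are our own assumptions.

```python
import itertools
import numpy as np

def compound_matrix(A, q):
    """q-th compound of A: determinants of all q x q submatrices.

    Rows are indexed by ordered q-tuples of row indices of A,
    columns by ordered q-tuples of column indices of A.
    """
    m, n = A.shape
    rows = list(itertools.combinations(range(m), q))
    cols = list(itertools.combinations(range(n), q))
    C = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            C[i, j] = np.linalg.det(A[np.ix_(r, c)])
    return C

# Sanity check of the Binet-Cauchy theorem: C_q(A B) = C_q(A) C_q(B)
A, B, q = np.random.randn(5, 4), np.random.randn(4, 5), 2
assert np.allclose(compound_matrix(A @ B, q),
                   compound_matrix(A, q) @ compound_matrix(B, q))
```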
17 A second extension of Theorem 2 is to replace matrices by Fredholm operators, as they can be expressed as integral operators with corresponding kernels. [sent-48, score-0.166]
18 Definition 4 (Fredholm Operator) A Fredholm operator is a bounded linear operator between two Hilbert spaces with closed range and whose kernel and co-kernel are finite-dimensional. [sent-50, score-0.22]
19 Here the convolution of kernels kA and kB plays the same role as the matrix multiplication. [sent-54, score-0.315]
20 To extend the Binet-Cauchy theorem we need to introduce the analog of compound matrices: Definition 6 (Compound Kernel and Operator) Denote by X, Y ordered sets and let k : X × Y → R. [sent-55, score-0.289]
21 Then the compound kernel of order q is defined as k^[q](x, y) := det(k(xk, yl))_{k,l=1}^q, where x ∈ Iq^X and y ∈ Iq^Y. (3) [sent-60, score-0.253]
22 If k is the integral kernel of an operator A we define Cq(A) to be the integral operator corresponding to k^[q]. [sent-61, score-0.276]
23 2 Kernels The key idea in turning the Binet-Cauchy theorem and its various incarnations into a kernel is to exploit the fact that tr A⊤ B and det A⊤ B are kernels on operators A, B. [sent-72, score-0.963]
24 We extend this by replacing A⊤ B with some functions ψ(A)⊤ ψ(B) involving compound operators. [sent-73, score-0.188]
25 Theorem 8 (Trace and Determinant Kernel) Let A, B : L2 (X) → L2 (Y) be Fredholm operators and let S : L2 (Y) → L2 (Y), T : L2 (X) → L2 (X) be positive trace-class operators. [sent-74, score-0.133]
26 Then the following two kernels are well defined and satisfy Mercer’s condition: k(A, B) = tr[S A⊤ T B] (5) and k(A, B) = det[S A⊤ T B] (6). [sent-75, score-0.625]
27 Note that determinants are not defined in general for infinite-dimensional operators, hence our restriction to Fredholm operators A, B in (6). [sent-76, score-0.213]
28 By virtue of the commutativity of the trace we have that k(A, B) = tr([VT A VS]⊤ [VT B VS]). [sent-79, score-0.174]
29 The remaining terms VT AVS and VT BVS are again Fredholm operators for which determinants are well defined. [sent-81, score-0.213]
30 Next we use special choices of A, B, S, T involving compound operators directly to state the main theorem of our paper. [sent-82, score-0.416]
31 Theorem 9 (Binet-Cauchy Kernel) Under the assumptions of Theorem 8 it follows that for all q ∈ N the kernels k(A, B) = tr Cq(S A⊤ T B) and k(A, B) = det Cq(S A⊤ T B) satisfy Mercer’s condition. [sent-83, score-0.625]
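In the finite-dimensional case these kernels can be evaluated by brute force for small matrices; the sketch below (our own illustration, reusing `compound_matrix` from above) does exactly that. Here S and T are simply positive semidefinite weight matrices shaped so that S A⊤ T B is square; for q = 1 the two functions reduce to the trace and determinant kernels of Theorem 8.

```python
def binet_cauchy_trace_kernel(A, B, S, T, q):
    # k(A, B) = tr C_q(S A^T T B); q = 1 recovers the trace kernel of Theorem 8.
    return np.trace(compound_matrix(S @ A.T @ T @ B, q))

def binet_cauchy_det_kernel(A, B, S, T, q):
    # k(A, B) = det C_q(S A^T T B); q = 1 recovers the determinant kernel.
    return np.linalg.det(compound_matrix(S @ A.T @ T @ B, q))

# Example: A, B in R^{m x n}; S is n x n and T is m x m so that S A^T T B is n x n.
m, n, q = 6, 4, 2
A, B = np.random.randn(m, n), np.random.randn(m, n)
S, T = np.eye(n), np.eye(m)
k_tr, k_det = binet_cauchy_trace_kernel(A, B, S, T, q), binet_cauchy_det_kernel(A, B, S, T, q)
```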
32 Finally, we define a kernel based on the Fredholm determinant itself. [sent-86, score-0.194]
33 Fredholm determinants are defined as follows [11]: D(A, µ) := Σ_{q=1}^∞ (µ^q / q!) tr Cq(A). [sent-88, score-0.248]
34 It suggests a kernel involving weighted combinations of the kernels of Theorem 9. [sent-91, score-0.413]
35 Then the following kernel satisfies Mercer’s condition: k(A, B) := D(A⊤ B, µ) where µ > 0. (8) [sent-93, score-0.104]
36 D(A⊤ B, µ) is a weighted combination of the kernels discussed above. [sent-94, score-0.29]
37 The 1/q! factor ensures that the series converges even in the case of exponential growth of the values of the compound kernel. [sent-96, score-0.149]
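A brute-force sketch of a truncated version of this kernel (our own illustration, again reusing `compound_matrix`; the 1/q! weighting follows our reading of the series above and should be checked against [11]):

```python
from math import factorial

def fredholm_determinant_kernel(A, B, mu, q_max=None):
    """Truncated D(A^T B, mu) = sum_{q >= 1} mu^q / q! * tr C_q(A^T B)."""
    M = A.T @ B
    q_max = min(M.shape) if q_max is None else min(q_max, min(M.shape))
    return sum(mu**q / factorial(q) * np.trace(compound_matrix(M, q))
               for q in range(1, q_max + 1))
```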
38 3 Efficient Computation At first glance, computing the kernels of Theorem 9 and Corollary 10 presents a formidable computational task even in the finite dimensional case. [sent-98, score-0.29]
39 If A, B ∈ Rm×n, the matrix Cq(A⊤ B) has (n choose q) rows and columns, and each of the entries requires the computation of a determinant of a q-dimensional matrix. [sent-99, score-0.133]
40 When computing determinants, we can take recourse to Franke’s Theorem [7], which states that det Cq(A) = (det A)^(n−1 choose q−1). (9) [sent-102, score-0.186]
41 Consequently, k(A, B) = det Cq[S A⊤ T B] = (det[S A⊤ T B])^(n−1 choose q−1). [sent-103, score-0.186]
42 This indicates that the determinant kernel may be of limited use, due to the typically quite high power in the exponent. [sent-104, score-0.194]
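A quick numerical check of (9) for a random square matrix (our own sketch; `comb` is the binomial coefficient, and `compound_matrix` is the helper defined earlier):

```python
from math import comb

n, q = 5, 3
A = np.random.randn(n, n)
lhs = np.linalg.det(compound_matrix(A, q))
rhs = np.linalg.det(A) ** comb(n - 1, q - 1)
assert np.isclose(lhs, rhs)
```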
43 Kernels building on tr Cq are not plagued by this problem and we give an efficient recursion below. [sent-105, score-0.199]
44 It follows from the ANOVA kernel recursion of [1]: Lemma 11 Denote by A ∈ Cm×m a square matrix and let λ1, . . ., λm be its eigenvalues. [sent-106, score-0.198]
45 Then tr Cq(A) can be computed by the following recursion: tr Cq(A) = (1/q) Σ_{j=1}^q (−1)^{j+1} tr Cq−j(A) C̄j(A), where C̄j(A) := Σ_{i=1}^m λi^j is the j-th power sum of the eigenvalues and tr C0(A) := 1. [sent-110, score-0.298]
46 Repeated application of the Binet-Cauchy Theorem yields tr Cq(A) = tr Cq(P) Cq(D) Cq(P−1) = tr Cq(D) Cq(P−1) Cq(P) = tr Cq(D). (11) For a triangular matrix the determinant is the product of its diagonal entries. [sent-113, score-0.803]
47 This is analogous to the requirement of the ANOVA kernel of [1]. [sent-115, score-0.104]
48 We can now compute the Jordan normal form of S A⊤ T B in O(n³) time and apply Lemma 11 directly to it to compute the kernel value. [sent-117, score-0.104]
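A sketch of Lemma 11 in code (our own; it works from the eigenvalues rather than an explicit Jordan form, which is the numerically sensible route), computing tr Cq for all orders up to q with O(n³) work for the eigendecomposition plus O(q²) for the recursion:

```python
def trace_compound(M, q_max):
    """tr C_q(M) for q = 0, ..., q_max via the Newton-Girard style recursion of Lemma 11."""
    lam = np.linalg.eigvals(M)                               # eigenvalues, O(n^3)
    power_sums = [np.sum(lam**j) for j in range(q_max + 1)]  # C-bar_j(M) = sum_i lam_i^j
    e = [1.0 + 0j]                                           # tr C_0(M) := 1
    for q in range(1, q_max + 1):
        e.append(sum((-1)**(j + 1) * e[q - j] * power_sums[j]
                     for j in range(1, q + 1)) / q)
    return e

# Agrees with the explicit compound matrix on a small example.
M = np.random.randn(4, 4)
assert np.allclose(trace_compound(M, 3)[3].real, np.trace(compound_matrix(M, 3)))

def trace_compound_kernel(A, B, S, T, q):
    # k(A, B) = tr C_q(S A^T T B) without ever forming the compound matrix.
    return trace_compound(S @ A.T @ T @ B, q)[q].real
```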
49 This is no more expensive than computing tr Cq directly. [sent-119, score-0.149]
50 For this to succeed, we will map various systems such as graphs, dynamical systems, or video sequences into Fredholm operators. [sent-124, score-0.242]
51 A suitable choice of this mapping and of the operators S, T of Theorem 9 will allow us to recover many well-known kernels as special cases. [sent-125, score-0.494]
52 Its time-evolution equations are given by yt = P xt + wt where wt ∼ N(0, R), (12a) and xt = Q xt−1 + vt where vt ∼ N(0, S). (12b) [sent-128, score-0.513]
53 Here yt ∈ Rm is observed, xt ∈ Rn is the hidden or latent variable, and P ∈ Rm×n, Q ∈ Rn×n, R ∈ Rm×m, and S ∈ Rn×n; moreover R, S ⪰ 0. [sent-129, score-0.12]
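A minimal simulator for the model (12) (our own sketch; the function name and the way noise covariances are passed are illustrative, and the model's state-noise covariance S is renamed S_noise to avoid a clash with the weighting operator S used elsewhere):

```python
def simulate_arma(P, Q, R, S_noise, x0, T_steps, rng=None):
    """Draw a trajectory y_1, ..., y_T from the linear system (12)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = P.shape
    x, ys = x0, []
    for _ in range(T_steps):
        x = Q @ x + rng.multivariate_normal(np.zeros(n), S_noise)   # (12b)
        ys.append(P @ x + rng.multivariate_normal(np.zeros(m), R))  # (12a)
    return np.array(ys)   # shape (T_steps, m): row t is y_t
```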
54 Following the behavioral framework of [16] we associate dynamical systems, X := (P, Q, R, S, x0 ), with their trajectories, that is, the set of yt with t ∈ N for discrete time systems (and t ∈ [0, ∞) for the continuous-time case). [sent-133, score-0.265]
55 Equation (9) can be seen as follows: the compound matrix of an orthogonal matrix is orthogonal and consequently its determinant is unity. [sent-135, score-0.329]
56 Subsequently use an SVD factorization of the argument of the compound operator to compute the determinant of the compound matrix of a diagonal matrix. [sent-136, score-0.517]
57 The trajectories Traj(X) are viewed as linear operators mapping from Rm (the space of observations y) into the time domain (N or [0, ∞)) and vice versa. [sent-137, score-0.143]
58 The diagram below depicts this mapping: X → Traj(X) → Cq(Traj(X)). Finally, Cq(Traj(X)) is weighted in a suitable fashion by operators S and T and the trace is evaluated. [sent-138, score-0.139]
59 2 Dynamical Systems Kernels We begin with kernels on dynamical systems (12) as proposed in [14]. [sent-142, score-0.47]
60 Set S = 1, q = 1 and T to be the diagonal operator with entries e−λt . [sent-143, score-0.122]
61 In this case the Binet-Cauchy kernel between systems X and X′ becomes tr Cq(S Traj(X) T Traj(X′)⊤) = Σ_{t=1}^∞ e^{−λt} yt⊤ yt′. (13) [sent-144, score-0.407]
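The finite-horizon version of (13) can be evaluated directly from sampled trajectories, e.g. those produced by `simulate_arma` above (a sketch, our own naming):

```python
def empirical_trajectory_kernel(ys, ys_prime, lam):
    # Truncation of (13): sum_t exp(-lam * t) * <y_t, y'_t>
    T_steps = min(len(ys), len(ys_prime))
    t = np.arange(1, T_steps + 1)
    dots = np.einsum('tm,tm->t', ys[:T_steps], ys_prime[:T_steps])
    return float(np.sum(np.exp(-lam * t) * dots))
```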
62 Since yt, yt′ are random variables, we also need to take expectations over wt, vt, wt′, vt′. [sent-145, score-0.474]
63 Some tedious yet straightforward algebra [14] allows us to compute (13) as follows: k(X, X′) = x0⊤ M1 x0′ + (1/(e^λ − 1)) tr[S M2 + R], (14) where M1, M2 satisfy the Sylvester equations M1 = e^{−λ} Q⊤ P⊤ P′ Q′ + e^{−λ} Q⊤ M1 Q′ and M2 = P⊤ P′ + e^{−λ} Q⊤ M2 Q′. (15) [sent-146, score-0.149]
64 Such kernels can be computed in O(n³) time [5]. [sent-147, score-0.149]
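A sketch of the closed form (14) (our own): the two equations in (15) are Stein-type (discrete Sylvester) equations, solved here by Kronecker-product vectorization, which is fine for small n; for large n one would use a dedicated solver as the O(n³) bound of [5] suggests. We read tr[S M2 + R] as tr(S M2) + tr(R); all names are illustrative.

```python
def solve_stein(A, B, C):
    """Solve M = A @ M @ B + C by vectorization (row-major: vec(A M B) = (A kron B^T) vec(M))."""
    n1, n2 = C.shape
    lhs = np.eye(n1 * n2) - np.kron(A, B.T)
    return np.linalg.solve(lhs, C.reshape(-1)).reshape(n1, n2)

def arma_trace_kernel(X, Xp, lam):
    """k(X, X') from (14), with X = (P, Q, R, S, x0) as in the behavioral framework."""
    P, Q, R, S, x0 = X
    Pp, Qp, _, _, x0p = Xp
    A = np.exp(-lam) * Q.T
    M1 = solve_stein(A, Qp, np.exp(-lam) * Q.T @ P.T @ Pp @ Qp)  # first equation in (15)
    M2 = solve_stein(A, Qp, P.T @ Pp)                            # second equation in (15)
    return x0 @ M1 @ x0p + (np.trace(S @ M2) + np.trace(R)) / (np.exp(lam) - 1.0)
```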
65 In Section 4 we will use this kernel to compute similarities between video sequences, after having encoded the latter as a dynamical system. [sent-149, score-0.301]
66 This will allow us to compare sequences of different length, as they are all mapped to dynamical systems in the first place. [sent-150, score-0.182]
67 3 Martin Kernel A characteristic property of (14) is that it takes initial conditions of the dynamical system into account. [sent-152, score-0.137]
68 If this is not desired, one may choose to pick only the subspace spanned by the trajectory yt . [sent-153, score-0.131]
69 Then it can be easily verified that the kernel proposed by [10] can be written as k(X, X′) = tr Cq(S QX T QX′⊤) = det(QX QX′⊤). (16) [sent-157, score-0.253]
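A sketch of (16) for finite trajectories (our own reading): QX is taken to be an orthonormal basis of the subspace spanned by the trajectory, obtained via a QR decomposition, so the determinant is, up to the orientation of the bases, the product of the cosines of the principal angles between the two subspaces.

```python
def martin_like_kernel(Y, Yp):
    """Y, Yp: trajectories (T x m) of equal length T, with T < m so the span is a proper subspace."""
    QX, _ = np.linalg.qr(Y.T)    # (m, T), orthonormal columns spanning span{y_t}
    QXp, _ = np.linalg.qr(Yp.T)
    return np.linalg.det(QX.T @ QXp)   # product of cosines of principal angles, up to sign
```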
70 Subsequently Wolf and Shashua [17] modified (16) to allow for kernels: to deal with determinants on a possibly infinite-dimensional feature space they simply project the trajectories onto a reduced set of points in feature space. [sent-159, score-0.138]
71 Martin [10] suggested the use of cepstrum coefficients of a dynamical system to define a Euclidean metric. [sent-161, score-0.137]
72 4 Graph Kernels Yet another class of kernels can be seen to fall into this category: the graph kernels proposed in [6, 13, 9, 8]. [sent-165, score-0.626]
73 To recover these kernels from our framework we set q = 1 and systematically map graph kernels to dynamical systems. [sent-168, score-0.763]
74 The time-evolution xt → xt+1 occurs by performing a random walk on the graph G(V, E). [sent-170, score-0.118]
75 This yields xt+1 = W D−1 xt , where W is the connectivity matrix of the graph and D is a diagonal matrix where Dii denotes the outdegree of vertex i. [sent-171, score-0.195]
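A sketch of this construction (our own): the transition matrix W D−1 applied for a few steps, which with q = 1 and an exponentially decaying weighting gives a simple diffusion-style kernel matrix between vertices; the start vectors x0 = ei play the role of the initial conditions discussed below. The degree normalization assumes every vertex has at least one edge; adjust the axis of the degree sum to your adjacency convention.

```python
def random_walk_vertex_kernel(W, lam=1.0, T_steps=10):
    """K[i, j] = sum_t exp(-lam t) <M^t e_i, M^t e_j>, with M = W D^{-1}."""
    D_inv = np.diag(1.0 / W.sum(axis=0))   # D_ii: (out)degree of vertex i
    M = W @ D_inv
    K = np.zeros_like(W, dtype=float)
    X = np.eye(W.shape[0])                 # column i: x_0 = e_i, one walk per vertex
    for t in range(1, T_steps + 1):
        X = M @ X                          # one random-walk step for every start vertex
        K += np.exp(-lam * t) * (X.T @ X)
    return K
```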
76 In the graph kernels of [9, 13] one assumes that the variables xt are directly observed and no special mapping is required in order to obtain yt . [sent-173, score-0.518]
77 • The inverse Graph-Laplacian kernel proposed in [13] uses a weighted combination of diffusion processes and corresponds to S = W, a diagonal weight matrix. [sent-176, score-0.221]
78 If we associate this mapping from states to labels with the matrix P of (12), set T = 1 and let S be the projector on the first n time instances, we recover the kernels from [6]. [sent-178, score-0.391]
79 So far, we deliberately made no distinction between kernels on graphs and kernels between graphs. [sent-179, score-0.61]
80 This is for good reason: the trajectories depend on both initial conditions and the dynamical system itself. [sent-180, score-0.176]
81 Consequently, whenever we want to consider kernels between initial conditions, we choose the same dynamical system in both cases. [sent-181, score-0.427]
82 Conversely, whenever we want to consider kernels between dynamical systems, we average over initial conditions. [sent-182, score-0.427]
83 This is what allows us to cover all the aforementioned kernels in one framework. [sent-183, score-0.316]
84 5 Extensions Obviously the aforementioned kernels are just specific instances of what is possible by using kernels of Theorem 9. [sent-185, score-0.606]
85 While it is pretty much impossible to enumerate all combinations, we give a list of suggestions for possible kernels below: • Use the continuous-time diffusion process and a partially observable model. [sent-186, score-0.385]
86 This would extend the diffusion kernels of [9] to comparisons between vertices of a labeled graph (e. [sent-187, score-0.456]
87 • Compute the determinant of the trajectory associated with an n-step random walk on a graph, that is, use Cq with q = n instead of C1. [sent-191, score-0.138]
88 • Use a nonlinear version of the dynamical system as described in [14]. [sent-194, score-0.137]
89 4 Experiments To test the utility of our kernels we applied them to the task of clustering short video clips. [sent-195, score-0.35]
90 We randomly sampled 480 short clips from the movie Kill Bill and modeled them as linear ARMA models (see Section 3). [sent-196, score-0.142]
91 The sub-optimal procedure outlined in [4] was used for estimating the model parameters P, Q, R, and S, and the kernels described in Section 3. [sent-198, score-0.29]
92 Locally Linear Embedding (LLE) [12] was used to cluster and embed the clips in two dimensions. [sent-200, score-0.142]
93 We randomly selected a few data points from Figure 1 and depict the first frame of the corresponding clips in Figure 2. [sent-202, score-0.182]
94 This corresponds to clips which are temporally close to each other and hence have similar dynamics. [sent-204, score-0.284]
95 For instance, clips on the far right depict a person rolling in the snow, those in the far left corner depict a sword fight, while clips in the center involve conversations between two characters. [sent-205, score-0.364]
96 A naïve comparison of the intensity values or a dot product of the actual clips would not be able to extract such semantic information. [sent-206, score-0.142]
97 5 Discussion In this paper, we introduced a unifying framework for defining kernels on discrete objects using the Binet-Cauchy theorem on compounds of Fredholm operators. [sent-214, score-0.436]
98 We demonstrated that many of the previously known kernels can be explained neatly by our framework. [sent-215, score-0.29]
99 In particular, many graph kernels and dynamical-system related kernels fall out as natural special cases. [sent-216, score-0.796]
100 The main advantage of our unifying framework is that it allows kernel engineers to use domain knowledge in a principled way to design kernels for solving real life problems. [sent-217, score-0.418]
wordName wordTfidf (topN-words)
[('cq', 0.65), ('fredholm', 0.32), ('kernels', 0.29), ('det', 0.186), ('compound', 0.149), ('tr', 0.149), ('clips', 0.142), ('traj', 0.142), ('iq', 0.141), ('dynamical', 0.137), ('vt', 0.137), ('operators', 0.114), ('kernel', 0.104), ('theorem', 0.101), ('determinants', 0.099), ('determinant', 0.09), ('ka', 0.089), ('sa', 0.084), ('diffusion', 0.071), ('yt', 0.067), ('video', 0.06), ('wolf', 0.059), ('operator', 0.058), ('rm', 0.054), ('arma', 0.053), ('avs', 0.053), ('bvs', 0.053), ('qx', 0.053), ('xt', 0.053), ('shashua', 0.053), ('recursion', 0.05), ('australia', 0.05), ('lle', 0.047), ('triangular', 0.046), ('diagonal', 0.046), ('graph', 0.046), ('depict', 0.04), ('trajectories', 0.039), ('automatica', 0.036), ('cock', 0.036), ('doretto', 0.036), ('kill', 0.036), ('semiring', 0.036), ('semirings', 0.036), ('sylvester', 0.036), ('subspace', 0.035), ('vs', 0.035), ('rn', 0.033), ('wt', 0.033), ('composition', 0.033), ('special', 0.033), ('vidal', 0.031), ('anova', 0.031), ('bill', 0.031), ('vishwanathan', 0.031), ('mercer', 0.03), ('graphs', 0.03), ('trajectory', 0.029), ('mapping', 0.029), ('vertices', 0.029), ('angles', 0.029), ('kb', 0.028), ('australian', 0.028), ('ict', 0.028), ('recover', 0.028), ('integral', 0.028), ('aforementioned', 0.026), ('subsequently', 0.026), ('trace', 0.025), ('xq', 0.025), ('continuum', 0.025), ('rational', 0.025), ('matrix', 0.025), ('sequences', 0.025), ('unifying', 0.024), ('matrices', 0.024), ('observable', 0.024), ('rl', 0.024), ('rx', 0.024), ('begin', 0.023), ('marginalized', 0.023), ('qr', 0.022), ('embeddings', 0.022), ('encouraging', 0.022), ('smola', 0.022), ('replaced', 0.021), ('discrete', 0.021), ('systems', 0.02), ('behavioral', 0.02), ('strings', 0.02), ('extend', 0.02), ('orthogonal', 0.02), ('involving', 0.019), ('let', 0.019), ('walk', 0.019), ('warmuth', 0.019), ('exploit', 0.019), ('likewise', 0.018), ('martin', 0.018), ('entries', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 30 nips-2004-Binet-Cauchy Kernels
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
2 0.18285131 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
3 0.13509969 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
4 0.11869188 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
5 0.096023709 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible ”wish lists” from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
6 0.095536828 42 nips-2004-Computing regularization paths for learning multiple kernels
7 0.078262031 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
8 0.07748913 103 nips-2004-Limits of Spectral Clustering
9 0.068332016 94 nips-2004-Kernels for Multi--task Learning
10 0.063136309 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
11 0.062385574 148 nips-2004-Probabilistic Computation in Spiking Populations
12 0.061313424 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
13 0.061265331 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
14 0.058426231 83 nips-2004-Incremental Learning for Visual Tracking
15 0.053960685 113 nips-2004-Maximum-Margin Matrix Factorization
16 0.053539716 177 nips-2004-Supervised Graph Inference
17 0.05314656 165 nips-2004-Semi-supervised Learning on Directed Graphs
18 0.052275877 19 nips-2004-An Application of Boosting to Graph Classification
19 0.048984021 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
20 0.048349828 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
topicId topicWeight
[(0, -0.163), (1, 0.046), (2, 0.027), (3, 0.068), (4, -0.042), (5, -0.186), (6, -0.104), (7, -0.133), (8, 0.032), (9, -0.091), (10, -0.114), (11, -0.061), (12, -0.001), (13, 0.079), (14, 0.051), (15, 0.089), (16, 0.085), (17, -0.042), (18, 0.057), (19, -0.039), (20, -0.009), (21, -0.005), (22, 0.016), (23, 0.003), (24, -0.144), (25, 0.015), (26, 0.065), (27, -0.077), (28, -0.037), (29, 0.046), (30, -0.037), (31, -0.036), (32, -0.012), (33, -0.03), (34, -0.016), (35, -0.048), (36, 0.058), (37, 0.121), (38, 0.059), (39, -0.098), (40, 0.093), (41, 0.054), (42, -0.032), (43, 0.006), (44, -0.125), (45, -0.092), (46, -0.013), (47, -0.034), (48, -0.048), (49, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9572103 30 nips-2004-Binet-Cauchy Kernels
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
2 0.7219159 94 nips-2004-Kernels for Multi--task Learning
Author: Charles A. Micchelli, Massimiliano Pontil
Abstract: This paper provides a foundation for multi–task learning using reproducing kernel Hilbert spaces of vector–valued functions. In this setting, the kernel is a matrix–valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix– valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi–task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation. 1
3 0.72143722 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
4 0.65334398 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
5 0.61619687 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible ”wish lists” from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
6 0.53241956 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
7 0.51861328 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
8 0.48087287 177 nips-2004-Supervised Graph Inference
9 0.46467179 165 nips-2004-Semi-supervised Learning on Directed Graphs
10 0.45182982 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
11 0.41730675 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
12 0.37070742 42 nips-2004-Computing regularization paths for learning multiple kernels
13 0.36128545 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
14 0.35701746 185 nips-2004-The Convergence of Contrastive Divergences
15 0.35475409 141 nips-2004-Optimal sub-graphical models
16 0.34261554 148 nips-2004-Probabilistic Computation in Spiking Populations
17 0.32875448 19 nips-2004-An Application of Boosting to Graph Classification
18 0.3193126 103 nips-2004-Limits of Spectral Clustering
19 0.30754238 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
20 0.30615866 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
topicId topicWeight
[(13, 0.073), (15, 0.14), (25, 0.014), (26, 0.052), (27, 0.018), (31, 0.017), (33, 0.115), (35, 0.018), (39, 0.039), (50, 0.03), (76, 0.351)]
simIndex simValue paperId paperTitle
Author: Tobias Blaschke, Laurenz Wiskott
Abstract: In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data. 1
same-paper 2 0.80126905 30 nips-2004-Binet-Cauchy Kernels
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
3 0.68693435 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
4 0.65736991 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis
Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1
5 0.58923209 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
6 0.54074317 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
7 0.5098871 168 nips-2004-Semigroup Kernels on Finite Sets
8 0.50915945 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
9 0.50903225 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
10 0.50666273 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
11 0.5049181 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
12 0.50349736 177 nips-2004-Supervised Graph Inference
13 0.49858925 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
14 0.49668255 178 nips-2004-Support Vector Classification with Input Data Uncertainty
15 0.49560511 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations
16 0.49549249 124 nips-2004-Multiple Alignment of Continuous Time Series
17 0.49545494 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
18 0.49507305 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons
19 0.49470252 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
20 0.49408531 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models