jmlr jmlr2007 jmlr2007-46 knowledge-graph by maker-knowledge-mining

46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels


Source: pdf

Author: Marco Reisert, Hans Burkhardt

Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, give an interpretation of their behavior, and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 LMB, Georges-Koehler-Allee 52, Albert-Ludwigs University, 79110 Freiburg, Germany. Editor: Leslie Pack Kaelbling. Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. [sent-5, score-1.274]

2 Matrix valued kernels are a natural generalization of the common notion of a kernel. [sent-6, score-0.337]

3 We set the theoretical foundations of so-called equivariant matrix valued kernels. [sent-7, score-0.937]

4 We work out several properties of equivariant kernels, give an interpretation of their behavior, and show relations to scalar kernels. [sent-8, score-0.8]

5 The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. [sent-9, score-0.418]

6 We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. [sent-11, score-0.924]

7 Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series 1. [sent-12, score-0.314]

8 The notion of a kernel as a kind of similarity between two given patterns can be naturally generalized to matrix valued kernels. [sent-17, score-0.391]

9 A matrix valued kernel can carry more information than only similarity, like information about the relative pose or configuration of the two patterns. [sent-18, score-0.44]

10 First Burbea and Masani (1984) studied the theory of vector-valued reproducing kernel Hilbert spaces leading to the notion of a matrix (or operator) valued kernel. [sent-20, score-0.391]

11 In this paper we introduce a new class of matrix valued kernels with a specific transformation behavior. [sent-22, score-0.39]

12 The new type of kernel is motivated by a vector valued regression problem. [sent-23, score-0.338]

13 In our terminology the term 'time-invariant' means that the filter is equivariant to the group of time-shifts. [sent-26, score-0.764]

14 We give a constructive way to obtain kernels whose linear combinations lead to equivariant functions. [sent-30, score-0.662]

15 The paper is organized as follows: In Section 2 we give a first motivation for our kernels by solving an equivariant regression problem. [sent-36, score-0.777]

16 We give an illustrative interpretation of the kernels and present an equivariant Representer Theorem, which justifies our lax motivation from the beginning. [sent-37, score-0.777]

17 In Section 3 we give rules for constructing matrix kernels out of matrix kernels and constraints for preserving equivariance. [sent-39, score-0.336]

18 Section 4 introduces the notion of irreducibility for kernels and shows how the traces of matrix valued kernels are related to scalar valued kernels. [sent-40, score-0.897]

19 Two possible applications of matrix valued kernels are given in Section 5: a rotation equivariant non-linear filter and a rotation invariant object detector. [sent-41, score-1.312]

20 Then we give an interpretation of the proposed kernels and present an equivariant Representer Theorem. [sent-46, score-0.777]

21 An equivariant function is a function fulfilling f(gx) = gf(x). [sent-60, score-0.662]

22 Due to the unitary group representation the inner product is invariant in the sense $\langle g x_1 | g x_2 \rangle_X = \langle x_1 | x_2 \rangle_X$. [sent-63, score-0.463]
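
As a rough illustration of these two properties (not taken from the paper), the following minimal Python sketch, assuming NumPy, uses the 2D rotation group as the unitary representation: it checks the invariance of the inner product and the equivariance f(gx) = gf(x) of a simple hand-picked map.

import numpy as np

def rot(phi):
    # 2D rotation matrix: an orthogonal (unitary) representation of SO(2)
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
g = rot(0.7)

# invariance of the inner product: <g x1 | g x2> = <x1 | x2>
assert np.isclose((g @ x1) @ (g @ x2), x1 @ x2)

# a simple equivariant map f(x) = ||x|| * x satisfies f(g x) = g f(x)
f = lambda x: np.linalg.norm(x) * x
assert np.allclose(f(g @ x1), g @ f(x1))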

23 A scalar valued kernel is a symmetric, positive definite function k : X × X → R fulfilling the Mercer property. [sent-64, score-0.476]

24 n} in an equivariant manner, that is, the function has to satisfy f(gx) = gf(x) for all g ∈ G, while f(x_i) = y_i should be fulfilled as accurately as possible. [sent-73, score-0.662]

25 To obtain an equivariant behavior we pretend to present the training samples in all possible poses {(gx_i, gy_i) | i = 1. [sent-75, score-0.662]

26 Every function of the form (2) is an equivariant function with respect to G. [sent-81, score-0.662]

27 Proof: Due to the invariance of the kernel we have $f(gx) = \int_G \sum_i k(x, g^{-1} g' x_i)\, g' a_i \, dg'$, and using the unimodularity of $G$ we reparametrize the integral by $h = g^{-1} g'$ and obtain the desired result $f(gx) = \int_G \sum_i k(x, h x_i)\, g h a_i \, dh = g f(x)$, using the linearity of the group representation. [sent-82, score-0.593]

28 So we have a constructive way to obtain equivariant functions. [sent-83, score-0.662]

29 Since we can exchange summation with integration and the group action is linear we can rewrite (2) as $f(x) = \sum_{i=1}^{n} \Big( \int_G k(x, g x_i)\, \rho_g \, dg \Big) a_i$, where we can interpret the expression inside the brackets as a matrix valued kernel. [sent-84, score-0.696]
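
The bracketed expression is the group-integration matrix (GIM) kernel. As an illustration of this construction (my own sketch, not code from the paper; it assumes NumPy), the continuous group is replaced by the finite cyclic shift group acting on R^n through permutation matrices, and the sketch checks the equivariance and Hermitian-type symmetry stated in the next two sentences.

import numpy as np

n = 6
def shift_matrix(s):
    # cyclic shift by s positions: a unitary (permutation) representation of Z_n
    return np.roll(np.eye(n), s, axis=0)

def k(a, b):
    # invariant scalar base kernel: k(g a, g b) = k(a, b) for all shifts g
    return np.exp(-0.5 * np.sum((a - b) ** 2))

def K(x1, x2):
    # GIM kernel: the group integral becomes a finite sum, K = (1/|G|) sum_g k(x1, g x2) rho_g
    return sum(k(x1, shift_matrix(s) @ x2) * shift_matrix(s) for s in range(n)) / n

rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
g, h = shift_matrix(2), shift_matrix(4)

# equivariance: K(g x1, h x2) = rho_g K(x1, x2) rho_h^T
assert np.allclose(K(g @ x1, h @ x2), g @ K(x1, x2) @ h.T)
# Hermitian-type symmetry: K(x1, x2) = K(x2, x1)^T
assert np.allclose(K(x1, x2), K(x2, x1).T)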

30 The following properties hold for the function above: a) For every $x_1, x_2 \in X$ and $g, h \in G$, we have that $K(gx_1, hx_2) = \rho_g K(x_1, x_2)\, \rho_h^\dagger$, that is, K is equivariant in the first argument and anti-equivariant in the second; we say K is equivariant. [sent-87, score-0.662]

31 For b) we use the invariance and symmetry of k and the unimodularity of G and get $K(x_1, x_2) = \int_G k(g^{-1} x_1, x_2)\, \rho_g \, dg = \int_G k(x_2, g x_1)\, \rho_{g^{-1}} \, dg = K(x_2, x_1)^\dagger$, as asserted. [sent-91, score-0.518]

32 There are basically two ways to introduce matrix valued kernels in almost the same manner as for scalar-valued kernels. [sent-92, score-0.415]

33 As the upper expression is linear in y one can define a linear operator in Y, the so-called matrix valued kernel K(x1, x2) ∈ L(Y), by the following equation: K(x1, x2) y := (K_{x2} y)(x1). [sent-95, score-0.428]

34 A second possibility to introduce the notion of a matrix valued kernel is to demand the existence of a feature mapping into a certain feature space. [sent-96, score-0.391]

35 For example for scalar valued kernels this is done by Shawe-Taylor and Cristianini (2004). [sent-97, score-0.475]

36 For matrix valued kernels the feature space can be identified as the tensor product space of linear mappings on Y times an arbitrary high dimensional feature space. [sent-98, score-0.48]

37 Proof: To show this, we give the feature mapping Ψ by the use of the feature mapping Φ corresponding to the scalar kernel k and show that the GIM-kernel can be written as a positive matrix valued inner product in the new feature space H = F ⊗ L(Y). [sent-127, score-0.611]

38 The matrix valued inner product in H is given by the rule $\langle \Phi_1 \otimes \rho_1 | \Phi_2 \otimes \rho_2 \rangle_H := \rho_1^\dagger \rho_2 \, \langle \Phi_1 | \Phi_2 \rangle_F$ and its linear extension, that is, it is a sesqui-linear mapping of type $H \times H \to L(Y)$. [sent-129, score-0.357]

39 Using the above rule for the inner product we can compute $\langle \Psi(x_1) | \Psi(x_2) \rangle_H = \frac{1}{\mu(G)} \int_{G^2} \langle \Phi(g x_1) \otimes \rho_g | \Phi(h x_2) \otimes \rho_h \rangle_H \, dg\, dh = \frac{1}{\mu(G)} \int_{G^2} \langle \Phi(g x_1) | \Phi(h x_2) \rangle_F \; \rho_{g^{-1} h} \, dg\, dh$. [sent-133, score-0.537]

40 Inserting the scalar kernel k and reparametrizing by $g' = g^{-1} h$ gives $\langle \Psi(x_1) | \Psi(x_2) \rangle_H = \frac{1}{\mu(G)} \int_{G^2} k(x_1, g' x_2)\, \rho_{g'} \, dg\, dg' = \int_G k(x_1, g x_2)\, \rho_g \, dg = K(x_1, x_2)$, which is the desired result. [sent-134, score-0.709]

41 Examples and Interpretation of Equivariant Kernels: To get more intuition about how equivariant kernels, not necessarily GIM-kernels, behave, we want to sketch how such kernels can be interpreted as estimates of the relative pose of two objects. [sent-136, score-0.85]

42 First we consider probably the simplest equivariant kernel, the complex Hermitian inner product itself. [sent-137, score-0.771]

43 If we define the representation of U(1) acting on $\mathbb{C}^n$ by a simple scalar multiplication with a complex unit number, then the ordinary inner product is obviously equivariant, that is $\langle g_1 x_1 | g_2 x_2 \rangle = \langle e^{i\phi_1} x_1 | e^{i\phi_2} x_2 \rangle = e^{i\phi_1} \langle x_1 | x_2 \rangle e^{-i\phi_2} = g_1 \langle x_1 | x_2 \rangle g_2^\dagger$. [sent-139, score-0.36]

44 Actually this result can be generalized for equivariant kernels. [sent-145, score-0.662]

45 Interpreting this result, we can say that the sum of the singular values of an equivariant kernel expresses the similarity of the two given objects, while the unitary parts U and V contain information about the relative pose of the objects. [sent-149, score-0.995]
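
A tiny sketch of this interpretation (an illustration of my own under toy assumptions, using NumPy): for the simple non-GIM equivariant kernel K(x1, x2) = |x1><x2| mentioned later in this summary, the nonzero singular value is the product of the norms (a crude similarity), while U V^T rotates the direction of x2 onto the direction of x1, so it encodes the relative pose.

import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
K = np.outer(x1, x2)          # the outer-product kernel evaluated at (x1, x2)

U, s, Vt = np.linalg.svd(K)
# sum of singular values = ||x1|| * ||x2||, a (non-normalized) similarity
assert np.isclose(s.sum(), np.linalg.norm(x1) * np.linalg.norm(x2))
# U V^T maps the direction of x2 onto the direction of x1 (the relative pose)
assert np.allclose(U @ Vt @ (x2 / np.linalg.norm(x2)), x1 / np.linalg.norm(x1))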

46 The alignment with respect to those is a very common procedure in image processing applications to obtain complete invariant feature sets for the cyclic translation group Cn or other abelian groups like the two dimensional rotations SO(2) (see, for example, Canterakis, 1986). [sent-166, score-0.402]

47 We did not clarify whether this is the optimal solution to obtain an equivariant behavior. [sent-171, score-0.662]

48 It is easy to check that the subspace E ⊂ Zk of equivariant functions is a linear subspace. [sent-173, score-0.662]

49 Proof: Applying the projection to (1) yields $(\Pi_E f)(x) = \frac{1}{\mu(G)} \int_G \sum_{i=1}^{n} k(gx, x_i)\, g^{-1} a_i \, dg = \frac{1}{\mu(G)} \int_G \sum_{i=1}^{n} k(x, g^{-1} x_i)\, g^{-1} a_i \, dg = \frac{1}{\mu(G)} \int_G \sum_{i=1}^{n} k(x, h x_i)\, h a_i \, dh$, where we used the substitution $h = g^{-1}$. [sent-180, score-0.544]

50 In fact, this projection is a unitary projection with respect to the naturally induced scalar product in the feature space. [sent-181, score-0.414]
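
A small sketch of this projection (my illustration, assuming NumPy and a finite cyclic group standing in for a continuous one): the projection averages over the group, (Pi_E f)(x) = (1/|G|) sum_g g^{-1} f(g x), and the projected function is indeed equivariant.

import numpy as np

n = 5
def shift(s):
    # cyclic shift representation of Z_n
    return np.roll(np.eye(n), s, axis=0)

def f(x):
    # an arbitrary, non-equivariant vector valued function
    return np.tanh(x) + np.eye(n)[0] * np.sum(x ** 2)

def project(f):
    # group averaging: (Pi_E f)(x) = (1/|G|) sum_g g^{-1} f(g x)
    def f_equiv(x):
        return sum(shift(-s) @ f(shift(s) @ x) for s in range(n)) / n
    return f_equiv

f_equiv = project(f)
rng = np.random.default_rng(3)
x, g = rng.standard_normal(n), shift(2)
# the projected function is equivariant: f_equiv(g x) = g f_equiv(x)
assert np.allclose(f_equiv(g @ x), g @ f_equiv(x))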

51 Before stating this more precisely we show how vector valued functions which are based on scalar kernels (as introduced in Eq. [sent-182, score-0.475]

52 By the use of this we define a new operator $\tilde{\Pi}_E$ acting on $F \otimes Y$ and show that it corresponds to $\Pi_E$ and that it is unitary with respect to the inner product given in Eq. [sent-196, score-0.323]

53 If $\tilde{\Pi}_E$ is given by $\tilde{\Pi}_E \Psi_f := \frac{1}{\mu(G)} \int_G g\, \Psi_f \, dg$, then $\tilde{\Pi}_E \Psi_f = \Psi_{\Pi_E f}$ holds and $\tilde{\Pi}_E$ is a unitary projection. [sent-199, score-0.381]

54 In conclusion, due to the unitarity the projection ΠE gives the best equivariant approximation of an arbitrary function in Zk. [sent-206, score-0.698]

55 The space of equivariant functions E is nothing else than the space of fixed points with respect to the group action defined in Eq. [sent-207, score-0.764]

56 Then each equivariant minimizer f ∈ Zk of R(f) = c(x1 , y1 , f(x1 ), . [sent-213, score-0.662]

57 Every equivariant function in Zk is a projection of the form ΠE f. [sent-222, score-0.698]

58 Note that the above theorem only makes statements for vector valued function spaces induced by a scalar kernel. [sent-230, score-0.36]

59 There are also matrix valued kernels that are not based on a scalar kernel, that is, GIM-kernels are not the only way to obtain equivariant kernels. [sent-231, score-1.19]

60 In Section 4 we will work out that one important class of equivariant kernels consists of GIM-kernels, namely those kernels whose corresponding group representation is irreducible. [sent-232, score-1.037]

61 In fact, Volterra series of degree n can be modeled with polynomial matrix kernels of degree n, that is, the scalar basis kernel is $k(x_1, x_2) = (1 + \langle x_1 | x_2 \rangle)^n$. [sent-246, score-0.422]

62 Constructing Kernels: Similar to scalar valued kernels, there are also several building rules to obtain new matrix and scalar valued kernels from existing ones. [sent-253, score-1.003]

63 In particular, the rule for building a kernel out of two kernels by multiplying them splits into two rules, either using the tensor product or the matrix product. [sent-255, score-0.35]

64 (Closure Properties) Let K1 : X × X → L(Y) and K2 : X × X → L(Y) be matrix valued kernels and A ∈ L(V, Y) with full row-rank; then the following functions are kernels. [sent-257, score-0.39]
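
To illustrate closure rules of this kind (a sketch of my own, assuming NumPy; the specific kernels K1, K2 and the map A below, as well as the choice of which rules to show, are toy assumptions): the block Gram matrix built from the sum K1 + K2, from A K1 A^T, and from the product of K1 with a scalar kernel stays positive semidefinite.

import numpy as np

d, m = 3, 4                                  # output dimension, number of samples
rng = np.random.default_rng(4)
X = rng.standard_normal((m, d))

def K1(x1, x2):
    # matrix valued kernel: scalar Gaussian times the identity
    return np.exp(-0.5 * np.sum((x1 - x2) ** 2)) * np.eye(d)

def K2(x1, x2):
    # matrix valued kernel: outer product of the two inputs
    return np.outer(x1, x2)

def gram(K, X):
    # block Gram matrix [K(x_i, x_j)]_{ij}
    return np.block([[K(xi, xj) for xj in X] for xi in X])

A = rng.standard_normal((2, d))                       # full row-rank with probability one
K_sum  = lambda x1, x2: K1(x1, x2) + K2(x1, x2)       # rule: sum of two kernels
K_conj = lambda x1, x2: A @ K1(x1, x2) @ A.T          # rule: A K A^T
K_prod = lambda x1, x2: np.exp(x1 @ x2) * K1(x1, x2)  # rule: scalar kernel times matrix kernel

for K in (K_sum, K_conj, K_prod):
    assert np.linalg.eigvalsh(gram(K, X)).min() > -1e-9   # positive semidefinite up to rounding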

65 4, where we mentioned that the trace of a positive matrix valued product is an ordinary positive scalar inner product. [sent-269, score-0.527]

66 Let us have a look at how such rules can be applied to equivariant kernels and under what circumstances equivariance is preserved. [sent-270, score-0.861]

67 Since ρg is again a unitary representation, K is an equivariant kernel. [sent-274, score-0.83]

68 Rule c) can also be used to construct equivariant kernels. [sent-275, score-0.662]

69 Supposing an equivariant kernel K1 and an invariant scalar kernel k in the sense that k(x1, gx2) = k(x1, x2) for all g ∈ G, then rule c) implies that K = K1 ⊗ k = K1 k is also a kernel and in fact equivariant. [sent-276, score-1.216]

70 If K is a normal equivariant kernel, then k = tr(K†K) is an invariant kernel in the following sense: k(x1, gx2) = k(g'x1, x2) = k(x1, x2) for all g, g' ∈ G. [sent-284, score-0.846]

71 A matrix kernel which is diagonal seems to be the most appropriate; this means that the kernel has to be diagonal for every pair of input patterns. [sent-291, score-0.335]

72 For abelian groups the basis functions on which the irreducible representations act have the same formal appearance as the irreducible representations themselves, which might be confusing. [sent-308, score-0.528]

73 If an equivariant kernel transforms by an irreducible representation, then it can be deduced that the kernel itself is also irreducible. [sent-316, score-1.07]

74 But the opposite direction is not true in general, otherwise any equivariant kernel would be a GIM-kernel. [sent-317, score-0.778]

75 If the representation ρ of a strictly positive definite, equivariant kernel K is irreducible, then K is also irreducible. [sent-321, score-0.997]

76 Since we know from representation theory (Gaal, 1973) that any unitary representation can be decomposed into a direct sum of irreducible representations, we can make a similar statement for kernels. [sent-328, score-0.493]

77 Any GIM-kernel K can be decomposed into a direct sum of irreducible GIM-kernels associated with its irreducible representations. [sent-330, score-0.352]

78 By the Peter-Weyl-Theorem (Gaal, 1973) we know that the entries of the irreducible representations of a compact group G form a basis for the space of square-integrable functions on G. [sent-340, score-0.341]

79 If the representation ρ of an equivariant kernel K is irreducible, then K is a GIM-kernel. [sent-344, score-0.997]

80 Proof: We define the corresponding scalar kernel by $k = \frac{n}{\mu(G)}\, \mathrm{tr}(K)$, where n is the dimensionality of the associated group representation. [sent-345, score-0.356]

81 By the orthogonality relations for irreducible group representations we know that $\frac{n}{\mu(G)} \int_G \mathrm{tr}\big(K(x_1, x_2)\, \rho_g^\dagger\big)\, \rho_g \, dg = K(x_1, x_2)$. [sent-347, score-0.554]

82 Any scalar kernel can be written in terms of its irreducible GIM-kernel expansion. [sent-349, score-0.43]

83 Proof: Due to the Peter-Weyl-Theorem we can expand the scalar kernel in terms of the irreducible representations of G, where the expansion coefficients are by definition the corresponding GIM-kernels. [sent-355, score-0.5]

84 Hence we have a one-to-one correspondence between the scalar basis kernel k and the GIM-kernels $K_l$ formed by the irreducible group representations $\rho_g^{(l)}$. [sent-361, score-0.688]
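
A compact sketch of this correspondence (illustration only, assuming NumPy; the cyclic shift group stands in for a general compact group): for Z_n the irreducible representations are the characters exp(2*pi*i*l*s/n), the corresponding (here 1x1) GIM-kernels are the matching expansion coefficients of the orbit function s -> k(x1, shift(s) x2), and summing them recovers the scalar kernel.

import numpy as np

n = 8
def shift(s):
    # cyclic shift representation of Z_n
    return np.roll(np.eye(n), s, axis=0)

def k(a, b):
    # scalar basis kernel (polynomial); invariant under simultaneous shifts
    return (1.0 + a @ b) ** 2

rng = np.random.default_rng(5)
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)

# values of the scalar kernel along the group orbit of x2
h = np.array([k(x1, shift(s) @ x2) for s in range(n)])

# the GIM-kernels K_l are the expansion coefficients of h in the characters of Z_n
s_idx = np.arange(n)
K_l = np.array([np.mean(h * np.exp(2j * np.pi * l * s_idx / n)) for l in range(n)])

# the scalar kernel is recovered from its GIM-kernel expansion: k(x1, x2) = sum_l K_l(x1, x2)
assert np.isclose(K_l.sum(), k(x1, x2))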

85 Non GIM-kernels: There is another very simple way to construct equivariant kernels. [sent-365, score-0.662]

86 For example, assume that X = Y; then $K(x_1, x_2) = | x_1 \rangle\langle x_2 |$ is obviously an equivariant kernel. [sent-366, score-0.691]

87 Let K be an irreducible equivariant kernel and let its corresponding representation be reducible; then K is not a GIM-kernel. [sent-373, score-0.997]

88 To characterize the kernel response for such patterns we define a linear unitary projection on the space of U-symmetric patterns as follows: $\pi_U x := \frac{1}{\mu(U)} \int_U g x \, dg$. [sent-384, score-0.439]

89 If $x_2$ is $U_2$-symmetric then $K(x_1, x_2) = \frac{1}{\mu(G)} \int_G k(x_1, g x_2)\, \rho_g \, dg = \frac{1}{\mu(G)} \int_{G/U_2} \int_{U_2} k(x_1, g u x_2)\, \rho_{gu} \, du \, dg = \Big( \frac{1}{\mu(G/U_2)} \int_{G/U_2} k(x_1, g x_2)\, \rho_g \, dg \Big) \Big( \frac{1}{\mu(U_2)} \int_{U_2} \rho_u \, du \Big)$. [sent-391, score-0.639]

90 The irreducible representations of 2D-rotations are $e^{in\phi}$ and the irreducible subspaces correspond to the Fourier representation of the function. [sent-426, score-0.436]
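
For concreteness (a sketch of my own, assuming NumPy), here is this statement for a function sampled on the circle: a rotation becomes a cyclic shift of the samples, and on the l-th Fourier component it acts simply as multiplication with a complex phase, which is exactly the one-dimensional irreducible representation.

import numpy as np

m = 64                                              # samples of a function on the circle
phi = 2 * np.pi * np.arange(m) / m
func = np.exp(np.cos(phi)) + 0.3 * np.sin(3 * phi)  # some function of the angle

shift = 5                                           # rotation by 2*pi*shift/m
func_rot = np.roll(func, shift)

F, F_rot = np.fft.fft(func), np.fft.fft(func_rot)
l = np.arange(m)
# the rotation acts on the l-th Fourier coefficient as multiplication by exp(-2*pi*i*l*shift/m)
assert np.allclose(F_rot, F * np.exp(-2j * np.pi * l * shift / m))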

91 Using this kernel as a scalar base kernel, the matrix-valued feature space looks very simple. [sent-433, score-0.37]

92 The induced matrix valued bilinear product in this space is the ordinary $K_n(x_1, x_2) = \sum_l [\Psi(x_1)]_{nl}\, [\Psi(x_2)]_{nl}^{*}$. [sent-435, score-0.343]

93 The scalar basis kernel k for the GIM-kernel expansion of f is chosen as $k(x_z, x_{z'}) = (1 + \langle x_z | x_{z'} \rangle_X)^2$. [sent-456, score-0.659]

94 As already mentioned the irreducible kernels of the 2D-rotation group act on the Fourier representation of functions, so we denote p(φ|x) by the vector p(x), where its components are the Fourier coefficients of p(φ|x) with respect to φ. [sent-527, score-0.436]

95 Then the equivariant least-squares minimizer p(x_0) is proportional to the Fourier transformed histogram of the relative angles φ_i + ϕ_i. [sent-534, score-0.662]
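
As a quick check of this statement (an illustration under my own toy assumptions, using NumPy): the Fourier coefficients of the angular vote histogram can be accumulated directly from the relative angles, and rotating the whole pattern by alpha only multiplies the n-th coefficient by a phase, so the vote behaves equivariantly.

import numpy as np

rng = np.random.default_rng(6)
angles = rng.uniform(0, 2 * np.pi, size=200)       # relative voting angles phi_i + varphi_i

def vote_coeffs(angles, n_max=8):
    # Fourier coefficients of the (idealized) histogram of the voting angles
    n = np.arange(n_max)
    return np.exp(-1j * np.outer(n, angles)).sum(axis=1)

c = vote_coeffs(angles)
alpha = 0.9                                        # rotate the whole pattern by alpha
c_rot = vote_coeffs(angles + alpha)
# the n-th coefficient just picks up a phase exp(-i*n*alpha): the voting function is equivariant
assert np.allclose(c_rot, c * np.exp(-1j * np.arange(8) * alpha))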

96 The scalar basis kernel is chosen to be the Gaussian kernel $k(x_1, x_2) = e^{-\lambda \|x_1 - x_2\|^2}$. [sent-541, score-0.37]

97 The results are satisfying: the voting function p(x) obviously behaves in an equivariant manner and is robust against small distortions. [sent-556, score-0.755]

98 2 to compute an invariant kernel and modify our matrix kernel. (Figure 6: A test image of a leaf with background and small clutter.) [sent-569, score-0.465]

99 Conclusion and Outlook: We presented a new type of matrix valued kernel for learning equivariant functions. [sent-576, score-1.053]

100 Another simple task for the equivariant filter would be to close gaps in road networks. [sent-581, score-0.662]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('equivariant', 0.662), ('valued', 0.222), ('dg', 0.213), ('xz', 0.189), ('irreducible', 0.176), ('unitary', 0.168), ('scalar', 0.138), ('volterra', 0.137), ('eisert', 0.126), ('urkhardt', 0.126), ('gx', 0.119), ('kernel', 0.116), ('quivariant', 0.116), ('kernels', 0.115), ('group', 0.102), ('atrix', 0.098), ('ernels', 0.098), ('fourier', 0.097), ('lter', 0.087), ('image', 0.085), ('equivariance', 0.084), ('tr', 0.082), ('unctions', 0.08), ('object', 0.074), ('wiener', 0.071), ('invariant', 0.068), ('src', 0.063), ('detector', 0.061), ('rotation', 0.059), ('rectangle', 0.057), ('rotations', 0.054), ('matrix', 0.053), ('burkhardt', 0.053), ('dest', 0.053), ('reisert', 0.053), ('unimodularity', 0.053), ('pose', 0.049), ('inner', 0.046), ('fft', 0.044), ('blurred', 0.044), ('iy', 0.044), ('reducible', 0.044), ('representation', 0.043), ('bz', 0.042), ('gxi', 0.042), ('haasdonk', 0.042), ('hough', 0.042), ('unimodular', 0.042), ('ai', 0.041), ('representations', 0.041), ('images', 0.041), ('invariance', 0.039), ('voting', 0.039), ('zk', 0.038), ('operator', 0.037), ('groups', 0.037), ('neighborhood', 0.036), ('iz', 0.036), ('acting', 0.036), ('earning', 0.036), ('product', 0.036), ('projection', 0.036), ('representer', 0.034), ('kn', 0.033), ('ein', 0.032), ('hilbert', 0.032), ('ordinary', 0.032), ('rkhs', 0.032), ('abelian', 0.032), ('filter', 0.032), ('gaal', 0.032), ('irreducibility', 0.032), ('pixel', 0.031), ('interest', 0.03), ('tensor', 0.03), ('obviously', 0.029), ('dh', 0.029), ('kl', 0.028), ('lemma', 0.028), ('expansion', 0.027), ('kx', 0.027), ('leaf', 0.027), ('convolution', 0.027), ('actions', 0.027), ('gf', 0.027), ('gai', 0.027), ('schoelkopf', 0.027), ('hermitian', 0.027), ('diagonal', 0.025), ('coef', 0.025), ('manner', 0.025), ('want', 0.024), ('gk', 0.024), ('kernelized', 0.024), ('freiburg', 0.024), ('dimensional', 0.024), ('integration', 0.023), ('appearance', 0.023), ('know', 0.022), ('cients', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels

Author: Marco Reisert, Hans Burkhardt

Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, give an interpretation of their behavior, and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series

2 0.062007565 80 jmlr-2007-Synergistic Face Detection and Pose Estimation with Energy-Based Models

Author: Margarita Osadchy, Yann Le Cun, Matthew L. Miller

Abstract: We describe a novel method for simultaneously detecting faces and estimating their pose in real time. The method employs a convolutional network to map images of faces to points on a lowdimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold. Given an image, detecting a face and estimating its pose is viewed as minimizing an energy function with respect to the face/non-face binary variable and the continuous pose parameters. The system is trained to minimize a loss function that drives correct combinations of labels and pose to be associated with lower energy values than incorrect ones. The system is designed to handle very large range of poses without retraining. The performance of the system was tested on three standard data sets—for frontal views, rotated faces, and profiles— is comparable to previous systems that are designed to handle a single one of these data sets. We show that a system trained simuiltaneously for detection and pose estimation is more accurate on both tasks than similar systems trained for each task separately.1 Keywords: face detection, pose estimation, convolutional networks, energy based models, object recognition

3 0.058098771 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

Author: Onur C. Hamsici, Aleix M. Martinez

Abstract: Many feature representations, as in genomics, describe directional data where all feature vectors share a common norm. In other cases, as in computer vision, a norm or variance normalization step, where all feature vectors are normalized to a common length, is generally used. These representations and pre-processing step map the original data from R p to the surface of a hypersphere S p−1 . Such representations should then be modeled using spherical distributions. However, the difficulty associated with such spherical representations has prompted researchers to model their spherical data using Gaussian distributions instead—as if the data were represented in R p rather than S p−1 . This opens the question to whether the classification results calculated with the Gaussian approximation are the same as those obtained when using the original spherical distributions. In this paper, we show that in some particular cases (which we named spherical-homoscedastic) the answer to this question is positive. In the more general case however, the answer is negative. For this reason, we further investigate the additional error added by the Gaussian modeling. We conclude that the more the data deviates from spherical-homoscedastic, the less advisable it is to employ the Gaussian approximation. We then show how our derivations can be used to define optimal classifiers for spherical-homoscedastic distributions. By using a kernel which maps the original space into one where the data adapts to the spherical-homoscedastic model, we can derive non-linear classifiers with potential applications in a large number of problems. We conclude this paper by demonstrating the uses of spherical-homoscedasticity in the classification of images of objects, gene expression sequences, and text data. Keywords: directional data, spherical distributions, normal distributions, norm normalization, linear and non-linear classifiers, computer vision

4 0.048240211 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features

Author: Kristen Grauman, Trevor Darrell

Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition

5 0.046992745 71 jmlr-2007-Refinable Kernels

Author: Yuesheng Xu, Haizhang Zhang

Abstract: Motivated by mathematical learning from training data, we introduce the notion of refinable kernels. Various characterizations of refinable kernels are presented. The concept of refinable kernels leads to the introduction of wavelet-like reproducing kernels. We also investigate a refinable kernel that forms a Riesz basis. In particular, we characterize refinable translation invariant kernels, and refinable kernels defined by refinable functions. This study leads to multiresolution analysis of reproducing kernel Hilbert spaces. Keywords: refinable kernels, refinable feature maps, wavelet-like reproducing kernels, dual kernels, learning with kernels, reproducing kernel Hilbert spaces, Riesz bases

6 0.045363769 45 jmlr-2007-Learnability of Gaussians with Flexible Variances

7 0.043528493 2 jmlr-2007-A Complete Characterization of a Family of Solutions to a Generalized Fisher Criterion

8 0.041415431 18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models

9 0.040277787 90 jmlr-2007-Value Regularization and Fenchel Duality

10 0.037816159 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

11 0.036589194 38 jmlr-2007-Graph Laplacians and their Convergence on Random Neighborhood Graphs     (Special Topic on the Conference on Learning Theory 2005)

12 0.036163889 78 jmlr-2007-Statistical Consistency of Kernel Canonical Correlation Analysis

13 0.03581905 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

14 0.034063835 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis

15 0.033791423 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

16 0.032899607 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

17 0.031159943 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

18 0.026820464 3 jmlr-2007-A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

19 0.0265733 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

20 0.026052129 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.157), (1, -0.046), (2, 0.083), (3, 0.093), (4, -0.167), (5, 0.002), (6, -0.0), (7, -0.019), (8, -0.064), (9, 0.051), (10, -0.08), (11, 0.093), (12, -0.037), (13, -0.097), (14, -0.049), (15, -0.058), (16, 0.026), (17, -0.12), (18, -0.01), (19, 0.022), (20, -0.003), (21, 0.091), (22, 0.189), (23, 0.084), (24, 0.019), (25, 0.087), (26, -0.041), (27, 0.007), (28, -0.212), (29, -0.101), (30, -0.199), (31, 0.203), (32, 0.065), (33, -0.039), (34, -0.002), (35, -0.084), (36, -0.036), (37, 0.094), (38, 0.278), (39, -0.043), (40, -0.231), (41, 0.081), (42, 0.003), (43, -0.283), (44, -0.026), (45, -0.05), (46, 0.109), (47, 0.211), (48, 0.223), (49, 0.227)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95698428 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels

Author: Marco Reisert, Hans Burkhardt

Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, give an interpretation of their behavior, and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series

2 0.36944422 80 jmlr-2007-Synergistic Face Detection and Pose Estimation with Energy-Based Models

Author: Margarita Osadchy, Yann Le Cun, Matthew L. Miller

Abstract: We describe a novel method for simultaneously detecting faces and estimating their pose in real time. The method employs a convolutional network to map images of faces to points on a low-dimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold. Given an image, detecting a face and estimating its pose is viewed as minimizing an energy function with respect to the face/non-face binary variable and the continuous pose parameters. The system is trained to minimize a loss function that drives correct combinations of labels and pose to be associated with lower energy values than incorrect ones. The system is designed to handle a very large range of poses without retraining. The performance of the system, tested on three standard data sets—for frontal views, rotated faces, and profiles—is comparable to previous systems that are designed to handle a single one of these data sets. We show that a system trained simultaneously for detection and pose estimation is more accurate on both tasks than similar systems trained for each task separately.1 Keywords: face detection, pose estimation, convolutional networks, energy based models, object recognition

3 0.31417841 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features

Author: Kristen Grauman, Trevor Darrell

Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition

4 0.26287305 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

Author: Onur C. Hamsici, Aleix M. Martinez

Abstract: Many feature representations, as in genomics, describe directional data where all feature vectors share a common norm. In other cases, as in computer vision, a norm or variance normalization step, where all feature vectors are normalized to a common length, is generally used. These representations and pre-processing step map the original data from R p to the surface of a hypersphere S p−1 . Such representations should then be modeled using spherical distributions. However, the difficulty associated with such spherical representations has prompted researchers to model their spherical data using Gaussian distributions instead—as if the data were represented in R p rather than S p−1 . This opens the question to whether the classification results calculated with the Gaussian approximation are the same as those obtained when using the original spherical distributions. In this paper, we show that in some particular cases (which we named spherical-homoscedastic) the answer to this question is positive. In the more general case however, the answer is negative. For this reason, we further investigate the additional error added by the Gaussian modeling. We conclude that the more the data deviates from spherical-homoscedastic, the less advisable it is to employ the Gaussian approximation. We then show how our derivations can be used to define optimal classifiers for spherical-homoscedastic distributions. By using a kernel which maps the original space into one where the data adapts to the spherical-homoscedastic model, we can derive non-linear classifiers with potential applications in a large number of problems. We conclude this paper by demonstrating the uses of spherical-homoscedasticity in the classification of images of objects, gene expression sequences, and text data. Keywords: directional data, spherical distributions, normal distributions, norm normalization, linear and non-linear classifiers, computer vision

5 0.21922162 78 jmlr-2007-Statistical Consistency of Kernel Canonical Correlation Analysis

Author: Kenji Fukumizu, Francis R. Bach, Arthur Gretton

Abstract: While kernel canonical correlation analysis (CCA) has been applied in many contexts, the convergence of finite sample estimates of the associated functions to their population counterparts has not yet been established. This paper gives a mathematical proof of the statistical convergence of kernel CCA, providing a theoretical justification for the method. The proof uses covariance operators defined on reproducing kernel Hilbert spaces, and analyzes the convergence of their empirical estimates of finite rank to their population counterparts, which can have infinite rank. The result also gives a sufficient condition for convergence on the regularization coefficient involved in kernel CCA: this should decrease as n−1/3 , where n is the number of data. Keywords: canonical correlation analysis, kernel, consistency, regularization, Hilbert space

6 0.21121418 3 jmlr-2007-A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

7 0.20964208 71 jmlr-2007-Refinable Kernels

8 0.1768999 18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models

9 0.16890395 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

10 0.16694745 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

11 0.1653361 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

12 0.16011627 68 jmlr-2007-Preventing Over-Fitting during Model Selection via Bayesian Regularisation of the Hyper-Parameters     (Special Topic on Model Selection)

13 0.15925249 34 jmlr-2007-From External to Internal Regret     (Special Topic on the Conference on Learning Theory 2005)

14 0.15844014 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

15 0.1568273 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis

16 0.15654238 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

17 0.1559433 45 jmlr-2007-Learnability of Gaussians with Flexible Variances

18 0.14857371 44 jmlr-2007-Large Margin Semi-supervised Learning

19 0.14502865 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

20 0.14197209 8 jmlr-2007-A Unified Continuous Optimization Framework for Center-Based Clustering Methods


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.037), (8, 0.025), (10, 0.025), (12, 0.044), (22, 0.01), (28, 0.06), (40, 0.042), (45, 0.013), (48, 0.074), (60, 0.033), (83, 0.381), (85, 0.054), (98, 0.085)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71082658 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels

Author: Marco Reisert, Hans Burkhardt

Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, give an interpretation of their behavior, and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series

2 0.36640352 26 jmlr-2007-Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis

Author: Masashi Sugiyama

Abstract: Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called localitypreserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick. Keywords: dimensionality reduction, supervised learning, Fisher discriminant analysis, locality preserving projection, affinity matrix

3 0.34205091 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

Author: Rie Johnson, Tong Zhang

Abstract: This paper investigates the effect of Laplacian normalization in graph-based semi-supervised learning. To this end, we consider multi-class transductive learning on graphs with Laplacian regularization. Generalization bounds are derived using geometric properties of the graph. Specifically, by introducing a definition of graph cut from learning theory, we obtain generalization bounds that depend on the Laplacian regularizer. We then use this analysis to better understand the role of graph Laplacian matrix normalization. Under assumptions that the cut is small, we derive near-optimal normalization factors by approximately minimizing the generalization bounds. The analysis reveals the limitations of the standard degree-based normalization method in that the resulting normalization factors can vary significantly within each connected component with the same class label, which may cause inferior generalization performance. Our theory also suggests a remedy that does not suffer from this problem. Experiments confirm the superiority of the normalization scheme motivated by learning theory on artificial and real-world data sets. Keywords: transductive learning, graph learning, Laplacian regularization, normalization of graph Laplacian

4 0.33910179 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

Author: Onur C. Hamsici, Aleix M. Martinez

Abstract: Many feature representations, as in genomics, describe directional data where all feature vectors share a common norm. In other cases, as in computer vision, a norm or variance normalization step, where all feature vectors are normalized to a common length, is generally used. These representations and pre-processing step map the original data from R p to the surface of a hypersphere S p−1 . Such representations should then be modeled using spherical distributions. However, the difficulty associated with such spherical representations has prompted researchers to model their spherical data using Gaussian distributions instead—as if the data were represented in R p rather than S p−1 . This opens the question to whether the classification results calculated with the Gaussian approximation are the same as those obtained when using the original spherical distributions. In this paper, we show that in some particular cases (which we named spherical-homoscedastic) the answer to this question is positive. In the more general case however, the answer is negative. For this reason, we further investigate the additional error added by the Gaussian modeling. We conclude that the more the data deviates from spherical-homoscedastic, the less advisable it is to employ the Gaussian approximation. We then show how our derivations can be used to define optimal classifiers for spherical-homoscedastic distributions. By using a kernel which maps the original space into one where the data adapts to the spherical-homoscedastic model, we can derive non-linear classifiers with potential applications in a large number of problems. We conclude this paper by demonstrating the uses of spherical-homoscedasticity in the classification of images of objects, gene expression sequences, and text data. Keywords: directional data, spherical distributions, normal distributions, norm normalization, linear and non-linear classifiers, computer vision

5 0.33784956 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification

Author: Jia Li, Surajit Ray, Bruce G. Lindsay

Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density

6 0.3320989 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

7 0.33007586 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

8 0.32839772 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis

9 0.32604814 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation

10 0.32532388 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

11 0.3244099 37 jmlr-2007-GiniSupport Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression

12 0.32140589 53 jmlr-2007-Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling

13 0.32075799 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

14 0.32050633 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

15 0.32010496 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

16 0.31979221 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

17 0.31863704 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

18 0.31829607 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

19 0.31820488 80 jmlr-2007-Synergistic Face Detection and Pose Estimation with Energy-Based Models

20 0.31787339 60 jmlr-2007-Nonlinear Estimators and Tail Bounds for Dimension Reduction inl1Using Cauchy Random Projections