nips nips2004 nips2004-18 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible "wish lists" from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Algebraic Set Kernels with Application to Inference Over Local Image Representations Amnon Shashua and Tamir Hazan ∗ Abstract This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. [sent-1, score-0.505]
2 The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. [sent-2, score-0.623]
3 The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible "wish lists" from similarity measures over sets of varying cardinality. [sent-3, score-0.738]
4 We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. [sent-4, score-0.344]
5 This dichotomy is probably most emphasized in the area of computer vision, where image understanding from observations involves data instances of images or image sequences containing huge amounts of data. [sent-9, score-0.163]
6 The "holistic" representations also suffer from sensitivity to occlusions, lack of invariance to local and global transformations, non-rigidity of local parts of the object, and so forth. [sent-11, score-0.523]
7 Practitioners in the area of data representations have long noticed that a collection of local representations (part-based representations) can be most effective in ameliorating changes of appearance [5, 10, 6]. [sent-12, score-0.3]
8 The local data representations vary in their sophistication, but share the same principle: an image corresponds to a collection of points, each in a relatively low-dimensional space, instead of a single point in the high-dimensional space induced by holistic representations. [sent-13, score-0.4]
9 In general, the number of points (local parts) per image may vary and the dimension of each point may vary as well. [sent-14, score-0.128]
10 The key for unifying local and holistic representations for inference engines is to design positive definite similarity functions (a.k.a. kernels). [sent-16, score-0.458]
11 A set kernel would also be useful for other types of inference engines, such as kernel versions of PCA, LDA, CCA, ridge regression, and any algorithm which can be mapped onto inner-products between pairs of data instances (see [8] for details on kernel methods). [sent-21, score-0.369]
12 Formally, we consider an instance being represented by a collection of vectors, which for the sake of convenience, form the columns of a matrix. [sent-22, score-0.167]
13 We would like to find an algebraic family of similarity functions sim(A, B) over matrices A, B which satisfy the following requirements: (i) sim(A, B) is an inner product, i. [sent-23, score-0.367]
14 The design of a kernel over sets of vectors has been recently attracting much attention in the computer vision and machine learning literature. [sent-27, score-0.158]
15 A possible approach is to fit a distribution to the set of vectors and define the kernel as a distribution matching measure [9, 12, 4]. [sent-28, score-0.132]
16 This has the advantage that the number of local parts can vary but at the expense of fitting a distribution to the variation over parts. [sent-29, score-0.278]
17 The variation could be quite complex at times, unlikely to fit into a known family of distributions in many situations of interest, and in practice the sample size (number of columns of A) is not sufficiently large to reliably fit a distribution. [sent-30, score-0.215]
18 The alternative, which is the approach taken in this paper, is to create a kernel over sets of vectors in a direct manner. [sent-31, score-0.132]
19 It is important to note that although we chose SVM over local representations as the application to demonstrate the use of set kernels, the need for adequately working with instances made out of sets of various cardinalities spans many other application domains. [sent-33, score-0.215]
20 2 The General Family of Inner-Products over Matrices We wish to derive the general family of positive definite similarity measures sim(A, B) over matrices A, B which have the same number of rows but possibly different column rank (in particular, different number of columns). [sent-36, score-0.433]
21 Let ai , bj be the column vectors of matrices A, B and let k(ai , bj ) be the local kernel function. [sent-39, score-0.8]
22 For example, when the column vectors represent local parts of an image, the matching function k(·, ·) between pairs of local parts provides the building blocks of the overall similarity function. [sent-40, score-0.746]
23 The local kernel is some positive definite function k(x, y) = φ(x)⊤φ(y), which is the inner-product between the "feature"-mapped vectors x, y for some feature map φ(·). [sent-41, score-0.321]
24 The local kernels can be combined in a linear or non-linear manner. [sent-43, score-0.253]
25 When the combination is linear the similarity becomes the analogue of the inner-product between vectors extended to matrices. [sent-44, score-0.209]
26 We will refer to the linear family as sim(A, B) = ⟨A, B⟩ and that will be the focus of this section. [sent-45, score-0.106]
27 In the next section we will derive the general (algebraic) nonlinear family which is based on "lifting" the input matrices A, B onto higher dimensional spaces and feeding the result onto the ⟨·, ·⟩ machinery developed in this section, i. [sent-46, score-0.337]
28 We will start by embedding A, B onto m × m matrices by zero padding as follows. [sent-49, score-0.178]
29 The embedding is represented by linear combinations of tensor products: A → Σ_{i=1..n} Σ_{j=1..k} a_ij e_i ⊗ e_j, B → Σ_{l=1..n} Σ_{t=1..q} b_lt e_l ⊗ e_t. [sent-55, score-0.568]
30 Let S be a positive semi-definite m² × m² matrix represented by S = Σ_{r=1..p} G_r ⊗ F_r, where G_r, F_r are m × m matrices. [sent-57, score-0.293]
31 The matrices G_r, F_r ... (footnote 1) Any S can be represented as a sum over tensor products: given column-wise ordering, the matrix G ⊗ F is composed of n × n blocks of the form f_ij G. [sent-67, score-0.514]
32 Therefore, take G_r to be the n × n blocks of S and F_r to be the elemental matrices which have "1" in coordinate r = (i, j) and zero everywhere else. [sent-68, score-0.175]
33 p must be selected such that Σ_{r=1..p} G_r ⊗ F_r is positive semi-definite. [sent-69, score-0.099]
34 The problem of deciding on the necessary conditions on F_r and G_r such that the sum over tensor products is p. [sent-70, score-0.221]
35 The sufficient conditions are easy: choosing G_r, F_r to be positive semi-definite would make Σ_{r=1..p} G_r ⊗ F_r positive semi-definite as well. [sent-74, score-0.198]
36 In this context (of separable S) we need one more constraint in order to work with non-linear local kernels k(x, y) = φ(x)⊤φ(y): the matrices G_r = M_r⊤M_r must "distribute with the kernel", namely there exist M̃_r such that k(M_r x, M_r y) = φ(M_r x)⊤φ(M_r y) = φ(x)⊤M̃_r⊤M̃_r φ(y) = φ(x)⊤Ĝ_r φ(y). [sent-75, score-0.258]
37 The role of the matrices Ĝ_r is to perform global coordinate changes of R^n before application of the kernel k(·, ·) on the columns of A, B. [sent-77, score-0.35]
38 These global transformations include projections (say onto prototypical "parts") that may be given or "learned" from a training set. [sent-78, score-0.078]
39 The matrices F̂_r determine the range of interaction between columns of A and columns of B. [sent-79, score-0.44]
40 The various choices of F determine the type of invariances one could obtain from the similarity measure. [sent-85, score-0.198]
41 For example, when F = I the similarity is simply the sum (average) of the local kernels k(ai , bi ) thereby assuming we have a strict alignment between the local parts represented by A and the local parts represented by B. [sent-86, score-0.876]
42 On the other end of the invariance spectrum, when F = 11⊤ (all entries are 1) the similarity measure averages over all interactions of local parts k(a_i, b_j), thereby achieving an invariance to the order of the parts. [sent-87, score-0.695]
43 A decaying weighted interaction such as f_ij = σ^{−|i−j|} would provide a middle ground between the assumption of strict alignment and the assumption of complete lack of alignment. [sent-88, score-0.122]
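The linear family can be illustrated with a short sketch. The Python code below is not taken from the paper: it assumes a single separable term with G = I, an RBF local kernel whose gamma parameter is illustrative, and hypothetical function names, and it computes ⟨A, B⟩ = Σ_{i,j} f_ij k(a_i, b_j) for the three choices of F discussed above.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Local kernel k(x, y); an RBF here, but any positive definite kernel works."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def linear_set_kernel(A, B, F, k=rbf):
    """sim(A, B) = <A, B> = sum_{i,j} F[i, j] * k(a_i, b_j),
    where a_i are the columns of A (n x kA) and b_j the columns of B (n x kB)."""
    kA, kB = A.shape[1], B.shape[1]
    K = np.array([[k(A[:, i], B[:, j]) for j in range(kB)] for i in range(kA)])
    return float(np.sum(F * K))

# Three choices of F discussed in the text (F = I requires kA == kB):
kA = kB = 9
F_aligned = np.eye(kA)                                   # strict alignment of parts
F_order_invariant = np.ones((kA, kB))                    # invariance to the order of parts
sigma = 2.0
F_decay = np.array([[sigma ** -abs(i - j) for j in range(kB)]
                    for i in range(kA)])                 # middle ground, f_ij = sigma^(-|i-j|)
```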
44 3 Lifting Matrices onto Higher Dimensions. The family sim(A, B) = ⟨A, B⟩ forms a weighted linear superposition of the local kernel values k(a_i, b_j). [sent-91, score-0.505]
45 Non-linear combinations of local kernels emerge using mappings ψ(A) from the input matrices onto other higher-dimensional matrices, thus forming sim(A, B) = ⟨ψ(A), ψ(B)⟩. [sent-92, score-0.434]
46 Additional invariance properties and parameters controlling the performance of sim(A, B) emerge with the introduction of non-linear combinations of local kernels, and those will be discussed later on in this section. [sent-93, score-0.22]
47 Consider the general d-fold lifting ψ(A) = A^⊗d which can be viewed as an n^d × k^d matrix. [sent-94, score-0.112]
48 The key for computational simplification lies in the fact that choices of F_r determine not only local interactions (as in the linear case) but also group invariances. [sent-111, score-0.251]
49 The group invariances are a result of applying symmetric operators on the tensor product space — we will consider two of those operators here, known as the d-fold alternating tensor A^∧d = A ∧ ⋯ ∧ A and the d-fold symmetric tensor described below. [sent-112, score-0.611]
50 These lifting operations introduce the determinant and permanent operations on submatrices of A⊤Ĝ_r B, as described below. [sent-120, score-0.208]
51 The alternating tensor is a multilinear map of R^n, (A ∧ ⋯ ∧ A)(x_1 ∧ ⋯ ∧ x_d) = Ax_1 ∧ ⋯ ∧ Ax_d. [sent-121, score-0.266]
52 The wedge products e_{i_1} ∧ ⋯ ∧ e_{i_d} with 1 ≤ i_1 < ⋯ < i_d ≤ n form a basis of the alternating d-fold tensor product of R^n, denoted as Λ^d R^n. [sent-149, score-0.357]
53 If A ∈ R^{n×k} is a linear map sending points of R^k to R^n, then A^∧d is a linear map on Λ^d R^k sending x_1 ∧ ⋯ ∧ x_d to Ax_1 ∧ ⋯ ∧ Ax_d. [sent-150, score-0.198]
54 The entries of the compound matrix C_d(A) are det(A[i_1, …, i_d ; j_1, …, j_d]), where the determinant is of the d × d block constructed by choosing the rows i_1, …, i_d and the columns j_1, …, j_d of A. [sent-171, score-0.175]
55 In other words, C_d(A) has (n choose d) rows and (k choose d) columns (instead of the n^d × k^d necessary for A^⊗d), and its entries are equal to the d × d minors of A. [sent-178, score-0.231]
56 Taken together, the "d-fold alternating kernel" Λ_d(A, B) is defined by: Λ_d(A, B) = ⟨A^∧d, B^∧d⟩ = ⟨C_d(A), C_d(B)⟩ = Σ_r trace( C_d(A⊤Ĝ_r B) F̂_r ),  (2) where F̂_r is the (q choose d) × (k choose d) upper-left submatrix of the p. [sent-181, score-0.141]
57 Note that the local kernel plugs in as the entries of (A⊤Ĝ_r B)_ij = k(M_r a_i, M_r b_j), where Ĝ_r = M_r⊤M_r. [sent-184, score-0.494]
58 Another symmetric operator on the tensor product space is via the d-fold symmetric tensor space Sym^d R^n whose points are (1/d!) Σ_{σ∈S_d} x_{σ(1)} ⊗ ⋯ ⊗ x_{σ(d)}. [sent-185, score-0.495]
59 The entries of R_d(A) are per(A[i_1, …, i_d ; j_1, …, j_d]), and A^·d stands for the map (A · ⋯ · A)(x_1 · ⋯ · x_d) = Ax_1 · ⋯ · Ax_d. [sent-203, score-0.171]
60 In other words, R_d(A) has (n+d−1 choose d) rows and (k+d−1 choose d) columns, and its entries are equal to the d × d permanents of A. [sent-204, score-0.231]
61 The ensuing kernel similarity function, referred to as the "d-fold symmetric kernel", is: Sym_d(A, B) = ⟨A^·d, B^·d⟩ = ⟨R_d(A), R_d(B)⟩ = Σ_r trace( R_d(A⊤Ĝ_r B) F̂_r ),  (3) where F̂_r is the (q+d−1 choose d) × (k+d−1 choose d) upper-left submatrix of the positive definite (m+d−1 choose d) × (n+d−1 choose d) matrix F_r. [sent-206, score-0.372]
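Since the entries of R_d(A) are d × d permanents, a small sketch of the permanent may help; this brute-force version is not from the paper, sums over all permutations (so it is only adequate for small d), and uses an illustrative function name.

```python
import numpy as np
from itertools import permutations

def permanent(M):
    """Permanent of a square d x d matrix: like the determinant, but every
    permutation term is taken with a plus sign (cost O(d! * d))."""
    d = M.shape[0]
    return sum(np.prod([M[i, p[i]] for i in range(d)]) for p in permutations(range(d)))
```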
62 3.1 Practical Considerations. To recap, the family of similarity functions sim(A, B) comprises the linear version ⟨A, B⟩ (eqn. [sent-209, score-0.196]
63 2, 3), which are group projections of the general kernel ⟨A^⊗d, B^⊗d⟩. [sent-211, score-0.116]
64 These different similarity functions are controlled by the choice of three items: Gr , Fr and the parameter d representing the degree of the tensor product operator. [sent-212, score-0.326]
65 The role of G_r is fairly interesting as it can be viewed as a projection operator from "parts" to prototypical parts that can be learned from a training set, but we leave this to the full-length article that will appear later. [sent-214, score-0.144]
66 Practically, to compute Λ_d(A, B) one needs to run over all d × d blocks of the k × q matrix A⊤B (whose entries are k(a_i, b_j)) and for each block compute the determinant. [sent-215, score-0.312]
67 The similarity function is then the sum of all those determinants, weighted by f_ij. [sent-216, score-0.213]
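A sketch of this recipe in Python, under the assumption that all d × d blocks are enumerated and that the supplied weights play the role of the F̂ entries; the function name and the weighting convention are illustrative, not from the paper.

```python
import numpy as np
from itertools import combinations

def alternating_set_kernel(K, d, weight=None):
    """Lambda_d(A, B) computed from K, the k x q matrix with K[i, j] = k(a_i, b_j).
    Enumerates all choices of d columns of A and d columns of B, takes the
    determinant of each d x d block, and sums the determinants with weights."""
    k, q = K.shape
    row_subsets = list(combinations(range(k), d))
    col_subsets = list(combinations(range(q), d))
    total = 0.0
    for I, rows in enumerate(row_subsets):
        for J, cols in enumerate(col_subsets):
            w = 1.0 if weight is None else weight[I, J]   # stands in for an entry of F-hat
            total += w * np.linalg.det(K[np.ix_(rows, cols)])
    return total
```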
68 These determinants have an interesting geometric interpretation when they are computed over unitary matrices, as described next. [sent-218, score-0.215]
69 Let A = Q_A R_A be the QR factorization of A, i.e., Q_A has orthonormal columns which span the column space of A; then it has been recently shown [14] that R_A^{−1} can be computed from A using only operations over k(a_i, a_j). [sent-221, score-0.237]
70 Therefore, the product Q_A⊤Q_B, which is equal to R_A^{−⊤} A⊤B R_B^{−1}, can be computed using only local kernel applications. [sent-222, score-0.259]
71 In other words, for each A compute R_A^{−1} (this can be done using only inner-products over the columns of A); then, when it comes to computing A⊤B, compute instead R_A^{−⊤} A⊤B R_B^{−1}, which is equivalent to computing Q_A⊤Q_B. [sent-223, score-0.134]
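One concrete way to realize this with plain inner-products, given here as an assumption rather than as the method of [14], is to take R_A as the Cholesky factor of the Gram matrix of local kernel values (since A = Q_A R_A implies A⊤A = R_A⊤R_A). A minimal numpy sketch with illustrative names:

```python
import numpy as np

def q_gram(K_AA, K_AB, K_BB):
    """Q_A^T Q_B = R_A^{-T} (A^T B) R_B^{-1} computed from Gram matrices only:
    K_AA[i, j] = k(a_i, a_j), K_AB[i, j] = k(a_i, b_j), K_BB[i, j] = k(b_i, b_j).
    Requires K_AA, K_BB to be positive definite (columns independent in feature space)."""
    R_A = np.linalg.cholesky(K_AA).T          # upper triangular, K_AA = R_A^T R_A
    R_B = np.linalg.cholesky(K_BB).T
    X = np.linalg.solve(R_A.T, K_AB)          # R_A^{-T} K_AB
    return np.linalg.solve(R_B.T, X.T).T      # (R_A^{-T} K_AB) R_B^{-1}
```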
72 Now, Λ_d(Q_A, Q_B) for unitary matrices is the sum over the products of the cosines of the principal angles between d-dim subspaces spanned by columns of A and B. [sent-225, score-0.591]
73 The value of each determinant of the d × d blocks of Q_A⊤Q_B is equal to the product of the cosines of the principal angles between the respective d-dim subspaces determined by the corresponding selection of d columns from A and d columns from B. [sent-226, score-0.613]
74 For example, the case k = q = d produces Λ_d(Q_A, Q_B) = det(Q_A⊤Q_B), which is the product of the eigenvalues of the matrix Q_A⊤Q_B. [sent-227, score-0.074]
75 Those eigenvalues are the cosines of the principal angles between the column space of A and the column space of B [2]. [sent-228, score-0.248]
76 Therefore, det(Q_A⊤Q_B) measures the "angle" between the two subspaces spanned by the respective columns of the input matrices; in particular, it is invariant to the order of the columns. [sent-229, score-0.406]
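A short sketch of this geometric reading, assuming orthonormal bases Q_A, Q_B are available (names illustrative, not from the paper): the singular values of Q_A⊤Q_B are the cosines of the principal angles [2], and for k = q = d the determinant equals their product up to sign.

```python
import numpy as np

def principal_angle_cosines(QA, QB):
    """Cosines of the principal angles between the column spaces spanned by the
    orthonormal bases QA and QB: the singular values of QA^T QB."""
    return np.linalg.svd(QA.T @ QB, compute_uv=False)

# For k = q = d, det(QA.T @ QB) equals the product of these cosines up to sign.
```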
77 For smaller values of d we obtain the sum over such products between subspaces spanned by subsets of d columns of A and of B. [sent-230, score-0.304]
78 The advantage of smaller values of d is twofold: first, it makes it possible to compute the similarity when k ≠ q, and second, it breaks down the similarity between subspaces into smaller pieces. [sent-231, score-0.299]
79 The entries of the matrix F determine which subspaces are being considered and the interaction between subspaces in A and B. [sent-232, score-0.385]
80 [Figure 1: (a) The configuration of the nine sub-regions is displayed over the gradient image.] A diagonal F compares corresponding subspaces [sent-233, score-0.119]
81 between A and B, whereas off-diagonal entries would enable comparisons between different choices of subspaces in A and in B. [sent-235, score-0.223]
82 For example, we may want to consider choices of d columns arranged in a "sliding" fashion, i. [sent-236, score-0.168]
83 This selection is associated with a sparse diagonal F where the non-vanishing entries along the diagonal have the value of "1" and correspond to the sliding window selections. [sent-247, score-0.125]
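A sketch of such a sparse diagonal F̂, assuming the d-column subsets are enumerated lexicographically (as itertools.combinations does); this enumeration convention and the function name are assumptions, not taken from the paper.

```python
import numpy as np
from itertools import combinations

def sliding_window_F(k, d):
    """Sparse diagonal F-hat selecting only the 'sliding window' d-column subsets
    (columns 1..d, 2..d+1, ...) out of all C(k, d) subsets, which are assumed to be
    enumerated lexicographically."""
    subsets = list(combinations(range(k), d))
    F = np.zeros((len(subsets), len(subsets)))
    windows = {tuple(range(s, s + d)) for s in range(k - d + 1)}
    for idx, subset in enumerate(subsets):
        if subset in windows:
            F[idx, idx] = 1.0
    return F
```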
84 To conclude, in the linear version ⟨A, B⟩ the role of F is to determine the range of interaction between columns of A and columns of B, whereas with the non-linear version it is the interaction between d-dim subspaces rather than individual columns. [sent-248, score-0.5]
85 We could select all possible interactions (exponential number) or any reduced interaction set such as the sliding window rule (linear number of choices) as described above. [sent-249, score-0.133]
86 4 Experiments We examined the performance of sim(A, B) on part-based representations for pedestrian detection using SVM for the inference engine. [sent-250, score-0.118]
87 We compared our results to the conventional down-sampled holistic representation where the raw images were down-sampled to size 20 × 20 and 32 × 32. [sent-255, score-0.158]
88 Our tests also included simulation of occlusions (in the test images) in order to examine the sensitivity of our sim(A, B) family to occlusions. [sent-256, score-0.136]
89 For the local part representation, the input image was divided into 9 fixed regions, and for each region local orientation statistics were generated following [5, 7], with a total of 22 numbers per region (see Fig 1a), thereby making a 22 × 9 matrix representation to be fed into sim(A, B). [sent-257, score-0.421]
90 The table below summarizes the accuracy results for the raw-pixel (holistic) representation over three trials: (i) images down-sampled to 20 × 20, (ii) images down-sampled to 32 × 32, and (iii) test images were partially occluded (32 × 32 version). [sent-259, score-0.133]
91 5% The table below displays sim(A, B) with linear and RBF local kernels. [sent-263, score-0.155]
92 2% ⟨A, B⟩, F = 11⊤ 85% 85% ⟨A, B⟩, f_ij = 2^{−|i−j|} Λ_2(A, B) 90. [sent-266, score-0.078]
93 4% 88% 90% One can see that the local part representation provides a sharp increase in accuracy compared to the raw pixel holistic representation. [sent-268, score-0.253]
94 The added power of invariance to the order of parts induced by ⟨A, B⟩ with F = 11⊤ is not required since the parts are aligned, and therefore the accuracy is highest for the linear combination of local RBF kernels, ⟨A, B⟩ with F = I. [sent-269, score-0.449]
95 The same applies for the non-linear version Λd (A, B) — the additional invariances that come with a non-linear combination of local parts are apparently not required. [sent-270, score-0.32]
96 The power of non-linearity associated with the combination of local parts comes to bear when the test images have occluded parts, i. [sent-271, score-0.309]
97 Although the experiments above are still preliminary, they show the power and potential of the sim(A, B) family of kernels defined over local kernels. [sent-274, score-0.309]
98 With the principles laid down in Section 3 one can construct a large number (we touched on only a few) of algebraic kernels which combine the local kernels in non-linear ways, thus creating invariances to order and increased performance against occlusion. [sent-275, score-0.468]
99 Further research is required for sifting through the various possibilities with this new family of kernels and extracting their properties, their invariances and behavior under changing parameters (Fr , Gr , d). [sent-276, score-0.253]
100 The Kullback-Leibler kernel as a framework for discriminant and localized representations for visual recognition. [sent-346, score-0.173]
wordName wordTfidf (topN-words)
[('fr', 0.435), ('gr', 0.428), ('sim', 0.342), ('tensor', 0.195), ('mr', 0.176), ('qa', 0.144), ('columns', 0.134), ('bj', 0.131), ('local', 0.13), ('matrices', 0.128), ('qb', 0.125), ('cd', 0.124), ('subspaces', 0.119), ('parts', 0.116), ('lifting', 0.112), ('kernels', 0.098), ('blt', 0.094), ('similarity', 0.09), ('holistic', 0.089), ('kernel', 0.088), ('representations', 0.085), ('jd', 0.081), ('family', 0.081), ('fij', 0.078), ('ai', 0.075), ('id', 0.075), ('invariances', 0.074), ('column', 0.073), ('entries', 0.07), ('ra', 0.069), ('algebraic', 0.068), ('xd', 0.065), ('semi', 0.065), ('rbf', 0.064), ('image', 0.064), ('invariance', 0.062), ('axd', 0.056), ('sliding', 0.055), ('occlusions', 0.055), ('aij', 0.055), ('trace', 0.053), ('det', 0.052), ('onto', 0.05), ('analogue', 0.05), ('ej', 0.05), ('sending', 0.049), ('sd', 0.048), ('blocks', 0.047), ('ei', 0.047), ('rd', 0.046), ('alternating', 0.046), ('determinants', 0.045), ('interaction', 0.044), ('vectors', 0.044), ('cardinality', 0.043), ('rn', 0.042), ('submatrix', 0.042), ('unitary', 0.042), ('product', 0.041), ('el', 0.041), ('angles', 0.04), ('brb', 0.037), ('hr', 0.037), ('symd', 0.037), ('determinant', 0.036), ('jt', 0.036), ('images', 0.035), ('choices', 0.034), ('raw', 0.034), ('positive', 0.034), ('interactions', 0.034), ('matrix', 0.033), ('represented', 0.033), ('pedestrian', 0.033), ('poly', 0.033), ('nite', 0.032), ('covering', 0.032), ('vary', 0.032), ('symmetric', 0.032), ('principal', 0.031), ('block', 0.031), ('cosine', 0.031), ('varying', 0.031), ('operations', 0.03), ('engines', 0.03), ('sorts', 0.03), ('machinery', 0.028), ('occluded', 0.028), ('prototypical', 0.028), ('shashua', 0.028), ('combinations', 0.028), ('th', 0.028), ('group', 0.028), ('rows', 0.027), ('vision', 0.026), ('products', 0.026), ('spanned', 0.025), ('versions', 0.025), ('linear', 0.025), ('map', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible ”wish lists” from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
2 0.120115 100 nips-2004-Learning Preferences for Multiclass Problems
Author: Fabio Aiolli, Alessandro Sperduti
Abstract: Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results. 1
3 0.11316778 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
4 0.10667001 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
5 0.096023709 30 nips-2004-Binet-Cauchy Kernels
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
6 0.095518909 185 nips-2004-The Convergence of Contrastive Divergences
7 0.082961559 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
8 0.075400755 71 nips-2004-Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices
9 0.07292974 68 nips-2004-Face Detection --- Efficient and Rank Deficient
10 0.071236953 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
11 0.07006532 125 nips-2004-Multiple Relational Embedding
12 0.069413714 42 nips-2004-Computing regularization paths for learning multiple kernels
13 0.068424702 113 nips-2004-Maximum-Margin Matrix Factorization
14 0.067414947 161 nips-2004-Self-Tuning Spectral Clustering
15 0.066127233 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
16 0.065241113 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
17 0.063953131 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
18 0.063664995 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
19 0.061687898 44 nips-2004-Conditional Random Fields for Object Recognition
20 0.058447011 54 nips-2004-Distributed Information Regularization on Graphs
topicId topicWeight
[(0, -0.189), (1, 0.07), (2, -0.053), (3, -0.025), (4, -0.033), (5, -0.037), (6, -0.073), (7, -0.142), (8, 0.078), (9, -0.059), (10, -0.099), (11, -0.076), (12, 0.013), (13, 0.111), (14, 0.072), (15, 0.043), (16, 0.022), (17, 0.054), (18, 0.045), (19, -0.017), (20, 0.074), (21, 0.005), (22, -0.004), (23, -0.073), (24, -0.073), (25, 0.024), (26, 0.042), (27, -0.012), (28, -0.048), (29, -0.078), (30, -0.006), (31, 0.03), (32, -0.054), (33, -0.024), (34, -0.098), (35, 0.09), (36, 0.024), (37, 0.067), (38, -0.114), (39, -0.089), (40, 0.193), (41, 0.078), (42, 0.005), (43, -0.16), (44, -0.017), (45, -0.053), (46, -0.049), (47, 0.079), (48, -0.125), (49, -0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.94467914 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible ”wish lists” from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
2 0.60066628 30 nips-2004-Binet-Cauchy Kernels
Author: Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1
3 0.59976327 94 nips-2004-Kernels for Multi--task Learning
Author: Charles A. Micchelli, Massimiliano Pontil
Abstract: This paper provides a foundation for multi–task learning using reproducing kernel Hilbert spaces of vector–valued functions. In this setting, the kernel is a matrix–valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix– valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi–task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation. 1
4 0.52330154 100 nips-2004-Learning Preferences for Multiclass Problems
Author: Fabio Aiolli, Alessandro Sperduti
Abstract: Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results. 1
5 0.51308662 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
6 0.49527755 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.48051241 185 nips-2004-The Convergence of Contrastive Divergences
8 0.41732898 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations
9 0.40422559 68 nips-2004-Face Detection --- Efficient and Rank Deficient
10 0.39677212 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
11 0.39403433 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
12 0.38401854 190 nips-2004-The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters
13 0.38349807 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
14 0.38030058 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
15 0.37384653 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
16 0.37098941 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
17 0.32584989 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
18 0.32429171 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
19 0.32245719 113 nips-2004-Maximum-Margin Matrix Factorization
20 0.31250423 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
topicId topicWeight
[(13, 0.071), (15, 0.161), (26, 0.065), (31, 0.013), (33, 0.144), (35, 0.038), (39, 0.059), (50, 0.032), (69, 0.285), (76, 0.025)]
simIndex simValue paperId paperTitle
1 0.93609262 128 nips-2004-Neural Network Computation by In Vitro Transcriptional Circuits
Author: Jongmin Kim, John Hopfield, Erik Winfree
Abstract: The structural similarity of neural networks and genetic regulatory networks to digital circuits, and hence to each other, was noted from the very beginning of their study [1, 2]. In this work, we propose a simple biochemical system whose architecture mimics that of genetic regulation and whose components allow for in vitro implementation of arbitrary circuits. We use only two enzymes in addition to DNA and RNA molecules: RNA polymerase (RNAP) and ribonuclease (RNase). We develop a rate equation for in vitro transcriptional networks, and derive a correspondence with general neural network rate equations [3]. As proof-of-principle demonstrations, an associative memory task and a feedforward network computation are shown by simulation. A difference between the neural network and biochemical models is also highlighted: global coupling of rate equations through enzyme saturation can lead to global feedback regulation, thus allowing a simple network without explicit mutual inhibition to perform the winner-take-all computation. Thus, the full complexity of the cell is not necessary for biochemical computation: a wide range of functional behaviors can be achieved with a small set of biochemical components. 1
same-paper 2 0.83376986 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
Author: Amnon Shashua, Tamir Hazan
Abstract: This paper presents a general family of algebraic positive definite similarity functions over spaces of matrices with varying column rank. The columns can represent local regions in an image (whereby images have varying number of local parts), images of an image sequence, motion trajectories in a multibody motion, and so forth. The family of set kernels we derive is based on a group invariant tensor product lifting with parameters that can be naturally tuned to provide a cook-book of sorts covering the possible ”wish lists” from similarity measures over sets of varying cardinality. We highlight the strengths of our approach by demonstrating the set kernels for visual recognition of pedestrians using local parts representations. 1
3 0.81032795 74 nips-2004-Harmonising Chorales by Probabilistic Inference
Author: Moray Allan, Christopher Williams
Abstract: We describe how we used a data set of chorale harmonisations composed by Johann Sebastian Bach to train Hidden Markov Models. Using a probabilistic framework allows us to create a harmonisation system which learns from examples, and which can compose new harmonisations. We make a quantitative comparison of our system’s harmonisation performance against simpler models, and provide example harmonisations. 1
4 0.63459009 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis
Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1
5 0.63159043 178 nips-2004-Support Vector Classification with Input Data Uncertainty
Author: Jinbo Bi, Tong Zhang
Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
6 0.63151008 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
7 0.63116282 68 nips-2004-Face Detection --- Efficient and Rank Deficient
8 0.62911785 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
9 0.62615395 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
10 0.62577182 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
11 0.6254707 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
12 0.62530518 131 nips-2004-Non-Local Manifold Tangent Learning
13 0.62410092 168 nips-2004-Semigroup Kernels on Finite Sets
14 0.62401527 148 nips-2004-Probabilistic Computation in Spiking Populations
15 0.62342024 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
16 0.62335968 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
17 0.62322116 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
18 0.62301415 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
19 0.62256414 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
20 0.62158442 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units