nips nips2009 nips2009-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a set-level feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In visual recognition, the images are frequently modeled as unordered collections of local features (bags). [sent-5, score-0.306]
2 We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. [sent-6, score-0.462]
3 Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. [sent-7, score-0.463]
4 To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. [sent-9, score-0.682]
5 The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. [sent-10, score-0.57]
6 1 Introduction Models based on local features have achieved state-of-the-art results in many visual object recognition tasks. [sent-13, score-0.332]
7 For example, an image can be described by a set of local features extracted from patches around salient interest points or regular grids, or a shape can be described by a set of local features defined at edge points. [sent-14, score-0.449]
8 This raises the question of how one should measure the similarity between two images represented as sets of local features. [sent-15, score-0.196]
9 BOW represents each local feature with the closest visual word and counts the occurrence frequencies in the image. [sent-18, score-0.25]
10 The resulting histogram is used as an image descriptor for object recognition, often in conjunction with linear classifiers. [sent-19, score-0.225]
11 Various methods for creating vocabularies exist [10], the most common being k-means clustering of all (or a subsample of) the local features to obtain visual words. [sent-21, score-0.275]
12 An even better approach to recognition is to define kernels over sets of local features. [sent-22, score-0.296]
13 The sum match kernel of Haussler [7] is obtained by adding local kernels over all combinations of local features from two different sets. [sent-24, score-0.843]
14 In [17], the authors modify the sum kernel by introducing an integer exponent on local kernels. [sent-25, score-0.361]
15 Neighborhood kernels [20] integrate the spatial location of local features into a sum match kernel. [sent-26, score-0.534]
16 Pyramid match kernels [5, 14, 13] map local features to multi-resolution histograms and compute a weighted histogram intersection. [sent-27, score-0.553]
17 Algebraic set kernels [26] exploit tensor products to aggregate local kernels, whereas principal angle kernels [29] measure similarities based on angles between linear subspaces spanned by local features in the two sets. [sent-28, score-0.711]
18 All of the above methods need to explicitly evaluate the full kernel matrix, hence they require space and time complexity that is quadratic in the number of images. [sent-30, score-0.225]
19 In this paper we present efficient match kernels (EMK) that combine the strengths of both bag of words and set kernels. [sent-32, score-0.355]
20 We map local features to a low dimensional feature space and construct set-level features by averaging the resulting feature vectors. [sent-33, score-0.537]
21 Hence EMK can be used in conjunction with linear classifiers and do not require the explicit computation of a full kernel matrix—this leads to both space and time complexity that is linear in the number of images. [sent-35, score-0.289]
22 We adopt a bag of features method, which represents an image as a set of local features. [sent-39, score-0.298]
23 Let X = {x1, . . . , xp} be a set of local features in an image and V = {v1, . . . , vD} the set of visual words. [sent-43, score-0.255]
24 In BOW, each local feature is quantized into a D-dimensional binary indicator vector µ(x) = [µ1(x), . . . , µD(x)]. [sent-47, score-0.244]
25 The feature vectors for one image form a normalized histogram µ(X) = (1/|X|) Σ_{x∈X} µ(x), where | · | is the cardinality of a set. [sent-52, score-0.215]
26 BOW features can be used in conjunction with either a linear or a kernel classifier, albeit the latter often leads to expensive training and testing (see §4). [sent-53, score-0.495]
27 When a linear classifier is used, the resulting kernel function is: K_B(X, Y) = µ(X)^T µ(Y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} µ(x)^T µ(y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} δ(x, y) (1), where δ(x, y) = 1 if x, y ∈ R(vi) for some i ∈ {1, . . . , D}, and δ(x, y) = 0 otherwise (2). [sent-54, score-0.225]
28 δ(x, y) is obviously a positive definite kernel, measuring the similarity between two local features x and y: δ(x, y) = 1 if x and y belong to the same region R(vi), and 0 otherwise. [sent-57, score-0.222]
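To make the equivalence concrete, the sketch below (illustrative NumPy code, not from the paper; the vocabulary V and the feature sets X, Y are randomly generated placeholders) computes K_B(X, Y) both as the inner product of normalized BOW histograms and as the average of δ(x, y); the two values coincide.

```python
import numpy as np

def quantize(X, V):
    """Assign each local feature (row of X) to the index of its nearest visual word in V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # squared distances, |X| x D
    return d2.argmin(axis=1)

def bow_histogram(X, V):
    """Normalized BOW histogram mu(X) = (1/|X|) sum_x mu(x)."""
    counts = np.bincount(quantize(X, V), minlength=V.shape[0]).astype(float)
    return counts / len(X)

def kernel_bow(X, Y, V):
    """K_B(X, Y) = mu(X)^T mu(Y), eq. (1), via histograms."""
    return bow_histogram(X, V) @ bow_histogram(Y, V)

def kernel_delta(X, Y, V):
    """The same value via the match-kernel view: the average of delta(x, y), eq. (2)."""
    ix, iy = quantize(X, V), quantize(Y, V)
    return (ix[:, None] == iy[None, :]).mean()

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 2))                     # 5 visual words in 2-D (toy data)
X, Y = rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
assert np.isclose(kernel_bow(X, Y, V), kernel_delta(X, Y, V))
```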
29 However, this type of quantization can be too coarse when measuring the similarity of two local features (see also fig. [sent-58, score-0.26]
30 It would be better to replace δ(x, y) with a continuous kernel function that more accurately measures the similarity between x and y: K_S(X, Y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} k(x, y) (3). In fact, this is related to the normalized sum match kernel [7, 17]. [sent-60, score-0.632]
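A brute-force evaluation of eq. (3) with a Gaussian local kernel (our choice here for illustration; any Mercer kernel could be substituted) would look like the following sketch; the nested loop makes the O(|X||Y|) cost discussed next explicit.

```python
import numpy as np

def gaussian_local_kernel(x, y, gamma=0.5):
    """Continuous local kernel k(x, y) = exp(-gamma ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sum_match_kernel(X, Y, gamma=0.5):
    """Normalized sum match kernel K_S(X, Y) of eq. (3): O(|X||Y|) local-kernel evaluations."""
    total = sum(gaussian_local_kernel(x, y, gamma) for x in X for y in Y)
    return total / (len(X) * len(Y))
```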
31 A negative impact of kernelization is the high computational cost required to compute the summation match function, which takes O(|X||Y|) for a single kernel value rather than O(1), the cost of evaluating a single kernel function defined on vectors. [sent-63, score-0.655]
32 When used in conjunction with kernel machines, it takes O(n²) and O(n²m²d) to store and compute the entire kernel matrix, respectively, where n is the number of images in the training set, and m is the average cardinality of all sets. [sent-64, score-0.644]
33 In addition to expensive training, the match kernel function also has a fairly high testing cost: for a test input, evaluating the discriminant f(X) = Σ_{i=1}^{n} αi K_S(Xi, X) takes O(nm²d). [sent-66, score-0.402]
34 For sparse kernel machines, such as SVMs, the cost can decrease to some extent, as some of the αi are zero. [sent-68, score-0.264]
35 The kernel function k(x, y) = φ(x)^T φ(y) is called finite dimensional if the feature map φ(·) is finite dimensional. [sent-71, score-0.395]
36 PMK is the pyramid match kernel of [6], with T in PMK giving the value of the maximal feature range. [sent-76, score-0.635]
37 D in EMK is the dimensionality of feature maps and does not change with the training set size. [sent-78, score-0.318]
38 EMK uses linear classifiers and does not require the evaluation of the kernel matrix. [sent-81, score-0.225]
39 The other four methods are used in conjunction with kernel classifiers, hence they all need to evaluate the entire kernel matrix. [sent-82, score-0.514]
40 With the finite dimensional kernel, the match kernel can be simplified as: K_S(X, Y) = φ(X)^T φ(Y) (4), where φ(X) = (1/|X|) Σ_{x∈X} φ(x) is the feature map on the set of vectors. [sent-85, score-0.522]
41 The training and testing costs are O(nmDd) and O(mDd) respectively, where D is the dimensionality of the feature map φ(x). [sent-89, score-0.289]
42 If the feature map φ(x) is low dimensional, the computational cost of EMK can be much lower than the one required to evaluate the match kernel by computing the kernel functions k(x, y). [sent-90, score-0.739]
43 Notice that we only need the feature vectors φ(X) in EMK, hence it is not necessary to compute the entire kernel matrix. [sent-92, score-0.343]
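The simplification of eq. (4) amounts to averaging the mapped local features and replacing the double sum by a single inner product. A minimal sketch, assuming some finite-dimensional map phi is already available (for instance one of the maps constructed in §3):

```python
import numpy as np

def emk_feature(X, phi):
    """Set-level feature phi(X) = (1/|X|) sum_x phi(x); X is |X| x d, phi returns |X| x D."""
    return phi(X).mean(axis=0)

def emk_kernel(X, Y, phi):
    """K_S(X, Y) ~ phi(X)^T phi(Y), eq. (4): a single D-dimensional inner product."""
    return emk_feature(X, phi) @ emk_feature(Y, phi)
```

Because φ(X) is an explicit D-dimensional vector, it can be fed directly to a linear classifier, and the kernel matrix is never formed.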
44 While there can be many choices for the local feature maps φ(x), and the positive definiteness of k(x, y) = φ(x)^T φ(y) can always be guaranteed, most do not necessarily lead to a meaningful similarity measure. [sent-99, score-0.339]
45 In this paper, we give two principled methods to create meaningful local feature maps φ(x), by arranging for their inner products to approximate a given kernel function. [sent-100, score-0.57]
46 3 Efficient Match Kernels In this section we present two kernel approximations, based on low-dimensional projections (§3.1) and random Fourier set features (§3.2). [sent-101, score-0.225]
47 Given {ψ(zi)}_{i=1}^{D}, a set of basis vectors zi, we can approximate the feature vector ψ(x): vx = argmin_{vx} ||ψ(x) − H vx||² (5). [sent-106, score-0.271]
48 Left: approximated Gaussian kernel with 20 learned feature maps. [sent-119, score-0.313]
49 Right: approximated Gaussian kernel based on 200 random Fourier features. [sent-121, score-0.225]
50 The feature maps are learned from 200 samples, uniformly drawn from [-10,10]. [sent-122, score-0.202]
51 For G^T G = K_{ZZ}^{-1} (notice that K_{ZZ}^{-1} is positive definite), the local feature maps are: φ(x) = G kZ(x) (8). The resulting full feature map is: φ(X) = (1/|X|) G Σ_{x∈X} kZ(x), with computational complexity O(mDd + D²) for a set of local features. [sent-128, score-0.543]
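A possible realization of eq. (8), under the assumptions that the basis Z is given (e.g. the visual words), that the local kernel is Gaussian, and that a small ridge term is added to K_ZZ for numerical stability (the ridge is our addition, not a detail from the paper); G is taken as the inverse Cholesky factor, one of many matrices satisfying G^T G = K_ZZ^{-1}.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, gamma=0.5):
    """Matrix of Gaussian kernel values k(a, b) = exp(-gamma ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def projection_feature_map(Z, gamma=0.5, eps=1e-8):
    """Return phi with phi(x) = G k_Z(x), where G^T G = K_ZZ^{-1} (eq. 8).
    Z: D x d basis (e.g. visual words); eps: small ridge for stability (our assumption)."""
    K_zz = gaussian_kernel_matrix(Z, Z, gamma) + eps * np.eye(len(Z))
    L = np.linalg.cholesky(K_zz)          # K_ZZ = L L^T
    G = np.linalg.inv(L)                  # G^T G = L^{-T} L^{-1} = K_ZZ^{-1}
    def phi(X):
        k_Z = gaussian_kernel_matrix(X, Z, gamma)   # rows are k_Z(x)^T
        return k_Z @ G.T                            # row i equals G k_Z(x_i)
    return phi
```

Combined with the averaging above, emk_feature(X, projection_feature_map(Z)) reproduces φ(X) = (1/|X|) G Σ_{x∈X} kZ(x).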
52 A related method is the kernel codebook [28], where a set-level feature is also extracted based on a local kernel, but with a different feature map φ(·). [sent-129, score-0.594]
53 An essential difference is that inner products of our set-level features φ(X) formally approximate the sum-match kernel, whereas the ones induced by the kernel codebook do not. [sent-130, score-0.422]
54 Therefore EMK only requires a linear classifier, whereas a kernel codebook would require a non-linear classifier for comparable performance. [sent-131, score-0.274]
55 Our experiments, shown in table 3, further suggest that EMK outperforms the kernel codebook, even in the non-linear case. [sent-133, score-0.225]
56 One way is kernel principal component analysis (KPCA) [24] on a randomly selected pool of F local features, with the basis set to the topmost D eigenvectors. [sent-135, score-0.357]
57 This faces two difficulties, however: (i) KPCA scales cubically in the number of selected local features, F; (ii) O(Fmd) work is required to extract the set-level feature vector for one image, because the eigenvectors are linear combinations of the selected local feature vectors, Σ_{i=1}^{F} αi ψ(xi). [sent-136, score-0.394]
58 This motivates our constrained singular value decomposition in kernel feature space (CKSVD): argmin_{V,Z} R(V, Z) = (1/F) Σ_{i=1}^{F} ||ψ(xi) − H vi||² (9), where F is the number of randomly selected local features and Z = [z1, . . . , zD]. [sent-142, score-0.448]
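Since H = [ψ(z1), . . . , ψ(zD)], each term of eq. (9) expands through the kernel trick as ||ψ(xi) − H vi||² = k(xi, xi) − 2 vi^T kZ(xi) + vi^T K_ZZ vi, so the objective can be evaluated without ever forming ψ explicitly. The sketch below evaluates R(V, Z) under that expansion for a Gaussian local kernel; the stochastic gradient optimization the paper actually uses is not reproduced here.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cksvd_objective(X, Z, V, gamma=0.5):
    """R(V, Z) of eq. (9) with H = [psi(z_1), ..., psi(z_D)].
    X: F x d local features, Z: D x d basis, V: F x D coefficient rows v_i."""
    K_zz = gaussian_kernel_matrix(Z, Z, gamma)     # D x D
    k_xz = gaussian_kernel_matrix(X, Z, gamma)     # F x D, row i is k_Z(x_i)^T
    k_xx = np.ones(len(X))                         # k(x, x) = 1 for the Gaussian kernel
    per_sample = (k_xx
                  - 2.0 * np.sum(V * k_xz, axis=1)        # -2 v_i^T k_Z(x_i)
                  + np.sum((V @ K_zz) * V, axis=1))       # v_i^T K_ZZ v_i
    return per_sample.mean()
```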
59 3.2 Random Fourier Set Features Another tractable approach to large-scale learning is to approximate the kernel using random feature maps [22, 23]. [sent-167, score-0.427]
60 For a given function µ(x; θ) and a probability distribution p(θ), one can define the local kernel as: kf(x, y) = ∫ p(θ) µ(x; θ) µ(y; θ) dθ. [sent-168, score-0.359]
61 We consider feature maps of the form µ(x; θ) = cos(ω^T x + b) with θ = (ω, b), which project local features to a randomly chosen line, then pass the resulting scalar through a sinusoid. [sent-169, score-0.396]
62 For example, to approximate the Gaussian kernel kf(x, y) = exp(−γ||x − y||²), the random feature maps are: φ(x) = √(2/D) [cos(ω1^T x + b1), . . . , cos(ωD^T x + bD)]. [sent-170, score-0.452]
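A standard construction of these random maps, following [22]: for the Gaussian kernel exp(−γ||x − y||²), ω is drawn from a normal distribution with covariance 2γI and b uniformly from [0, 2π]. The sketch below uses hypothetical names and is not code from the paper.

```python
import numpy as np

def random_fourier_feature_map(d, D, gamma=0.5, seed=0):
    """phi(x) = sqrt(2/D) [cos(w_1^T x + b_1), ..., cos(w_D^T x + b_D)],
    whose inner products approximate k_f(x, y) = exp(-gamma ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))   # omega ~ N(0, 2*gamma*I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    def phi(X):
        return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)          # X: m x d -> m x D
    return phi
```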
63 Although any shift invariant kernel can be represented using random Fourier features, currently these are limited to Gaussian kernels or to kernels with analytical inverse Fourier transforms. [sent-177, score-0.547]
64 The constraint of a shift-invariant kernel excludes a number of practically interesting similarities. [sent-179, score-0.225]
65 For example, the χ2 kernel [8] and the histogram intersection kernel [5] are designed to compare histograms, hence they can be used as local kernels, if the features are histograms. [sent-180, score-0.68]
66 The learned feature maps (§3.1) can produce superior results when the dimensionality of the feature maps is small. [sent-184, score-0.247]
67 BOW-Linear and BOW-Gaussian use a linear classifier and a Gaussian kernel classifier on BOW features, respectively. [sent-189, score-0.225]
68 For the former, we learn low-dimensional feature maps (§3.1). [sent-191, score-0.249]
69 The local features are SIFT descriptors [16] extracted from 16×16 image patches. [sent-195, score-0.29]
70 For EMK, our local kernel is a Gaussian, exp(−γ||x − y||²). [sent-197, score-0.334]
71 We run k-means clustering to identify the visual words and stochastic gradient descent to learn the local feature maps, using a random set of 100,000 SIFT descriptors. [sent-199, score-0.32]
72 The regularization and the kernel parameters (if available) in SVM are tuned by ten-fold cross validation on the training set. [sent-202, score-0.296]
73 The dimensionality of the feature maps and the vocabulary size are both set to 1000 for fair comparisons, unless otherwise specified. [sent-203, score-0.297]
74 We vary the dimensionality of the feature maps (EMK) and the vocabulary size (BOW) from 250 to 2000 with step length 250. [sent-212, score-0.297]
75 For this dataset, we only consider the flat BOW and EMK (only pyramid level 0) in all experiments. [sent-213, score-0.195]
76 Our second experiment is similar to the first one, but the dimensionality of the feature maps and the vocabulary size vary from 50 to 200 with step length 25. [sent-216, score-0.297]
77 In our third experiment, we fix the dimensionality of the feature maps to 1000, and vary the training set size from 300 to 2400 with step length 300. [sent-217, score-0.318]
78 We observe that EMK-CKSVD significantly outperforms EMK-Fourier for low-dimensional feature maps, indicating that learned features preserve the values of the Gaussian kernel better than the random Fourier maps in this regime, see also fig. [sent-222, score-0.512]
79 The sum match kernel takes about 10 hours each for training and testing, whereas EMK-Fourier and EMK-CKSVD need less than 1 hour, most of it spent computing SIFT descriptors. [sent-231, score-0.527]
80 In addition, we use 10,000 randomly selected SIFT descriptors to learn KPCA-based local feature maps, which takes about 12 hours each for the training and testing sets on the full Scene-15 dataset. [sent-232, score-0.377]
81 We consider three pyramid levels: L = 0, L = 1, and L = 2 (for the latter two, spatial information is used). [sent-239, score-0.22]
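For reference, a rough sketch of how per-cell EMK features might be assembled over such a pyramid, assuming the standard partition of level l into 2^l × 2^l cells as in [13]; the handling of empty cells and the absence of level weighting are our simplifying assumptions, not details taken from the paper.

```python
import numpy as np

def spatial_pyramid_emk(features, positions, phi, D, levels=(0, 1, 2)):
    """Concatenate per-cell EMK features over a spatial pyramid.
    features: m x d local descriptors; positions: m x 2 coordinates scaled to [0, 1);
    phi: local feature map returning m x D; empty cells contribute zero vectors (assumption)."""
    mapped = phi(features)                                      # m x D
    blocks = []
    for level in levels:
        cells = 2 ** level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                blocks.append(mapped[in_cell].mean(axis=0) if in_cell.any()
                              else np.zeros(D))
    return np.concatenate(blocks)
```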
82 EMK-Fourier and EMK-CKSVD perform substantially better than BOW-Linear and BOW-Gaussian for all pyramid levels. [sent-242, score-0.195]
83 The performance gap increases as more pyramid levels are added. [sent-243, score-0.222]
84 The training set size is 1500 in the left plot; the dimensionality of feature maps and the vocabulary size are both set to 1000 in the right plot (for fair comparisons). [sent-267, score-0.368]
85 Table 2: Classification accuracy comparisons for three pyramid levels. [sent-316, score-0.277]
86 The dimensionality of the feature maps and the vocabulary size are both set to 1000. [sent-318, score-0.297]
87 To compare the four algorithms computationally, we select training images from each category in proportion to the total number of images in that category. [sent-328, score-0.244]
88 To accelerate BOW-Gaussian, we precompute the entire kernel matrix. [sent-337, score-0.225]
89 Table: classification accuracy of the compared algorithms for 15, 30, 45, and 60 training images per category (rows include BOW-Linear). [sent-374, score-0.284]
90 The dimensionality of the feature maps and the vocabulary size are both set to 1000 (for fair comparisons). [sent-408, score-0.297]
91 We illustrate the quantization limitations of such models and propose more sophisticated kernel approximations that preserve the computational efficiency of bag-of-words while being as accurate as (or more accurate than) the existing, computationally demanding, non-linear kernels. [sent-425, score-0.288]
92 The models we propose are built around Efficient Match Kernels (EMK), which map local features to a low dimensional feature space, average the resulting feature vectors to form a set-level feature, then apply a linear classifier. [sent-426, score-0.482]
93 The pyramid match kernel: discriminative classification with sets of image features. [sent-458, score-0.383]
94 The pyramid match kernel: Efficient learning with sets of features. [sent-463, score-0.322]
95 Local features and kernels for classification of texture and object categories: A comprehensive study. [sent-473, score-0.283]
96 Iterative kernel principal component analysis for image modeling. [sent-490, score-0.309]
97 Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. [sent-506, score-0.227]
98 A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. [sent-526, score-0.248]
99 Nonlinear component analysis as a kernel eigenvalue problem. [sent-562, score-0.225]
100 Algebraic set kernels with application to inference over local image representations. [sent-574, score-0.331]
wordName wordTfidf (topN-words)
[('emk', 0.692), ('bow', 0.318), ('kernel', 0.225), ('pyramid', 0.195), ('kernels', 0.161), ('kz', 0.156), ('match', 0.127), ('maps', 0.114), ('fourier', 0.112), ('local', 0.109), ('cksvd', 0.104), ('pmk', 0.104), ('feature', 0.088), ('mdd', 0.086), ('features', 0.085), ('classi', 0.073), ('training', 0.071), ('zz', 0.069), ('nmdd', 0.069), ('conjunction', 0.064), ('sift', 0.064), ('image', 0.061), ('images', 0.059), ('category', 0.055), ('vi', 0.055), ('visual', 0.053), ('ks', 0.052), ('vx', 0.052), ('bowgaussian', 0.052), ('kzz', 0.052), ('testing', 0.05), ('vocabulary', 0.05), ('codebook', 0.049), ('bhattacharyya', 0.049), ('sgd', 0.049), ('kpca', 0.049), ('dimensional', 0.047), ('dimensionality', 0.045), ('accuracy', 0.044), ('bag', 0.043), ('er', 0.043), ('cvpr', 0.039), ('cost', 0.039), ('quantization', 0.038), ('comparisons', 0.038), ('zd', 0.037), ('object', 0.037), ('histogram', 0.036), ('map', 0.035), ('descriptors', 0.035), ('dimentionality', 0.035), ('emkfourier', 0.035), ('hvi', 0.035), ('hvx', 0.035), ('trainning', 0.035), ('ers', 0.034), ('products', 0.034), ('scene', 0.032), ('iccv', 0.032), ('liefeng', 0.03), ('liblinear', 0.03), ('vectors', 0.03), ('whereas', 0.029), ('notice', 0.029), ('similarity', 0.028), ('codebooks', 0.028), ('hmax', 0.028), ('rahimi', 0.028), ('bo', 0.028), ('vocabularies', 0.028), ('levels', 0.027), ('descriptor', 0.027), ('sum', 0.027), ('recognition', 0.026), ('datasets', 0.026), ('argmin', 0.026), ('grauman', 0.026), ('lazebnik', 0.026), ('approximations', 0.025), ('gaussian', 0.025), ('spatial', 0.025), ('kf', 0.025), ('caltech', 0.025), ('hours', 0.024), ('gradient', 0.024), ('words', 0.024), ('cos', 0.024), ('grids', 0.023), ('schmid', 0.023), ('nips', 0.023), ('principal', 0.023), ('svm', 0.023), ('zi', 0.023), ('svms', 0.023), ('jmlr', 0.023), ('closure', 0.022), ('mercer', 0.022), ('descent', 0.022), ('art', 0.022), ('thousands', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
2 0.15758072 128 nips-2009-Learning Non-Linear Combinations of Kernels
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
3 0.14086351 119 nips-2009-Kernel Methods for Deep Learning
Author: Youngmin Cho, Lawrence K. Saul
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
4 0.11194707 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
Author: Kenji Fukumizu, Arthur Gretton, Gert R. Lanckriet, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example. 1
5 0.10575151 95 nips-2009-Fast subtree kernels on graphs
Author: Nino Shervashidze, Karsten M. Borgwardt
Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n² 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime. 1
6 0.10417251 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
7 0.097839773 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation
8 0.081582837 211 nips-2009-Segmenting Scenes by Matching Image Composites
9 0.078270711 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
10 0.077796616 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
11 0.076547489 104 nips-2009-Group Sparse Coding
12 0.073777556 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
13 0.073334888 133 nips-2009-Learning models of object structure
14 0.072708324 137 nips-2009-Learning transport operators for image manifolds
15 0.071053624 43 nips-2009-Bayesian estimation of orientation preference maps
16 0.070884325 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
17 0.069247432 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
18 0.068205841 260 nips-2009-Zero-shot Learning with Semantic Output Codes
19 0.067272618 236 nips-2009-Structured output regression for detection with partial truncation
20 0.065831766 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems
topicId topicWeight
[(0, -0.203), (1, -0.012), (2, -0.129), (3, 0.079), (4, -0.079), (5, 0.053), (6, -0.017), (7, 0.18), (8, -0.041), (9, 0.005), (10, 0.124), (11, -0.113), (12, -0.003), (13, 0.087), (14, -0.093), (15, 0.027), (16, -0.063), (17, 0.105), (18, 0.031), (19, -0.054), (20, 0.03), (21, 0.022), (22, -0.028), (23, 0.046), (24, -0.018), (25, 0.037), (26, -0.013), (27, -0.02), (28, -0.066), (29, 0.037), (30, -0.042), (31, -0.014), (32, -0.057), (33, -0.024), (34, 0.046), (35, 0.034), (36, 0.001), (37, 0.083), (38, -0.043), (39, 0.05), (40, -0.029), (41, 0.034), (42, -0.042), (43, -0.072), (44, 0.029), (45, 0.067), (46, 0.054), (47, -0.021), (48, 0.038), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.95265669 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
2 0.79210585 128 nips-2009-Learning Non-Linear Combinations of Kernels
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
3 0.77744663 119 nips-2009-Kernel Methods for Deep Learning
Author: Youngmin Cho, Lawrence K. Saul
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
4 0.71429473 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
Author: Kenji Fukumizu, Arthur Gretton, Gert R. Lanckriet, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example. 1
5 0.6385029 95 nips-2009-Fast subtree kernels on graphs
Author: Nino Shervashidze, Karsten M. Borgwardt
Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n² 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime. 1
6 0.63471282 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
7 0.6307168 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
8 0.62325293 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
9 0.53600633 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
10 0.52164227 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
11 0.51716805 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation
12 0.47625086 33 nips-2009-Analysis of SVM with Indefinite Kernels
13 0.47481331 236 nips-2009-Structured output regression for detection with partial truncation
14 0.46203354 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
15 0.4556846 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
16 0.43514913 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions
17 0.43082869 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
18 0.42554399 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
19 0.42489508 72 nips-2009-Distribution Matching for Transduction
20 0.41461727 211 nips-2009-Segmenting Scenes by Matching Image Composites
topicId topicWeight
[(21, 0.011), (24, 0.034), (25, 0.064), (35, 0.055), (36, 0.141), (39, 0.064), (48, 0.256), (55, 0.011), (58, 0.087), (71, 0.04), (81, 0.013), (86, 0.116), (91, 0.014)]
simIndex simValue paperId paperTitle
1 0.87545794 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray
Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1
2 0.86296016 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification
Author: Amarnag Subramanya, Jeff A. Bilmes
Abstract: We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to outperform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph. 1
same-paper 3 0.79861867 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
4 0.72476274 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
5 0.66017228 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
6 0.65603083 137 nips-2009-Learning transport operators for image manifolds
7 0.65459454 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings
8 0.65278351 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
9 0.64965773 129 nips-2009-Learning a Small Mixture of Trees
10 0.64797974 70 nips-2009-Discriminative Network Models of Schizophrenia
11 0.64775652 119 nips-2009-Kernel Methods for Deep Learning
12 0.64763129 72 nips-2009-Distribution Matching for Transduction
13 0.64686263 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
14 0.64614248 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
15 0.64582193 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
16 0.64546943 128 nips-2009-Learning Non-Linear Combinations of Kernels
17 0.64527202 87 nips-2009-Exponential Family Graph Matching and Ranking
18 0.64444917 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
19 0.64336878 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
20 0.64261866 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity