nips nips2009 nips2009-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a set-level feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In visual recognition, the images are frequently modeled as unordered collections of local features (bags). [sent-5, score-0.306]
2 We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. [sent-6, score-0.462]
3 Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. [sent-7, score-0.463]
4 To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. [sent-9, score-0.682]
5 The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. [sent-10, score-0.57]
6 1 Introduction Models based on local features have achieved state-of-the-art results in many visual object recognition tasks. [sent-13, score-0.332]
7 For example, an image can be described by a set of local features extracted from patches around salient interest points or regular grids, or a shape can be described by a set of local features defined at edge points. [sent-14, score-0.449]
8 This raises the question of how one should measure the similarity between two images represented as sets of local features. [sent-15, score-0.196]
9 BOW represents each local feature with the closest visual word and counts the occurrence frequencies in the image. [sent-18, score-0.25]
10 The resulting histogram is used as an image descriptor for object recognition, often in conjunction with linear classifiers. [sent-19, score-0.225]
11 Various methods for creating vocabularies exist [10], the most common being k-means clustering of all (or a subsample of) the local features to obtain visual words. [sent-21, score-0.275]
12 An even better approach to recognition is to define kernels over sets of local features. [sent-22, score-0.296]
13 The sum match kernel of Haussler [7] is obtained by adding local kernels over all combinations of local features from two different sets. [sent-24, score-0.843]
14 In [17], the authors modify the sum kernel by introducing an integer exponent on local kernels. [sent-25, score-0.361]
15 Neighborhood kernels [20] integrate the spatial location of local features into a sum match kernel. [sent-26, score-0.534]
16 Pyramid match kernels [5, 14, 13] map local features to multi-resolution histograms and compute a weighted histogram intersection. [sent-27, score-0.553]
17 Algebraic set kernels [26] exploit tensor products to aggregate local kernels, whereas principal angle kernels [29] measure similarities based on angles between linear subspaces spanned by local features in the two sets. [sent-28, score-0.711]
18 All of the above methods need to explicitly evaluate the full kernel matrix, hence they require space and time complexity that is quadratic in the number of images. [sent-30, score-0.225]
19 In this paper we present efficient match kernels (EMK) that combine the strengths of both bag of words and set kernels. [sent-32, score-0.355]
20 We map local features to a low dimensional feature space and construct set-level features by averaging the resulting feature vectors. [sent-33, score-0.537]
21 Hence EMK can be used in conjunction with linear classifiers and do not require the explicit computation of a full kernel matrix—this leads to both space and time complexity that is linear in the number of images. [sent-35, score-0.289]
22 We adopt a bag of features method, which represents an image as a set of local features. [sent-39, score-0.298]
23 Let X = {x1, . . . , xp} be a set of local features in an image and V = {v1, . . . , vD} the set of visual words. [sent-43, score-0.255]
24 In BOW, each local feature is quantized into a D-dimensional binary indicator vector µ(x) = [µ1(x), . . . , µD(x)]. [sent-47, score-0.244]
25 The feature vectors for one image form a normalized histogram µ(X) = (1/|X|) Σ_{x∈X} µ(x), where | · | is the cardinality of a set. [sent-52, score-0.215]
26 BOW features can be used in conjunction with either a linear or a kernel classifier, albeit the latter often leads to expensive training and testing (see §4). [sent-53, score-0.495]
27 When a linear classifier is used, the resulting kernel function is: K_B(X, Y) = µ(X)^T µ(Y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} µ(x)^T µ(y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} δ(x, y) (1), where δ(x, y) = 1 if x, y ∈ R(vi) for some i ∈ {1, . . . , D}, and δ(x, y) = 0 otherwise (2). [sent-54, score-0.225]
28 δ(x, y) is obviously a positive definite kernel, measuring the similarity between two local features x and y: δ(x, y) = 1 if x and y belong to the same region R(vi), and 0 otherwise. [sent-57, score-0.222]
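To make the equivalence concrete, the sketch below (illustrative NumPy code, not from the paper; the vocabulary V and the feature sets X, Y are randomly generated placeholders) computes K_B(X, Y) both as the inner product of normalized BOW histograms and as the average of δ(x, y); the two values coincide.

```python
import numpy as np

def quantize(X, V):
    """Assign each local feature (row of X) to the index of its nearest visual word in V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # squared distances, |X| x D
    return d2.argmin(axis=1)

def bow_histogram(X, V):
    """Normalized BOW histogram mu(X) = (1/|X|) sum_x mu(x)."""
    counts = np.bincount(quantize(X, V), minlength=V.shape[0]).astype(float)
    return counts / len(X)

def kernel_bow(X, Y, V):
    """K_B(X, Y) = mu(X)^T mu(Y), eq. (1), via histograms."""
    return bow_histogram(X, V) @ bow_histogram(Y, V)

def kernel_delta(X, Y, V):
    """The same value via the match-kernel view: the average of delta(x, y), eq. (2)."""
    ix, iy = quantize(X, V), quantize(Y, V)
    return (ix[:, None] == iy[None, :]).mean()

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 2))                     # 5 visual words in 2-D (toy data)
X, Y = rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
assert np.isclose(kernel_bow(X, Y, V), kernel_delta(X, Y, V))
```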
29 However, this type of quantization can be too coarse when measuring the similarity of two local features (see also fig. [sent-58, score-0.26]
30 It would be better to replace δ(x, y) with a continuous kernel function that more accurately measures the similarity between x and y: K_S(X, Y) = (1/(|X||Y|)) Σ_{x∈X} Σ_{y∈Y} k(x, y) (3). In fact, this is related to the normalized sum match kernel [7, 17]. [sent-60, score-0.632]
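A brute-force evaluation of eq. (3) with a Gaussian local kernel (our choice here for illustration; any Mercer kernel could be substituted) would look like the following sketch; the nested loop makes the O(|X||Y|) cost discussed next explicit.

```python
import numpy as np

def gaussian_local_kernel(x, y, gamma=0.5):
    """Continuous local kernel k(x, y) = exp(-gamma ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sum_match_kernel(X, Y, gamma=0.5):
    """Normalized sum match kernel K_S(X, Y) of eq. (3): O(|X||Y|) local-kernel evaluations."""
    total = sum(gaussian_local_kernel(x, y, gamma) for x in X for y in Y)
    return total / (len(X) * len(Y))
```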
31 A negative impact of kernelization is the high computational cost required to compute the summation match function, which takes O(|X||Y|) for a single kernel value rather than O(1), the cost of evaluating a single kernel function defined on vectors. [sent-63, score-0.655]
32 When used in conjunction with kernel machines, it takes O(n²) and O(n²m²d) to store and compute the entire kernel matrix, respectively, where n is the number of images in the training set, and m is the average cardinality of all sets. [sent-64, score-0.644]
33 In addition to expensive training, the match kernel function also has a fairly high testing cost: for a test input, evaluating the discriminant f(X) = Σ_{i=1}^{n} αi K_S(Xi, X) takes O(nm²d). [sent-66, score-0.402]
34 For sparse kernel machines, such as SVMs, the cost can decrease to some extent, as some of the αi are zero. [sent-68, score-0.264]
35 The kernel function k(x, y) = φ(x)^T φ(y) is called finite dimensional if the feature map φ(·) is finite dimensional. [sent-71, score-0.395]
36 PMK is the pyramid match kernel of [6], with T in PMK giving the value of the maximal feature range. [sent-76, score-0.635]
37 D in EMK is the dimensionality of feature maps and does not change with the training set size. [sent-78, score-0.318]
38 EMK uses linear classifiers and does not require the evaluation of the kernel matrix. [sent-81, score-0.225]
39 The other four methods are used in conjunction with kernel classifiers, hence they all need to evaluate the entire kernel matrix. [sent-82, score-0.514]
40 With the finite dimensional kernel, the match kernel can be simplified as: K_S(X, Y) = φ(X)^T φ(Y) (4), where φ(X) = (1/|X|) Σ_{x∈X} φ(x) is the feature map on the set of vectors. [sent-85, score-0.522]
41 The training and testing costs are O(nmDd) and O(mDd) respectively, where D is the dimensionality of the feature map φ(x). [sent-89, score-0.289]
42 If the feature map φ(x) is low dimensional, the computational cost of EMK can be much lower than the one required to evaluate the match kernel by computing the kernel functions k(x, y). [sent-90, score-0.739]
43 Notice that we only need the feature vectors φ(X) in EMK, hence it is not necessary to compute the entire kernel matrix. [sent-92, score-0.343]
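The simplification of eq. (4) amounts to averaging the mapped local features and replacing the double sum by a single inner product. A minimal sketch, assuming some finite-dimensional map phi is already available (for instance one of the maps constructed in §3):

```python
import numpy as np

def emk_feature(X, phi):
    """Set-level feature phi(X) = (1/|X|) sum_x phi(x); X is |X| x d, phi returns |X| x D."""
    return phi(X).mean(axis=0)

def emk_kernel(X, Y, phi):
    """K_S(X, Y) ~ phi(X)^T phi(Y), eq. (4): a single D-dimensional inner product."""
    return emk_feature(X, phi) @ emk_feature(Y, phi)
```

Because φ(X) is an explicit D-dimensional vector, it can be fed directly to a linear classifier, and the kernel matrix is never formed.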
44 While there can be many choices for the local feature maps φ(x), and the positive definiteness of k(x, y) = φ(x)^T φ(y) can always be guaranteed, most do not necessarily lead to a meaningful similarity measure. [sent-99, score-0.339]
45 In this paper, we give two principled methods to create meaningful local feature maps φ(x), by arranging for their inner products to approximate a given kernel function. [sent-100, score-0.57]
46 3 Efficient Match Kernels In this section we present two kernel approximations, based on low-dimensional projections (§3.1) and random Fourier set features (§3.2). [sent-101, score-0.225]
47 Given {ψ(zi)}_{i=1}^{D}, a set of basis vectors zi, we can approximate the feature vector ψ(x): vx = argmin_{vx} ||ψ(x) − H vx||² (5). [sent-106, score-0.271]
48 Left: approximated Gaussian kernel with 20 learned feature maps. [sent-119, score-0.313]
49 Right: approximated Gaussian kernel based on 200 random Fourier features. [sent-121, score-0.225]
50 The feature maps are learned from 200 samples, uniformly drawn from [-10,10]. [sent-122, score-0.202]
51 For G^T G = K_{ZZ}^{-1} (notice that K_{ZZ}^{-1} is positive definite), the local feature maps are: φ(x) = G kZ(x) (8). The resulting full feature map is: φ(X) = (1/|X|) G Σ_{x∈X} kZ(x), with computational complexity O(mDd + D²) for a set of local features. [sent-128, score-0.543]
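A possible realization of eq. (8), under the assumptions that the basis Z is given (e.g. the visual words), that the local kernel is Gaussian, and that a small ridge term is added to K_ZZ for numerical stability (the ridge is our addition, not a detail from the paper); G is taken as the inverse Cholesky factor, one of many matrices satisfying G^T G = K_ZZ^{-1}.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, gamma=0.5):
    """Matrix of Gaussian kernel values k(a, b) = exp(-gamma ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def projection_feature_map(Z, gamma=0.5, eps=1e-8):
    """Return phi with phi(x) = G k_Z(x), where G^T G = K_ZZ^{-1} (eq. 8).
    Z: D x d basis (e.g. visual words); eps: small ridge for stability (our assumption)."""
    K_zz = gaussian_kernel_matrix(Z, Z, gamma) + eps * np.eye(len(Z))
    L = np.linalg.cholesky(K_zz)          # K_ZZ = L L^T
    G = np.linalg.inv(L)                  # G^T G = L^{-T} L^{-1} = K_ZZ^{-1}
    def phi(X):
        k_Z = gaussian_kernel_matrix(X, Z, gamma)   # rows are k_Z(x)^T
        return k_Z @ G.T                            # row i equals G k_Z(x_i)
    return phi
```

Combined with the averaging above, emk_feature(X, projection_feature_map(Z)) reproduces φ(X) = (1/|X|) G Σ_{x∈X} kZ(x).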
52 A related method is the kernel codebook [28], where a set-level feature is also extracted based on a local kernel, but with a different feature map φ(·). [sent-129, score-0.594]
53 An essential difference is that inner products of our set-level features φ(X) formally approximate the sum-match kernel, whereas the ones induced by the kernel codebook do not. [sent-130, score-0.422]
54 Therefore EMK only requires a linear classifier, whereas a kernel codebook would require a non-linear classifier for comparable performance. [sent-131, score-0.274]
55 Our experiments, shown in table 3, further suggest that EMK outperforms the kernel codebook, even in the non-linear case. [sent-133, score-0.225]
56 One way is kernel principal component analysis (KPCA) [24] on a randomly selected pool of F local features, with the basis set to the topmost D eigenvectors. [sent-135, score-0.357]
57 This faces two difficulties, however: (i) KPCA scales cubically in the number of selected local features, F; (ii) O(Fmd) work is required to extract the set-level feature vector for one image, because the eigenvectors are linear combinations of the selected local feature vectors, Σ_{i=1}^{F} αi ψ(xi). [sent-136, score-0.394]
58 This motivates our constrained singular value decomposition in kernel feature space (CKSVD): argmin_{V,Z} R(V, Z) = (1/F) Σ_{i=1}^{F} ||ψ(xi) − H vi||² (9), where F is the number of randomly selected local features and Z = [z1, . . . , zD]. [sent-142, score-0.448]
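Since H = [ψ(z1), . . . , ψ(zD)], each term of eq. (9) expands through the kernel trick as ||ψ(xi) − H vi||² = k(xi, xi) − 2 vi^T kZ(xi) + vi^T K_ZZ vi, so the objective can be evaluated without ever forming ψ explicitly. The sketch below evaluates R(V, Z) under that expansion for a Gaussian local kernel; the stochastic gradient optimization the paper actually uses is not reproduced here.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cksvd_objective(X, Z, V, gamma=0.5):
    """R(V, Z) of eq. (9) with H = [psi(z_1), ..., psi(z_D)].
    X: F x d local features, Z: D x d basis, V: F x D coefficient rows v_i."""
    K_zz = gaussian_kernel_matrix(Z, Z, gamma)     # D x D
    k_xz = gaussian_kernel_matrix(X, Z, gamma)     # F x D, row i is k_Z(x_i)^T
    k_xx = np.ones(len(X))                         # k(x, x) = 1 for the Gaussian kernel
    per_sample = (k_xx
                  - 2.0 * np.sum(V * k_xz, axis=1)        # -2 v_i^T k_Z(x_i)
                  + np.sum((V @ K_zz) * V, axis=1))       # v_i^T K_ZZ v_i
    return per_sample.mean()
```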
59 3.2 Random Fourier Set Features Another tractable approach to large-scale learning is to approximate the kernel using random feature maps [22, 23]. [sent-167, score-0.427]
60 For a given function µ(x; θ) and a probability distribution p(θ), one can define the local kernel as: kf(x, y) = ∫ p(θ) µ(x; θ) µ(y; θ) dθ. [sent-168, score-0.359]
61 We consider feature maps of the form µ(x; θ) = cos(ω^T x + b) with θ = (ω, b), which project local features to a randomly chosen line, then pass the resulting scalar through a sinusoid. [sent-169, score-0.396]
62 For example, to approximate the Gaussian kernel kf(x, y) = exp(−γ||x − y||²), the random feature maps are: φ(x) = √(2/D) [cos(ω1^T x + b1), . . . , cos(ωD^T x + bD)]. [sent-170, score-0.452]
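A standard construction of these random maps, following [22]: for the Gaussian kernel exp(−γ||x − y||²), ω is drawn from a normal distribution with covariance 2γI and b uniformly from [0, 2π]. The sketch below uses hypothetical names and is not code from the paper.

```python
import numpy as np

def random_fourier_feature_map(d, D, gamma=0.5, seed=0):
    """phi(x) = sqrt(2/D) [cos(w_1^T x + b_1), ..., cos(w_D^T x + b_D)],
    whose inner products approximate k_f(x, y) = exp(-gamma ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))   # omega ~ N(0, 2*gamma*I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    def phi(X):
        return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)          # X: m x d -> m x D
    return phi
```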
63 Although any shift invariant kernel can be represented using random Fourier features, currently these are limited to Gaussian kernels or to kernels with analytical inverse Fourier transforms. [sent-177, score-0.547]
64 The constraint of a shift-invariant kernel excludes a number of practically interesting similarities. [sent-179, score-0.225]
65 For example, the χ2 kernel [8] and the histogram intersection kernel [5] are designed to compare histograms, hence they can be used as local kernels, if the features are histograms. [sent-180, score-0.68]
66 The learned feature maps (§3.1) can produce superior results when the dimensionality of the feature maps is small. [sent-184, score-0.247]
67 BOW-Linear and BOW-Gaussian use a linear classifier and a Gaussian kernel classifier on BOW features, respectively. [sent-189, score-0.225]
68 For the former, we learn low-dimensional feature maps (§3.1). [sent-191, score-0.249]
69 The local features are SIFT descriptors [16] extracted from 16×16 image patches. [sent-195, score-0.29]
70 For EMK, our local kernel is a Gaussian, exp(−γ||x − y||²). [sent-197, score-0.334]
71 We run k-means clustering to identify the visual words and stochastic gradient descent to learn the local feature maps, using a random set of 100,000 SIFT descriptors. [sent-199, score-0.32]
72 The regularization and the kernel parameters (if available) in SVM are tuned by ten-fold cross validation on the training set. [sent-202, score-0.296]
73 The dimensionality of the feature maps and the vocabulary size are both set to 1000 for fair comparisons, unless otherwise specified. [sent-203, score-0.297]
74 We vary the dimensionality of the feature maps (EMK) and the vocabulary size (BOW) from 250 to 2000 with step length 250. [sent-212, score-0.297]
75 For this dataset, we only consider the flat BOW and EMK (only pyramid level 0) in all experiments. [sent-213, score-0.195]
76 Our second experiment is similar to the first one, but the dimensionality of the feature maps and the vocabulary size vary from 50 to 200 with step length 25. [sent-216, score-0.297]
77 In our third experiment, we fix the dimensionality of the feature maps to 1000, and vary the training set size from 300 to 2400 with step length 300. [sent-217, score-0.318]
78 We observe that EMK-CKSVD significantly outperforms EMK-Fourier for low-dimensional feature maps, indicating that learned features preserve the values of the Gaussian kernel better than the random Fourier maps in this regime, see also fig. [sent-222, score-0.512]
79 The sum match kernel takes about 10 hours each for training and testing, whereas EMK-Fourier and EMK-CKSVD need less than 1 hour, most of it spent computing SIFT descriptors. [sent-231, score-0.527]
80 In addition, we use 10,000 randomly selected SIFT descriptors to learn KPCA-based local feature maps, which takes about 12 hours each for the training and testing sets on the full Scene-15 dataset. [sent-232, score-0.377]
81 We consider three pyramid levels: L = 0, L = 1, and L = 2 (for the latter two, spatial information is used). [sent-239, score-0.22]
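For reference, a rough sketch of how per-cell EMK features might be assembled over such a pyramid, assuming the standard partition of level l into 2^l × 2^l cells as in [13]; the handling of empty cells and the absence of level weighting are our simplifying assumptions, not details taken from the paper.

```python
import numpy as np

def spatial_pyramid_emk(features, positions, phi, D, levels=(0, 1, 2)):
    """Concatenate per-cell EMK features over a spatial pyramid.
    features: m x d local descriptors; positions: m x 2 coordinates scaled to [0, 1);
    phi: local feature map returning m x D; empty cells contribute zero vectors (assumption)."""
    mapped = phi(features)                                      # m x D
    blocks = []
    for level in levels:
        cells = 2 ** level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                blocks.append(mapped[in_cell].mean(axis=0) if in_cell.any()
                              else np.zeros(D))
    return np.concatenate(blocks)
```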
82 EMK-Fourier and EMK-CKSVD perform substantially better than BOW-Linear and BOW-Gaussian for all pyramid levels. [sent-242, score-0.195]
83 The performance gap increases as more pyramid levels are added. [sent-243, score-0.222]
84 The training set size is 1500 in the left plot; the dimensionality of feature maps and the vocabulary size are both set to 1000 in the right plot (for fair comparisons). [sent-267, score-0.368]
85 Table 2: Classification accuracy comparisons for three pyramid levels. [sent-316, score-0.277]
86 The dimensionality of the feature maps and the vocabulary size are both set to 1000. [sent-318, score-0.297]
87 To compare the four algorithms computationally, we select training images from each category in proportion to the total number of images in that category. [sent-328, score-0.244]
88 To accelerate BOW-Gaussian, we precompute the entire kernel matrix. [sent-337, score-0.225]
89 Table: classification accuracy of the compared algorithms for 15, 30, 45, and 60 training images per category (rows include BOW-Linear). [sent-374, score-0.284]
90 The dimensionality of the feature maps and the vocabulary size are both set to 1000 (for fair comparisons). [sent-408, score-0.297]
91 We illustrate the quantization limitations of such models and propose more sophisticated kernel approximations that preserve the computational efficiency of bag-of-words while being as accurate as (or more accurate than) the existing, computationally demanding, non-linear kernels. [sent-425, score-0.288]
92 The models we propose are built around Efficient Match Kernels (EMK), which map local features to a low dimensional feature space, average the resulting feature vectors to form a set-level feature, then apply a linear classifier. [sent-426, score-0.482]
93 The pyramid match kernel: discriminative classification with sets of image features. [sent-458, score-0.383]
94 The pyramid match kernel: Efficient learning with sets of features. [sent-463, score-0.322]
95 Local features and kernels for classification of texture and object categories: A comprehensive study. [sent-473, score-0.283]
96 Iterative kernel principal component analysis for image modeling. [sent-490, score-0.309]
97 Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. [sent-506, score-0.227]
98 A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. [sent-526, score-0.248]
99 Nonlinear component analysis as a kernel eigenvalue problem. [sent-562, score-0.225]
100 Algebraic set kernels with application to inference over local image representations. [sent-574, score-0.331]
wordName wordTfidf (topN-words)
[('emk', 0.692), ('bow', 0.318), ('kernel', 0.225), ('pyramid', 0.195), ('kernels', 0.161), ('kz', 0.156), ('match', 0.127), ('maps', 0.114), ('fourier', 0.112), ('local', 0.109), ('cksvd', 0.104), ('pmk', 0.104), ('feature', 0.088), ('mdd', 0.086), ('features', 0.085), ('classi', 0.073), ('training', 0.071), ('zz', 0.069), ('nmdd', 0.069), ('conjunction', 0.064), ('sift', 0.064), ('image', 0.061), ('images', 0.059), ('category', 0.055), ('vi', 0.055), ('visual', 0.053), ('ks', 0.052), ('vx', 0.052), ('bowgaussian', 0.052), ('kzz', 0.052), ('testing', 0.05), ('vocabulary', 0.05), ('codebook', 0.049), ('bhattacharyya', 0.049), ('sgd', 0.049), ('kpca', 0.049), ('dimensional', 0.047), ('dimensionality', 0.045), ('accuracy', 0.044), ('bag', 0.043), ('er', 0.043), ('cvpr', 0.039), ('cost', 0.039), ('quantization', 0.038), ('comparisons', 0.038), ('zd', 0.037), ('object', 0.037), ('histogram', 0.036), ('map', 0.035), ('descriptors', 0.035), ('dimentionality', 0.035), ('emkfourier', 0.035), ('hvi', 0.035), ('hvx', 0.035), ('trainning', 0.035), ('ers', 0.034), ('products', 0.034), ('scene', 0.032), ('iccv', 0.032), ('liefeng', 0.03), ('liblinear', 0.03), ('vectors', 0.03), ('whereas', 0.029), ('notice', 0.029), ('similarity', 0.028), ('codebooks', 0.028), ('hmax', 0.028), ('rahimi', 0.028), ('bo', 0.028), ('vocabularies', 0.028), ('levels', 0.027), ('descriptor', 0.027), ('sum', 0.027), ('recognition', 0.026), ('datasets', 0.026), ('argmin', 0.026), ('grauman', 0.026), ('lazebnik', 0.026), ('approximations', 0.025), ('gaussian', 0.025), ('spatial', 0.025), ('kf', 0.025), ('caltech', 0.025), ('hours', 0.024), ('gradient', 0.024), ('words', 0.024), ('cos', 0.024), ('grids', 0.023), ('schmid', 0.023), ('nips', 0.023), ('principal', 0.023), ('svm', 0.023), ('zi', 0.023), ('svms', 0.023), ('jmlr', 0.023), ('closure', 0.022), ('mercer', 0.022), ('descent', 0.022), ('art', 0.022), ('thousands', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
2 0.15758072 128 nips-2009-Learning Non-Linear Combinations of Kernels
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
3 0.14086351 119 nips-2009-Kernel Methods for Deep Learning
Author: Youngmin Cho, Lawrence K. Saul
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
4 0.11194707 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
Author: Kenji Fukumizu, Arthur Gretton, Gert R. Lanckriet, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example. 1
5 0.10575151 95 nips-2009-Fast subtree kernels on graphs
Author: Nino Shervashidze, Karsten M. Borgwardt
Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n² 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime. 1
6 0.10417251 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
7 0.097839773 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation
8 0.081582837 211 nips-2009-Segmenting Scenes by Matching Image Composites
9 0.078270711 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
10 0.077796616 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
11 0.076547489 104 nips-2009-Group Sparse Coding
12 0.073777556 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
13 0.073334888 133 nips-2009-Learning models of object structure
14 0.072708324 137 nips-2009-Learning transport operators for image manifolds
15 0.071053624 43 nips-2009-Bayesian estimation of orientation preference maps
16 0.070884325 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
17 0.069247432 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
18 0.068205841 260 nips-2009-Zero-shot Learning with Semantic Output Codes
19 0.067272618 236 nips-2009-Structured output regression for detection with partial truncation
20 0.065831766 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems
topicId topicWeight
[(0, -0.203), (1, -0.012), (2, -0.129), (3, 0.079), (4, -0.079), (5, 0.053), (6, -0.017), (7, 0.18), (8, -0.041), (9, 0.005), (10, 0.124), (11, -0.113), (12, -0.003), (13, 0.087), (14, -0.093), (15, 0.027), (16, -0.063), (17, 0.105), (18, 0.031), (19, -0.054), (20, 0.03), (21, 0.022), (22, -0.028), (23, 0.046), (24, -0.018), (25, 0.037), (26, -0.013), (27, -0.02), (28, -0.066), (29, 0.037), (30, -0.042), (31, -0.014), (32, -0.057), (33, -0.024), (34, 0.046), (35, 0.034), (36, 0.001), (37, 0.083), (38, -0.043), (39, 0.05), (40, -0.029), (41, 0.034), (42, -0.042), (43, -0.072), (44, 0.029), (45, 0.067), (46, 0.054), (47, -0.021), (48, 0.038), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.95265669 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
2 0.79210585 128 nips-2009-Learning Non-Linear Combinations of Kernels
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
3 0.77744663 119 nips-2009-Kernel Methods for Deep Learning
Author: Youngmin Cho, Lawrence K. Saul
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
4 0.71429473 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
Author: Kenji Fukumizu, Arthur Gretton, Gert R. Lanckriet, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example. 1
5 0.6385029 95 nips-2009-Fast subtree kernels on graphs
Author: Nino Shervashidze, Karsten M. Borgwardt
Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n² 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime. 1
6 0.63471282 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
7 0.6307168 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
8 0.62325293 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
9 0.53600633 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
10 0.52164227 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
11 0.51716805 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation
12 0.47625086 33 nips-2009-Analysis of SVM with Indefinite Kernels
13 0.47481331 236 nips-2009-Structured output regression for detection with partial truncation
14 0.46203354 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
15 0.4556846 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
16 0.43514913 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions
17 0.43082869 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
18 0.42554399 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
19 0.42489508 72 nips-2009-Distribution Matching for Transduction
20 0.41461727 211 nips-2009-Segmenting Scenes by Matching Image Composites
topicId topicWeight
[(21, 0.011), (24, 0.034), (25, 0.064), (35, 0.055), (36, 0.141), (39, 0.064), (48, 0.256), (55, 0.011), (58, 0.087), (71, 0.04), (81, 0.013), (86, 0.116), (91, 0.014)]
simIndex simValue paperId paperTitle
1 0.87545794 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray
Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1
2 0.86296016 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification
Author: Amarnag Subramanya, Jeff A. Bilmes
Abstract: We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to outperform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph. 1
same-paper 3 0.79861867 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition
Author: Liefeng Bo, Cristian Sminchisescu
Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1
4 0.72476274 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
5 0.66017228 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
6 0.65603083 137 nips-2009-Learning transport operators for image manifolds
7 0.65459454 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings
8 0.65278351 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
9 0.64965773 129 nips-2009-Learning a Small Mixture of Trees
10 0.64797974 70 nips-2009-Discriminative Network Models of Schizophrenia
11 0.64775652 119 nips-2009-Kernel Methods for Deep Learning
12 0.64763129 72 nips-2009-Distribution Matching for Transduction
13 0.64686263 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
14 0.64614248 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
15 0.64582193 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
16 0.64546943 128 nips-2009-Learning Non-Linear Combinations of Kernels
17 0.64527202 87 nips-2009-Exponential Family Graph Matching and Ranking
18 0.64444917 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
19 0.64336878 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
20 0.64261866 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity