nips nips2009 nips2009-135 knowledge-graph by maker-knowledge-mining

135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings


Source: pdf

Author: Brian Kulis, Trevor Darrell

Abstract: Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. [sent-3, score-0.782]

2 In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. [sent-4, score-0.86]

3 We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. [sent-5, score-1.144]

4 Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. [sent-6, score-0.659]

5 In cases where the amount of data is huge—large image repositories, video sequences, and others—having fast techniques for finding nearest neighbors to a query is essential. [sent-9, score-0.326]

6 At an abstract level, we may view hashing methods for similarity search as mapping input data (which may be arbitrarily high-dimensional) to a lowdimensional binary (Hamming) space. [sent-10, score-0.529]

7 Since the Hamming distance between two objects can be computed via an xor operation and a bit count, even a linear scan in the Hamming space for a nearest neighbor to a query in a database of 100 million objects can currently be performed within a few seconds on a typical workstation. [sent-12, score-0.367]
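As a minimal illustration of this point (not code from the paper; the codes and bit width are hypothetical), packing binary codes into integers reduces the Hamming distance to an XOR followed by a bit count:

```python
# Minimal sketch: Hamming distance between packed binary codes via XOR and a bit count.
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two codes stored as Python integers."""
    return bin(code_a ^ code_b).count("1")

# Two 8-bit codes that differ in exactly three positions.
print(hamming_distance(0b10110100, 0b10010001))  # -> 3
```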

8 If the input dimensionality is very high, hashing methods lead to enormous computational savings. [sent-13, score-0.499]

9 In order to be successful, hashing techniques must appropriately preserve distances when mapping to the Hamming space. [sent-14, score-0.622]

10 One of the basic but most widely-employed methods, locality-sensitive hashing (LSH) [1, 2], generates embeddings via random projections and has been used for many large-scale search tasks. [sent-15, score-0.533]

11 An advantage to this technique is that the random projections provably maintain the input distances in the limit as the number of hash bits increases; at the same time, it has been observed that the number of hash bits required may be large in some cases to faithfully maintain the distances. [sent-16, score-1.511]

12 On the other hand, several recent techniques—most notably semantic hashing [3] and spectral hashing [4]—attempt to overcome this problem by designing hashing techniques that leverage machine learning to find appropriate hash functions to optimize an underlying hashing objective. [sent-17, score-2.704]

13 Both methods have shown advantages over LSH in terms of the number of bits required to find good approximate nearest neighbors. [sent-18, score-0.235]

14 In particular, as noted by the authors, spectral hashing assumes a uniform distribution over the data, a potentially restrictive assumption in some cases. [sent-20, score-0.582]

15 In this paper, we introduce and analyze a simple objective for learning hash functions, develop an efficient coordinate-descent algorithm, and demonstrate that the proposed approach leads to improved results as compared to existing hashing techniques. [sent-21, score-1.128]

16 The main idea is to construct hash functions that explicitly preserve the input distances when mapping to the Hamming space. [sent-22, score-0.683]

17 If there are n training points, k nearest neighbors per point in the training data, and b bits in our desired hash table, our method ends up costing O(nb(k + log n)) time per iteration to update all hash functions, and provably reaches a local optimum of the reconstruction objective. [sent-25, score-1.554]

18 In experiments, we compare against relevant existing hashing techniques on a variety of important vision data sets, and show that our method is able to compete with or outperform state-of-the-art hashing algorithms on these data sets. [sent-26, score-1.056]

19 1 Related Work Methods for fast nearest neighbor retrieval are generally broken down into two families. [sent-29, score-0.201]

20 These methods attempt to speed up nearest neighbor computation, but can degenerate to a linear scan in the worst case. [sent-31, score-0.187]

21 Locality-sensitive hashing [1, 2] is the most popular method, and extensions have been explored for accommodating distances such as ℓp norms [10], learned metrics [11], and image kernels [12]. [sent-33, score-0.654]

22 Algorithms based on LSH typically come with guarantees that the approximate nearest neighbors (neighbors within (1 + ε) times the true nearest neighbor distance) may be found in time that is sublinear in the total number of database objects (but as a function of ε). [sent-34, score-0.425]

23 These include semantic hashing [3], spectral hashing [4], parameter-sensitive hashing [13], and boosting-based hashing methods [14]. [sent-37, score-2.102]

24 2 Hashing Formulation In the following section, we describe our proposed method, starting with the choice of parameterization for the hash functions and the objective function to minimize. [sent-38, score-0.68]

25 We would like to project each data point to a low-dimensional binary space to take advantage of fast nearest neighbor routines. [sent-48, score-0.196]

26 Suppose that the desired number of dimensions of the binary space is b; we will compute the b-dimensional binary embedding by projecting our data using a set of b hash functions h1, . . . , hb. [sent-49, score-0.641]

27 Each hash function hi is a binary-valued function. (Alternatively, we may scale the data appropriately by a constant so that the squared Euclidean distances ||xi − xj||^2 are in [0, 1].) [sent-53, score-0.738]

28 2 Parameterization and Objective In standard random hyperplane locality-sensitive hashing (e. [sent-61, score-0.499]

29 [1]), each hash function hp is generated independently by selecting a random vector rp from a multivariate Gaussian with zero mean and identity covariance. [sent-63, score-0.613]

30 Then the hash function is given as hp(x) = sign(rp^T x). [sent-64, score-0.613]
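As an illustrative sketch of this construction (our own notation; X is a hypothetical n × d data matrix, not anything from the paper):

```python
# Sketch of random-hyperplane LSH: h_p(x) = sign(r_p^T x), with each r_p drawn from a
# zero-mean, identity-covariance Gaussian.
import numpy as np

def lsh_bits(X: np.ndarray, num_bits: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], num_bits))  # one random vector r_p per bit
    return (X @ R > 0).astype(np.uint8)              # (n, num_bits) matrix of 0/1 bits
```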

31 In contrast, we propose to generate a sequence of hash functions that are dependent on one another, in the same spirit as in spectral hashing (though with a different parameterization). [sent-65, score-1.163]

32 We introduce a matrix W of size b × n, and we parameterize the hash functions h1 , . [sent-66, score-0.601]

33 Note that the data points xpq for each hash function need not be the same for each hq (that is, each hash function may utilize different sets of points). [sent-73, score-1.19]

34 Similarly, the number of points s used for each hash function may change, though for simplicity we will present the case when s is the same for each function (and so we can represent all weights via the b × s matrix W ). [sent-74, score-0.568]
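A sketch of this parameterization follows; the Gaussian kernel and a single shared set of s anchor points for all b hash functions are our own simplifying assumptions, not requirements of the method:

```python
# Sketch of the kernelized hash functions h_p(x) = sign(sum_q W[p, q] * kappa(x_pq, x)).
import numpy as np

def gaussian_kernel(X: np.ndarray, anchors: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """kappa(x_pq, x) for every data point (rows of X) and anchor point; shape (n, s)."""
    sq_dists = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def hash_bits(X: np.ndarray, anchors: np.ndarray, W: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """W has shape (b, s); the result is the (n, b) matrix of 0/1 hash bits."""
    K = gaussian_kernel(X, anchors, gamma)
    return (K @ W.T > 0).astype(np.uint8)
```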

35 Though we are not aware of any existing methods that parameterize the hash functions in this way, this parameterization is natural for several reasons. [sent-75, score-0.654]

36 Furthermore, the form of each hash function—the sign of a linear combination of kernel function values—is the same as several kernel-based learning algorithms such as support vector machines. [sent-78, score-0.567]

37 In particular, we will look at the squared error between the original distances (using d) and the reconstructed distances (using d~). [sent-80, score-0.272]

38 We minimize the following objective with respect to the weight matrix W: O = Σ_{(i,j)∈N} (d(xi, xj) − d~(xi, xj))^2. (1) [sent-81, score-0.166]

39 Typically, we will choose this to be a set of pairs which includes both the nearest neighbors as well as other pairs from the database (see Section 3 for details). [sent-83, score-0.311]
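A direct, unoptimized evaluation of this objective might look as follows; this is a sketch under our own conventions (binary codes in an (n, b) array H, original distances in a dictionary keyed by pair, and the reconstructed distance taken as the Hamming distance scaled by 1/b, consistent with the terms in Lemma 1 below):

```python
# Sketch of the reconstruction objective: squared error between the original distances
# d(x_i, x_j) and the Hamming-based distances d~(x_i, x_j) over the pair set N.
import numpy as np

def reconstruction_objective(H: np.ndarray, dists: dict, pairs: list) -> float:
    b = H.shape[1]
    total = 0.0
    for i, j in pairs:
        d_tilde = np.sum(H[i] != H[j]) / b   # reconstructed distance for the pair
        total += (dists[(i, j)] - d_tilde) ** 2
    return total
```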

40 Such an approach will update a single hash function hp ; then, by choosing a single weight to update for each hash function, we can update all hash functions in O(nb(k + log n)) time. [sent-93, score-1.9]

41 In particular, if k = Ω(log n), then we can update all hash functions on the order of the time it takes to compute the objective function itself, making the updates particularly efficient. [sent-94, score-0.69]

42 We begin with a simple lemma characterizing how the objective function changes when we update a single hash function. [sent-98, score-0.656]

43 Consider updating some hash function hold to hnew (where d~ uses hold), and let ho and hn be the n × 1 vectors obtained by applying the old and new hash functions to each data point, respectively. [sent-101, score-1.829]

44 Then the objective function O from (1) after updating the hash function can be expressed as O = Σ_{(i,j)∈N} ( D̄ij + (1/b)(ho(i) − ho(j))^2 − (1/b)(hn(i) − hn(j))^2 )^2. [sent-102, score-1.081]

45 For notational convenience in this proof, let Dold and Dnew be the matrices of reconstructed distances using hold and hnew , respectively, and let Hold and Hnew be the n × b matrices of old and new hash bits, respectively. [sent-104, score-0.87]

46 Note that Hnew = Hold + (hn − ho) e_t^T, where t is the index of the hash function being updated. [sent-106, score-0.768]

47 Note that the corresponding vector of squared norms of the rows of Hnew may be expressed as ℓnew = ℓold − ho + hn since the hash vectors are binary-valued. [sent-108, score-1.02]

48 We can then write the objective using Dnew to obtain O = Σ_{(i,j)∈N} ( D̄ij + (1/b)(ho(i) + ho(j) − 2ho(i)ho(j)) − (1/b)(hn(i) + hn(j) − 2hn(i)hn(j)) )^2 = Σ_{(i,j)∈N} ( D̄ij + (1/b)(ho(i) − ho(j))^2 − (1/b)(hn(i) − hn(j))^2 )^2, since ho(i)^2 = ho(i) and hn(i)^2 = hn(i). [sent-110, score-1.78]

49 The lemma above demonstrates that, when updating a hash function, the new objective function can be computed in O(nk) time, assuming that we have computed and stored the values of D̄ij. [sent-112, score-0.652]
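A sketch of the evaluation that Lemma 1 enables, assuming the residuals D̄ij have already been cached in a dictionary (all names are ours):

```python
# Sketch of Lemma 1: with residuals Dbar[(i, j)] precomputed, the objective after
# replacing one hash bit column h_old -> h_new is a single pass over the pair set N.
def objective_after_bit_update(Dbar: dict, pairs: list, h_old, h_new, b: int) -> float:
    total = 0.0
    for i, j in pairs:
        term = (Dbar[(i, j)]
                + (h_old[i] - h_old[j]) ** 2 / b
                - (h_new[i] - h_new[j]) ** 2 / b)
        total += term ** 2
    return total
```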

50 Consider choosing some hash function hp, and choose one weight index q, i.e., a single entry Wpq of W. [sent-114, score-0.637]

51 Modifying the value of Wpq results in updating hp to a new hashing function hnew . [sent-117, score-0.711]

52 Now, for every point x, there is a hashing threshold: a new value of Wpq, which we will call Ŵpq, such that Σ_{q=1}^{s} Ŵpq κ(xpq, x) = 0. [sent-118, score-0.499]

53 Observe that, if cx = Σ_{q=1}^{s} Wpq κ(xpq, x), then the threshold tx is given by tx = Wpq − cx. [sent-119, score-0.179]

54 Since we are updating a single Wpq per iteration, we can update the values of cx in O(n) time after updating Wpq , so the total time to compute all thresholds tx is O(n). [sent-121, score-0.234]

55 Observe that, for any fixed interval, the new computed hash function hnew does not change over the entire interval. [sent-123, score-0.652]

56 Furthermore, observe that as we cross from one threshold to the next, a single bit of the corresponding hash vector flips. [sent-124, score-0.594]

57 As a result, we need only compute the objective function at each of the n + 1 intervals, and choose the interval that minimizes the objective function. [sent-125, score-0.165]

58 We choose a value Wpq within that interval (which will be optimal) and update the hash function using this new choice of weight. [sent-126, score-0.629]
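The following sketch puts the pieces of this single-weight update together. It evaluates each of the n + 1 candidate intervals from scratch rather than with the O(k) incremental update described next, and the per-point flip values are computed directly from the activations (with unit kernel values they reduce to the thresholds tx above). K holds the kernel values κ(xpq, x) for all points, Dbar holds the residuals of Lemma 1, and all names are illustrative.

```python
# Sketch of one coordinate-descent step on a single weight W[p, q].
import numpy as np

def update_single_weight(W, K, p, q, pairs, Dbar, b):
    c = K @ W[p]                                    # current activations of h_p, shape (n,)
    h_old = (c > 0).astype(np.float64)              # current bits of hash function p
    # Value of W[p, q] at which each point's bit flips (Gaussian kernel values are > 0).
    flips = np.sort(W[p, q] - c / K[:, q])
    edges = np.concatenate(([flips[0] - 1.0], flips, [flips[-1] + 1.0]))
    candidates = (edges[:-1] + edges[1:]) / 2.0     # one candidate value per interval
    best_w, best_obj = W[p, q], np.inf
    for w in candidates:
        h_new = ((c + (w - W[p, q]) * K[:, q]) > 0).astype(np.float64)
        obj = sum((Dbar[(i, j)]
                   + (h_old[i] - h_old[j]) ** 2 / b
                   - (h_new[i] - h_new[j]) ** 2 / b) ** 2
                  for i, j in pairs)
        if obj < best_obj:
            best_obj, best_w = obj, w
    W[p, q] = best_w
    return W
```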

59 Suppose we have a sequence of hash vectors ht0, . . . , htn. [sent-131, score-0.547]

60 Then the objective functions for all n + 1 hash functions can be computed in O(nk) time. [sent-135, score-0.679]

61 The objective function may be computed in O(nk) time for the hash function ht0 corresponding to the smallest interval. [sent-137, score-0.611]

62 Consider the case when going from ho = htj−1 to hn = htj for some 1 ≤ j ≤ n. [sent-138, score-0.489]

63 Let the index of the bit that changes in hn be a. [sent-139, score-0.236]

64 Let fa = 1 if ho (a) = 0, hn (a) = 1, and fa = −1 otherwise. [sent-141, score-0.535]

65 Then we can simplify (hn (i) − hn (j))2 − (ho (i) − ho (j))2 to fa (1 − 2hn (j)) when a = i and to fa (1 − 2hn (i)) when a = j (the expression is zero when i = j and will not contribute to the objective). [sent-142, score-0.535]

66 Therefore the relevant terms in the objective function as given in Lemma 1 may be written as Σ_{(a,j)∈N} ( D̄aj − (fa/b)(1 − 2hn(j)) )^2 + Σ_{(i,a)∈N} ( D̄ia − (fa/b)(1 − 2hn(i)) )^2. [sent-143, score-0.17]

67 Furthermore, we must update D as we progress through the hash functions, which can also be straightforwardly done in O(k) time on average. [sent-146, score-0.592]

68 Completing this process over all n + 1 hash functions results in a total of O(nk) time. [sent-147, score-0.581]

69 Fix all but one entry Wpq of the hashing weight matrix W . [sent-149, score-0.523]

70 Our overall strategy successively cycles through each hash function one by one, randomly selects a weight to update for each hash function, and computes the optimal updates for those weights. [sent-151, score-1.163]

71 One full iteration to update all hash functions requires time O(nb(k + log n)). [sent-153, score-0.626]

72 Note that local convergence is guaranteed in a finite number of updates since each update never increases the objective function value and only a finite number of hash configurations are possible. [sent-154, score-0.656]
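Reusing the update_single_weight sketch above, the outer loop could be organized as below; the residuals are recomputed naively here for clarity, whereas the paper caches them, and all names and the iteration budget are illustrative.

```python
# Sketch of the overall strategy: cycle through the b hash functions, pick one random
# weight per function, and apply the single-weight coordinate-descent update.
import numpy as np

def train_bre(W, K, dists, pairs, num_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    b, s = W.shape
    for _ in range(num_iters):                       # one full pass over all b functions
        for p in range(b):
            q = rng.integers(s)                      # random weight of hash function p
            H = (K @ W.T > 0).astype(np.float64)     # current (n, b) codes
            Dbar = {(i, j): dists[(i, j)] - np.sum(H[i] != H[j]) / b for i, j in pairs}
            W = update_single_weight(W, K, p, q, pairs, Dbar, b)
    return W
```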

73 For example, the addition of an ℓ1 regularization over the entries of W could lead to sparse hash functions, and may be worth additional study. [sent-164, score-0.547]

74 3 Experiments We now present results comparing our proposed approach to the relevant existing methods—locality sensitive hashing, semantic hashing (RBM), and spectral hashing. [sent-165, score-0.623]

75 We implemented our binary reconstructive embedding method (BRE) and LSH, and used the same code for spectral hashing and RBM that was employed in [4]. [sent-167, score-0.657]

76 1 Data Sets and Methodology We applied the hashing algorithms to a number of important large-scale data sets from the computer vision community. [sent-170, score-0.518]

77 Following the suggestion in [4], we apply PCA (or kernel PCA in the case of kernelized data) to the input data before applying spectral hashing or BRE—the results of the RBM method and LSH were better without applying PCA, so PCA is not applied for these algorithms. [sent-174, score-0.618]

78 For training the BRE method, we select nearest neighbors using the top 5th percentile of the training distances and set the target distances to 0; we found that this ensures that the nearest neighbors in the embedded space will have Hamming distance very close to 0. [sent-176, score-0.719]

79 We also choose farthest neighbors using the 98th percentile of the training distances and maintained their original distances as target distances. [sent-177, score-0.373]

80 Having both near and far neighbors improves performance for BRE, as it prevents a trivial solution where all the database objects are given the same hash key. [sent-178, score-0.708]
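A sketch of this pair-selection protocol, assuming a precomputed n × n matrix D of the original (scaled) distances; the percentile cutoffs follow the text, everything else is illustrative.

```python
# Sketch: nearest pairs (below the 5th percentile) get target distance 0, farthest pairs
# (above the 98th percentile) keep their original distances; pairs in between are unused.
import numpy as np

def build_training_pairs(D: np.ndarray):
    rows, cols = np.triu_indices_from(D, k=1)        # each unordered pair (i, j) once
    vals = D[rows, cols]
    near_cut, far_cut = np.percentile(vals, 5), np.percentile(vals, 98)
    pairs, targets = [], {}
    for i, j, d in zip(rows, cols, vals):
        if d <= near_cut:
            pairs.append((i, j))
            targets[(i, j)] = 0.0                    # nearest neighbors: target 0
        elif d >= far_cut:
            pairs.append((i, j))
            targets[(i, j)] = float(d)               # farthest neighbors: keep distance
    return pairs, targets
```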

81 The spectral hashing and RBM parameters are set as in [4, 17]. [sent-179, score-0.582]

82 After constructing the hash functions for each method, we randomly generate 3000 hashing queries (except for Caltech-101, which has fewer than 4000 data points; in this case we choose the remainder of the data as queries). [sent-180, score-1.106]

83 We collect training/test pairs such that the unnormalized Hamming distance using the constructed hash functions is less than or equal to three. [sent-182, score-0.647]

84 We then compute the percentage of these pairs that are nearest neighbors in the original data space, which are defined as pairs of points from the training set whose distances are in the top 5th percentile. [sent-183, score-0.429]
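A sketch of this evaluation protocol under our own naming: retrieve query/database pairs whose codes are within unnormalized Hamming distance 3 and report the fraction that are true nearest neighbors, i.e., whose original distance falls in the top 5th percentile.

```python
# Sketch: precision of Hamming-radius retrieval against ground-truth nearest neighbors.
import numpy as np

def retrieval_precision(H_query, H_db, D_true, radius=3, nn_percentile=5):
    nn_cut = np.percentile(D_true, nn_percentile)        # distance cutoff for "true NN"
    hits, retrieved = 0, 0
    for qi in range(H_query.shape[0]):
        ham = np.sum(H_query[qi] != H_db, axis=1)        # unnormalized Hamming distances
        idx = np.where(ham <= radius)[0]
        retrieved += idx.size
        hits += int(np.sum(D_true[qi, idx] <= nn_cut))
    return hits / max(retrieved, 1)
```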

85 50), one would expect that distances with a Hamming distance less than or equal to three would correspond to nearest neighbors in the original data embedding. [sent-187, score-0.384]

86 2 Quantitative Results In Figure 1, we plot hashing retrieval results over each of the data sets. [sent-189, score-0.534]

87 Observe that both RBM and spectral hashing underperform all other methods on at least one data set. [sent-191, score-0.582]

[Figure 1: retrieval results on each data set, including Peekaboom and Photo Tourism. Each panel plots, against the number of bits (10 to 50), how well pairs within Hamming distance <= 3 correspond to true nearest neighbors, with curves for BRE, spectral hashing, RBM, and LSH.]

91 The plots show how well the nearest neighbors in the Hamming space (pairs of data points with unnormalized Hamming distance less than or equal to 3) correspond to the nearest neighbors (top 5th percentile of distances) in the original dataset. [sent-231, score-0.558]

92 The better performance in our tests may be due to our implementation of LSH; we use Charikar’s random projection method [1] to construct hash tables. [sent-236, score-0.547]

93 In terms of training time, the BRE method typically converges in 50–100 iterations of updating all hash functions, and takes 1–5 minutes to train per data set on our machines (depending on the number of bits requested). [sent-237, score-0.715]

94 Relatively speaking, the time required for training is typically faster than RBM but slower than spectral hashing and LSH. [sent-238, score-0.582]

95 We ran our reconstructive hashing algorithm on the Gist descriptors for the Tiny Image data set using 50 bits, with 1000 training images used to construct the hash functions as before. [sent-246, score-1.168]

96 We selected a random set of queries from the database and compared the results of a linear scan over the Gist features with the hashing results over the Gist features. [sent-247, score-0.601]

97 When obtaining hashing results, we collected the nearest neighbors in the Hamming space to the query (the top 0. [sent-248, score-0.745]

98 For each group of images, the top left image is the query, the top row corresponds to a linear scan, and the second row corresponds to the hashing retrieval results using 50 hash bits. [sent-252, score-1.112]

99 The hashing results are similar to the linear scan results but are significantly faster to obtain. [sent-253, score-0.548]

100 We thank Rob Fergus for the spectral hashing and RBM code, and Greg Shakhnarovich for the Boosting SSC code. [sent-258, score-0.582]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hash', 0.547), ('hashing', 0.499), ('ho', 0.221), ('wpq', 0.211), ('hn', 0.208), ('lsh', 0.181), ('hamming', 0.176), ('bre', 0.145), ('rbm', 0.143), ('bits', 0.127), ('neighbors', 0.116), ('nearest', 0.108), ('hnew', 0.105), ('distances', 0.102), ('spectral', 0.083), ('dold', 0.075), ('xpq', 0.075), ('hp', 0.066), ('objective', 0.064), ('nk', 0.061), ('htj', 0.06), ('tourism', 0.06), ('gist', 0.059), ('old', 0.058), ('tiny', 0.055), ('dij', 0.054), ('fa', 0.053), ('photo', 0.049), ('scan', 0.049), ('dnew', 0.045), ('peekaboom', 0.045), ('reconstructive', 0.045), ('update', 0.045), ('updating', 0.041), ('tx', 0.041), ('cx', 0.039), ('xj', 0.039), ('interval', 0.037), ('kernelized', 0.036), ('nb', 0.036), ('hb', 0.036), ('distance', 0.036), ('parameterization', 0.035), ('retrieval', 0.035), ('ht', 0.034), ('embeddings', 0.034), ('hold', 0.034), ('functions', 0.034), ('image', 0.031), ('million', 0.031), ('percentile', 0.031), ('labelme', 0.031), ('binary', 0.03), ('fergus', 0.03), ('coordinatedescent', 0.03), ('eht', 0.03), ('shakhnarovich', 0.03), ('ssc', 0.03), ('pairs', 0.03), ('neighbor', 0.03), ('qualitative', 0.028), ('fast', 0.028), ('xi', 0.028), ('bit', 0.028), ('database', 0.027), ('thresholds', 0.027), ('queries', 0.026), ('nursery', 0.024), ('reconstructed', 0.024), ('torralba', 0.024), ('weight', 0.024), ('reconstruction', 0.023), ('semantic', 0.023), ('descriptors', 0.023), ('provably', 0.023), ('squared', 0.022), ('query', 0.022), ('original', 0.022), ('norms', 0.022), ('points', 0.021), ('pca', 0.021), ('techniques', 0.021), ('mnist', 0.02), ('indyk', 0.02), ('images', 0.02), ('trees', 0.02), ('sign', 0.02), ('parameterize', 0.02), ('optima', 0.02), ('threshold', 0.019), ('vision', 0.019), ('sigmoid', 0.019), ('kulis', 0.019), ('maintain', 0.019), ('et', 0.019), ('optimum', 0.018), ('sublinear', 0.018), ('objects', 0.018), ('existing', 0.018), ('comparably', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings

Author: Brian Kulis, Trevor Darrell

Abstract: Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques. 1

2 0.43046418 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

Author: Maxim Raginsky, Svetlana Lazebnik

Abstract: This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent state-of-the-art method, spectral hashing.

3 0.12458301 182 nips-2009-Optimal Scoring for Unsupervised Learning

Author: Zhihua Zhang, Guang Dai

Abstract: We are often interested in casting classification and clustering problems as a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria. In this paper we illustrate optimal scoring, which was originally proposed for performing the Fisher linear discriminant analysis by regression, in the application of unsupervised learning. In particular, we devise a novel clustering algorithm that we call optimal discriminant clustering. We associate our algorithm with the existing unsupervised learning algorithms such as spectral clustering, discriminative clustering and sparse principal component analysis. Experimental results on a collection of benchmark datasets validate the effectiveness of the optimal discriminant clustering algorithm.

4 0.093661211 190 nips-2009-Polynomial Semantic Indexing

Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1

5 0.088537015 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1

6 0.069210805 2 nips-2009-3D Object Recognition with Deep Belief Nets

7 0.064311355 238 nips-2009-Submanifold density estimation

8 0.056421027 220 nips-2009-Slow Learners are Fast

9 0.052578993 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

10 0.052499317 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

11 0.052073088 211 nips-2009-Segmenting Scenes by Matching Image Composites

12 0.045451757 95 nips-2009-Fast subtree kernels on graphs

13 0.045230322 260 nips-2009-Zero-shot Learning with Semantic Output Codes

14 0.044618338 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

15 0.043479785 166 nips-2009-Noisy Generalized Binary Search

16 0.041798446 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

17 0.040762853 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

18 0.040689718 98 nips-2009-From PAC-Bayes Bounds to KL Regularization

19 0.038148846 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

20 0.037606858 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.137), (1, 0.047), (2, -0.066), (3, 0.031), (4, -0.032), (5, 0.02), (6, -0.117), (7, 0.096), (8, 0.015), (9, 0.061), (10, -0.017), (11, 0.068), (12, 0.098), (13, 0.085), (14, -0.011), (15, -0.161), (16, 0.14), (17, 0.107), (18, -0.023), (19, -0.094), (20, -0.015), (21, 0.326), (22, 0.167), (23, 0.095), (24, 0.063), (25, -0.17), (26, -0.046), (27, 0.273), (28, 0.041), (29, 0.09), (30, -0.09), (31, 0.01), (32, 0.227), (33, 0.093), (34, 0.275), (35, -0.007), (36, -0.007), (37, 0.004), (38, -0.033), (39, -0.067), (40, 0.067), (41, -0.094), (42, -0.186), (43, 0.088), (44, -0.072), (45, 0.075), (46, 0.064), (47, 0.017), (48, -0.046), (49, 0.069)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95606101 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings

Author: Brian Kulis, Trevor Darrell

Abstract: Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques. 1

2 0.85153681 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

Author: Maxim Raginsky, Svetlana Lazebnik

Abstract: This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent state-of-the-art method, spectral hashing.

3 0.38949826 182 nips-2009-Optimal Scoring for Unsupervised Learning

Author: Zhihua Zhang, Guang Dai

Abstract: We are often interested in casting classification and clustering problems as a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria. In this paper we illustrate optimal scoring, which was originally proposed for performing the Fisher linear discriminant analysis by regression, in the application of unsupervised learning. In particular, we devise a novel clustering algorithm that we call optimal discriminant clustering. We associate our algorithm with the existing unsupervised learning algorithms such as spectral clustering, discriminative clustering and sparse principal component analysis. Experimental results on a collection of benchmark datasets validate the effectiveness of the optimal discriminant clustering algorithm.

4 0.32357791 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1

5 0.29893261 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

Author: Laura Dietz, Valentin Dallmeier, Andreas Zeller, Tobias Scheffer

Abstract: We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data of the software projects AspectJ and Rhino. 1

6 0.2503587 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs

7 0.24089824 238 nips-2009-Submanifold density estimation

8 0.24018098 190 nips-2009-Polynomial Semantic Indexing

9 0.23443344 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

10 0.21245117 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

11 0.2075112 220 nips-2009-Slow Learners are Fast

12 0.20625046 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

13 0.20523421 234 nips-2009-Streaming k-means approximation

14 0.18394083 260 nips-2009-Zero-shot Learning with Semantic Output Codes

15 0.18203822 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

16 0.18103017 74 nips-2009-Efficient Bregman Range Search

17 0.18010819 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

18 0.17059401 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

19 0.16979474 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

20 0.16572742 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.283), (24, 0.04), (25, 0.069), (31, 0.016), (35, 0.062), (36, 0.108), (39, 0.04), (48, 0.019), (58, 0.062), (61, 0.015), (71, 0.03), (81, 0.018), (86, 0.089), (89, 0.039), (91, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76755667 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings

Author: Brian Kulis, Trevor Darrell

Abstract: Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques. 1

2 0.73010969 193 nips-2009-Potential-Based Agnostic Boosting

Author: Varun Kanade, Adam Kalai

Abstract: We prove strong noise-tolerance properties of a potential-based boosting algorithm, similar to MadaBoost (Domingo and Watanabe, 2000) and SmoothBoost (Servedio, 2003). Our analysis is in the agnostic framework of Kearns, Schapire and Sellie (1994), giving polynomial-time guarantees in presence of arbitrary noise. A remarkable feature of our algorithm is that it can be implemented without reweighting examples, by randomly relabeling them instead. Our boosting theorem gives, as easy corollaries, alternative derivations of two recent nontrivial results in computational learning theory: agnostically learning decision trees (Gopalan et al, 2008) and agnostically learning halfspaces (Kalai et al, 2005). Experiments suggest that the algorithm performs similarly to MadaBoost. 1

3 0.54395229 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

Author: Liefeng Bo, Cristian Sminchisescu

Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art in three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1

4 0.54151243 104 nips-2009-Group Sparse Coding

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1

5 0.54135293 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley

Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1

6 0.5401473 137 nips-2009-Learning transport operators for image manifolds

7 0.5397867 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

8 0.53877485 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

9 0.5386095 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

10 0.53834838 129 nips-2009-Learning a Small Mixture of Trees

11 0.53814834 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

12 0.53772277 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

13 0.5376153 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

14 0.53565121 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

15 0.53553629 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

16 0.53430444 72 nips-2009-Distribution Matching for Transduction

17 0.53395718 70 nips-2009-Discriminative Network Models of Schizophrenia

18 0.53257 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

19 0.53214234 87 nips-2009-Exponential Family Graph Matching and Ranking

20 0.53150958 97 nips-2009-Free energy score space