nips nips2012 nips2012-329 knowledge-graph by maker-knowledge-mining

329 nips-2012-Super-Bit Locality-Sensitive Hashing


Source: pdf

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. [sent-11, score-0.546]

2 In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). [sent-12, score-0.221]

3 It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. [sent-13, score-0.47]

4 The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. [sent-14, score-0.3]

5 Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. [sent-15, score-0.128]

6 1 Introduction: Locality-sensitive hashing (LSH) aims to hash similar data samples to the same hash code with high probability [7, 9]. [sent-16, score-0.402]

7 Among them are some binary LSH schemes, which generate binary codes. [sent-20, score-0.068]

8 Binary LSH approximates a certain distance or similarity of two data samples by computing the Hamming distance between the corresponding compact binary codes. [sent-21, score-0.166]

9 1 Locality-Sensitive Hashing for Angular Similarity: For many data representations, the natural pairwise similarity is only related to the angle between the data, e.g. [sent-31, score-0.143]

10 In these cases, angular similarity can serve as a similarity measurement, defined as sim(a, b) = 1 − cos⁻¹(⟨a, b⟩ / (‖a‖‖b‖)) / π. [sent-34, score-0.353]
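
As a quick illustration, here is a minimal NumPy sketch of this definition (the function name and the clipping guard are our own additions, not from the paper):

```python
import numpy as np

def angular_similarity(a, b):
    """sim(a, b) = 1 - arccos(<a, b> / (||a|| ||b||)) / pi."""
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against floating-point round-off
    return 1.0 - np.arccos(cos_theta) / np.pi
```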

11 One popular LSH for approximating angular similarity is the sign-random-projection LSH (SRP-LSH) [3], which provides an unbiased estimate of angular similarity and is a binary LSH method. [sent-36, score-0.622]

12 Then it is easy to prove that E[d_Hamming(h(a), h(b))] = Kθ_{a,b}/π = Cθ_{a,b}. That is, the expectation of the Hamming distance between the binary hash codes of two given data samples a and b is an unbiased estimate of their angle θ_{a,b}, up to a constant scale factor C = K/π. [sent-45, score-0.263]
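
A hedged sketch of SRP-LSH and of the corresponding angle estimator d_Hamming · π / K (function names, the boolean code representation, and the fixed random seed are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def srp_lsh_codes(X, K, rng=np.random.default_rng(0)):
    """Sign-random-projection LSH: one bit per random hyperplane v_i ~ N(0, I_d)."""
    d = X.shape[1]
    V = rng.standard_normal((d, K))   # K random projection vectors, one per column
    return X @ V >= 0                 # h_v(x) = sgn(v^T x), stored as booleans

def estimate_angle(code_a, code_b):
    """Estimate theta_{a,b} from the Hamming distance, using E[d_H] = K * theta / pi."""
    K = code_a.shape[-1]
    d_hamming = np.count_nonzero(code_a != code_b)
    return d_hamming * np.pi / K
```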

13 Thus SRP-LSH provides an unbiased estimate of angular similarity. [sent-46, score-0.262]

14 This implies that d_Hamming(h(a), h(b)) ∼ B(K, θ_{a,b}/π), and its variance is K (θ_{a,b}/π)(1 − θ_{a,b}/π). [sent-49, score-0.088]

15 Generally we need a substantially long code to accurately approximate the angular similarity [24, 12, 23]. [sent-53, score-0.339]

16 Since any two of the random vectors may be close to linearly dependent, the resulting binary code may be less informative than it seems, and may even contain many redundant bits. [sent-54, score-0.161]

17 An intuitive idea would be to orthogonalize the random vectors. [sent-55, score-0.095]

18 Moreover, it remains unclear whether the resulting Hamming distance is still an unbiased estimate of the angle θ_{a,b} multiplied by a constant, and what its variance will be. [sent-57, score-0.224]

19 In the next section, based on the above intuitive idea, we propose the so-called Super-Bit locality-sensitive hashing (SBLSH) method. [sent-59, score-0.221]

20 We provide theoretical guarantees that after orthogonalizing the random projection vectors in batches, we still get an unbiased estimate of angular similarity, yet with a smaller variance when θ_{a,b} ∈ (0, π/2], and thus the resulting binary code is more informative. [sent-60, score-0.569]

21 Experiments on real data show the effectiveness of SBLSH, which, with the same length of binary code, may achieve as much as 30% mean squared error (MSE) reduction compared with SRP-LSH in estimating angular similarity. [sent-61, score-0.453]

22 Moreover, SBLSH performs best among several widely used data-independent LSH methods in approximate nearest neighbor (ANN) retrieval experiments. [sent-62, score-0.128]

23 When the code length K satisfies 1 < K ≤ d, where d is the dimension of the data space, we can orthogonalize N (1 ≤ N ≤ min(K, d) = K) of the random vectors sampled from the normal distribution N(0, I_d). [sent-64, score-0.336]

24 However, when the code length K > d, it is impossible to orthogonalize all K vectors. [sent-68, score-0.239]

25 Assume that K = N × L without loss of generality, and 1 ≤ N ≤ d; then we can perform the Gram-Schmidt process to orthogonalize them in L batches. [sent-69, score-0.095]

26 The vectors {v_1, v_2, ..., v_K} are independently sampled from the normal distribution N(0, I_d), and then divided into L batches with N vectors each. [sent-73, score-0.14]

27 By performing the Gram-Schmidt process on these L batches of N vectors respectively, we get K = N × L projection vectors {w_1, w_2, ..., w_K}. [sent-74, score-0.178]

28 These K functions produce L N-Super-Bits and altogether produce binary codes of length K. [sent-82, score-0.095]

29 Note that when the Super-Bit depth N = 1, SBLSH becomes SRP-LSH. [sent-85, score-0.091]

30 The algorithm can be easily extended to the case when the code length K is not a multiple of the Super-Bit depth N . [sent-87, score-0.235]

31 In fact, one can even use a variable Super-Bit depth N_i as long as 1 ≤ N_i ≤ d. [sent-88, score-0.091]

32 With the same code length, SBLSH has the same running time O(Kd) as SRP-LSH in on-line processing, i.e. [sent-89, score-0.083]

33 Algorithm 1: Generating Super-Bit Locality-Sensitive Hashing Projection Vectors. Input: data space dimension d, Super-Bit depth 1 ≤ N ≤ d, number of Super-Bits L, resulting code length K = N × L. [sent-93, score-0.235]
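
Below is a minimal sketch of Algorithm 1 in NumPy. It uses QR decomposition as the batch orthogonalization step, which spans the same subspaces as Gram-Schmidt (possible column sign flips do not change which bits disagree); the function names and fixed seed are our own illustrative choices, not the paper's code:

```python
import numpy as np

def sblsh_projections(d, N, L, rng=np.random.default_rng(0)):
    """Generate K = N * L Super-Bit LSH projection vectors, with 1 <= N <= d."""
    assert 1 <= N <= d, "Super-Bit depth N must satisfy 1 <= N <= d"
    batches = []
    for _ in range(L):
        V = rng.standard_normal((d, N))   # N i.i.d. N(0, I_d) vectors
        Q, _ = np.linalg.qr(V)            # orthogonalize the batch (Gram-Schmidt equivalent)
        batches.append(Q)
    return np.hstack(batches)             # d x K projection matrix W

def sblsh_codes(X, W):
    """K-bit codes: sign of each projection; every N consecutive bits form one Super-Bit."""
    return X @ W >= 0
```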

34 1 Unbiased Estimate: In this subsection we prove that SBLSH provides an unbiased estimate of θ_{a,b} for a, b ∈ R^d. [sent-107, score-0.099]

35 Given a random vector v uniformly sampled from S^{d−1}, we have Pr[h_v(a) ≠ h_v(b)] = θ_{a,b}/π. [sent-111, score-0.098]

36 If v ∈ R^d follows an isotropic distribution, then v̄ = v/‖v‖ is uniformly distributed on S^{d−1}. [sent-113, score-0.062]

37 This lemma can be proven by the definition of isotropic distribution, and we omit the details here. [sent-114, score-0.082]

38 If vectors sampled i.i.d. from the normal distribution N(0, I_d) span a subspace S_k, let P_{S_k} denote the orthogonal projection onto S_k; then P_{S_k} is a random matrix uniformly distributed on the Grassmann manifold G_{k,d−k}. [sent-122, score-0.153]

39 If P is a random matrix uniformly distributed on the Grassmann manifold G_{k,d−k}, 1 ≤ k ≤ d, and v ∼ N(0, I_d) is independent of P, then the random vector ṽ = Pv follows an isotropic distribution. [sent-129, score-0.089]

40 From the uniformity of P on the Grassmann manifold and the property of the normal distribution N (0, Id ), we can get this result directly. [sent-130, score-0.059]

41 The vector [v̂_1, ..., v̂_k]^T is N(0, I_k)-distributed, where each v̂_i ∼ N(0, 1), and it is easy to verify that v̂ is independent of U. [sent-140, score-0.093]

42 Given v_1, v_2, ..., v_N ∈ R^d sampled from the normal distribution N(0, I_d), where 1 ≤ N ≤ d, perform the Gram-Schmidt process on them to produce N orthogonalized vectors w_1, w_2, ..., w_N. [sent-159, score-0.121]

43 Define X_1, ..., X_N by X_i = 1 if h_{w_i}(a) ≠ h_{w_i}(b) and X_i = 0 if h_{w_i}(a) = h_{w_i}(b); then we have E[X_i] = θ_{a,b}/π, for any 1 ≤ i ≤ N. [sent-165, score-0.492]

44 Denote by S_{i−1} the subspace spanned by {w_1, ..., w_{i−1}}, and by P^⊥_{S_{i−1}} the orthogonal projection onto its orthogonal complement. [sent-170, score-0.079]

45 For any 1 ≤ i ≤ N, E[X_i] = Pr[X_i = 1] = Pr[h_{w_i}(a) ≠ h_{w_i}(b)] = Pr[h_{w̄_i}(a) ≠ h_{w̄_i}(b)]. [sent-175, score-0.165]

46 P_{S_{i−1}} is uniformly distributed on the Grassmann manifold G_{i−1,d−i+1}; thus P^⊥_{S_{i−1}} = I − P_{S_{i−1}} is uniformly distributed on G_{d−i+1,i−1}. [sent-179, score-0.077]

47 By Lemma 4, we have that w_i = P^⊥_{S_{i−1}} v_i follows an isotropic distribution. [sent-186, score-0.119]

48 For any Super-Bit depth N, 1 ≤ N ≤ d, assuming that the code length K = N × L, the Hamming distance d_Hamming(h(a), h(b)) is an unbiased estimate of θ_{a,b}, for any two data vectors a, b ∈ R^d, up to a constant scale factor C = K/π. [sent-190, score-0.386]

49 2 Variance: In this subsection we prove that when the angle θ_{a,b} ∈ (0, π/2], the variance of SBLSH is strictly smaller than that of SRP-LSH. [sent-194, score-0.14]

50 Pr[X_i = 1 | X_j = 1] = Pr[h_{w_i}(a) ≠ h_{w_i}(b) | X_j = 1] = Pr[h_{v_i − Σ_{k=1}^{i−1} w_k w_k^T v_i}(a) ≠ h_{v_i − Σ_{k=1}^{i−1} w_k w_k^T v_i}(b) | h_{w_j}(a) ≠ h_{w_j}(b)]. [sent-198, score-0.397]

51 Since {w_1, ..., w_{i−1}} is a uniformly random orthonormal basis of a random subspace uniformly distributed on the Grassmann manifold, by exchanging the indices j and 1 we have that this equals Pr[h_{v_i − Σ_{k=1}^{i−1} w_k w_k^T v_i}(a) ≠ h_{v_i − Σ_{k=1}^{i−1} w_k w_k^T v_i}(b) | h_{w_1}(a) ≠ h_{w_1}(b)] = Pr[X_i = 1 | X_1 = 1]. [sent-202, score-0.351]

52 Given two vectors a, b ∈ R^d and random variables {X_i} defined as in Theorem 1, denote p_{2,1} = Pr[X_2 = 1 | X_1 = 1] and S_X = Σ_{i=1}^{N} X_i, which is the Hamming distance between the N-Super-Bits of a and b; then for 1 < N ≤ d, Var[S_X] = Nθ_{a,b}/π + N(N−1) p_{2,1} θ_{a,b}/π − (Nθ_{a,b}/π)². [sent-208, score-0.075]
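
For readability, here is a restatement of this variance formula in standard notation, with the one-line derivation it follows from (our own sketch; it only uses that the X_i are identically distributed indicators with E[X_i] = θ_{a,b}/π and Pr[X_i = 1, X_j = 1] = p_{2,1} θ_{a,b}/π for i ≠ j):

```latex
\begin{aligned}
\mathbb{E}[S_X] &= \sum_{i=1}^{N} \mathbb{E}[X_i] = \frac{N\theta_{a,b}}{\pi},\\
\mathbb{E}[S_X^2] &= \sum_{i=1}^{N} \mathbb{E}[X_i] + \sum_{i \neq j} \Pr[X_i = 1,\, X_j = 1]
  = \frac{N\theta_{a,b}}{\pi} + N(N-1)\, p_{2,1}\, \frac{\theta_{a,b}}{\pi},\\
\operatorname{Var}[S_X] &= \mathbb{E}[S_X^2] - \mathbb{E}[S_X]^2
  = \frac{N\theta_{a,b}}{\pi} + N(N-1)\, p_{2,1}\, \frac{\theta_{a,b}}{\pi}
    - \left(\frac{N\theta_{a,b}}{\pi}\right)^{2}.
\end{aligned}
```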

53 Since w_1, ..., w_K are produced by orthogonalizing every N vectors, the Hamming distances produced by different N-Super-Bits are independent; thus Var[SBLSH_{N,K}] = L × Var[SBLSH_{N,N}]. [sent-220, score-0.078]

54 Denote by Var[SRP-LSH_K] the variance of the Hamming distance produced by SRP-LSH, where K = N × L is the code length and L is a positive integer, 1 < N ≤ d. [sent-225, score-0.219]

55 Numerical verification. Figure 2: the variances of SRP-LSH and SBLSH against the angle θ_{a,b} to estimate. [sent-231, score-0.073]

56 In this subsection we verify numerically the behavior of the variances of both SRP-LSH and SBLSH for different angles θ_{a,b} ∈ (0, π]. [sent-232, score-0.058]
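
A hedged Monte Carlo sketch of this kind of check, comparing the empirical mean and variance of the two Hamming distances at a chosen angle (the dimensions, trial count, and vector construction below are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

def compare_variances(theta=np.pi / 4, d=32, N=32, L=4, trials=2000, seed=0):
    """Empirical mean/variance of SRP-LSH vs SBLSH Hamming distances at angle theta."""
    rng = np.random.default_rng(seed)
    K = N * L
    a = np.zeros(d); a[0] = 1.0                      # a fixed pair with angle theta
    b = np.zeros(d); b[0] = np.cos(theta); b[1] = np.sin(theta)
    d_srp, d_sblsh = [], []
    for _ in range(trials):
        V = rng.standard_normal((d, K))              # SRP-LSH projections
        d_srp.append(np.count_nonzero((a @ V >= 0) != (b @ V >= 0)))
        W = np.hstack([np.linalg.qr(rng.standard_normal((d, N)))[0] for _ in range(L)])
        d_sblsh.append(np.count_nonzero((a @ W >= 0) != (b @ W >= 0)))
    print("unbiased target K*theta/pi :", K * theta / np.pi)
    print("SRP-LSH  mean, variance    :", np.mean(d_srp), np.var(d_srp))
    print("SBLSH    mean, variance    :", np.mean(d_sblsh), np.var(d_sblsh))

compare_variances()
```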

57 3 Discussion From Corollary 1, SBLSH provides an unbiased estimate of angular similarity. [sent-240, score-0.262]

58 From Corollary 3, when θ_{a,b} ∈ (0, π/2], with the same length of binary code, the variance of SBLSH is strictly smaller than that of SRP-LSH. [sent-241, score-0.139]

59 For this kind of data, the angle between any two different samples lies in (0, π/2], and thus SBLSH will provide more accurate estimation than SRP-LSH on such data. [sent-246, score-0.073]

60 In fact, our later experiments show that even when θ_{a,b} is not constrained to (0, π/2], SBLSH still gives a much more accurate estimate of angular similarity. [sent-247, score-0.186]

61 3 Experimental Results We conduct two sets of experiments, angular similarity estimation and approximate nearest neighbor (ANN) retrieval, to evaluate the effectiveness of our proposed SBLSH method. [sent-248, score-0.361]

62 In the first set of experiments we directly measure the accuracy in estimating pairwise angular similarity. [sent-249, score-0.186]

63 1 Angular Similarity Estimation In this experiment, we evaluate the accuracy of estimating pairwise angular similarity on several datasets. [sent-252, score-0.256]

64 Specifically, we test the effect on the estimation accuracy when the Super-Bit depth N varies and the code length K is fixed, and vice versa. [sent-253, score-0.235]

65 We compute the mean squared error between the true angle and the approximated angles from DLSH and DSBLSH respectively. [sent-256, score-0.108]

66 Table 1: ANN retrieval results, measured by the proportion of good neighbors within the query's Hamming ball of radius 3. [sent-275, score-0.068]

67 For each Super-Bit depth N and code length K, we randomly sample 10,000 data points, which involve about 50,000,000 data pairs, and randomly generate SRP-LSH functions, together with SBLSH functions obtained by orthogonalizing the generated SRP vectors in batches. [sent-297, score-0.29]

68 To test the effect of Super-Bit depth N , we fix K = 120 for Photo Tourism SIFT and K = 3000 for MIR-Flickr respectively, and to test the effect of code length K, we fix N = 120 for Photo Tourism SIFT and N = 3000 for MIR-Flickr. [sent-299, score-0.235]

69 Figure 3 shows that when using a fixed code length K, as the Super-Bit depth N gets larger (1 < N ≤ min(d, K)), the MSE of SBLSH gets smaller, and the gap between SBLSH and SRP-LSH gets larger. [sent-301, score-0.295]

70 This verifies Corollary 2: when applying SBLSH, the best strategy is to set the Super-Bit depth N as large as possible, i.e. [sent-303, score-0.091]

71 An informal explanation of this phenomenon is that as the degree of orthogonality of the random projections gets higher, the code becomes more informative, and thus provides a better estimate. [sent-306, score-0.13]

72 This shows that even when the angle between each data pair is not constrained to (0, π/2], SBLSH still gives much more accurate estimation. [sent-308, score-0.073]

73 Figure 3 also shows that with a fixed Super-Bit depth N, SBLSH significantly outperforms SRP-LSH. [sent-309, score-0.091]

74 When increasing the code length K, the accuracies of SBLSH and SRP-LSH both increase. [sent-310, score-0.144]

75 2 Approximate Nearest Neighbor Retrieval In this subsection, we conduct ANN retrieval experiment, which compares SBLSH with two other widely used data-independent binary LSH methods: SRP-LSH and E2LSH (we use the binary version in [25, 1]). [sent-313, score-0.133]

76 We define the good neighbors to a query q as the samples within the top 5% nearest neighbors (measured in Euclidean distance) to q. [sent-317, score-0.093]

77 Using code length K = 30, we repeat the experiment 10 times and take the mean of the results. [sent-322, score-0.144]
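
A hedged sketch of one plausible reading of this retrieval metric: the fraction of the points falling inside a Hamming ball of radius 3 around each query's code that are "good neighbors" (top 5% in Euclidean distance). Helper names and the brute-force scan are illustrative, not the paper's evaluation code:

```python
import numpy as np

def good_neighbor_proportion(codes_db, codes_q, good_sets, radius=3):
    """Average over queries of the fraction of Hamming-radius retrievals
    whose database indices appear in the query's set of good neighbors."""
    props = []
    for q_code, good in zip(codes_q, good_sets):
        hamming = np.count_nonzero(codes_db != q_code, axis=1)  # distance to every db code
        retrieved = np.flatnonzero(hamming <= radius)
        if retrieved.size:
            props.append(np.isin(retrieved, list(good)).mean())
    return float(np.mean(props)) if props else 0.0
```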

78 Table 1 shows that SBLSH performs best among all these data-independent hashing methods. [sent-324, score-0.221]

79 These data-independent methods are simple, and thus easy to integrate as a module in more complicated algorithms involving pairwise distance or similarity computation, e.g. [sent-328, score-0.101]

80 [16] proposed b-bit minwise hashing, which improves the original min-hash in terms of compactness. [sent-334, score-0.104]

81 [17] shows that b-bit minwise hashing can be integrated into linear learning algorithms for large-scale learning tasks. [sent-335, score-0.104]

82 [14] reduces the variance of random projections by taking advantage of marginal norms, and compares the variance of SRP with regular random projections considering the margins. [sent-336, score-0.142]

83 Prior to SBLSH, SRP-LSH [3] was the only hashing method proven to provide an unbiased estimate of angular similarity. [sent-338, score-0.483]

84 The proposed SBLSH method is the first data-independent method that outperforms SRP-LSH in terms of higher accuracy in estimating angular similarity. [sent-339, score-0.186]

85 On the other hand, data-dependent hashing methods have been extensively studied. [sent-340, score-0.221]

86 For example, spectral hashing [25] and anchor graph hashing [19] are data-dependent unsupervised methods. [sent-341, score-0.442]

87 [13] proposed kernelized locality-sensitive hashing (KLSH), which is based on SRPLSH, to approximate the angular similarity in very high or even infinite dimensional space induced by any given kernel, with access to data only via kernels. [sent-343, score-0.477]

88 There are also many works devoted to semi-supervised or supervised hashing methods [10, 21, 23, 24, 18], which try to capture not only the geometry of the original data, but also the semantic relations. [sent-344, score-0.221]

89 5 Discussion: Instead of the Gram-Schmidt process, we can use other methods to orthogonalize the projection vectors, e.g. [sent-345, score-0.142]

90 In fact, when the dimension of the data is very high, the random normal projection vectors {v_i}, i = 1, 2, ..., K, [sent-351, score-0.123]

91 will tend to be orthogonal to each other, thus it may not be very necessary to orthogonalize the vectors deliberately. [sent-354, score-0.139]

92 2, we can see that the closer the Super-Bit depth N is to the data dimension d, the larger the variance reduction SBLSH achieves over SRP-LSH. [sent-357, score-0.154]

93 Earlier work shows that b-bit minwise hashing almost always has a smaller variance than SRP in estimating the Jaccard coefficient on binary data. [sent-359, score-0.088]

94 The comparison of SBLSH with b-bit minwise hashing for Jaccard coefficient is left for future work. [sent-360, score-0.276]

95 6 Conclusion and Future Work The proposed SBLSH is a data-independent hashing method which significantly outperforms SRPLSH. [sent-361, score-0.221]

96 We have theoretically proved that SBLSH provides an unbiased estimate of angular similarity, and has a smaller variance than SRP-LSH when the angle to estimate is in (0, π/2]. [sent-362, score-0.379]

97 Experiments show that with the same length of binary code, SBLSH achieves over 30% mean squared error reduction over SRP-LSH in estimating angular similarity, when the Super-Bit depth N is close to the data dimension d. [sent-364, score-0.391]

98 Moreover, SBLSH performs best among several widely used data-independent LSH methods in approximate nearest neighbor retrieval experiments. [sent-365, score-0.128]

99 Theoretically exploring the variance of SBLSH when the angle is in (π/2, π] is left for future work. [sent-366, score-0.117]

100 Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. [sent-379, score-0.305]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sblsh', 0.79), ('hashing', 0.221), ('angular', 0.186), ('lsh', 0.155), ('ar', 0.146), ('hwi', 0.123), ('dhamming', 0.109), ('hamming', 0.097), ('orthogonalize', 0.095), ('psi', 0.095), ('srp', 0.095), ('depth', 0.091), ('code', 0.083), ('unbiased', 0.076), ('angle', 0.073), ('similarity', 0.07), ('tourism', 0.067), ('id', 0.063), ('length', 0.061), ('grassmann', 0.06), ('win', 0.06), ('photo', 0.059), ('minwise', 0.055), ('hvi', 0.055), ('orthogonalizing', 0.055), ('srplsh', 0.055), ('hv', 0.052), ('sx', 0.052), ('sift', 0.05), ('vi', 0.05), ('hash', 0.049), ('ann', 0.047), ('projection', 0.047), ('wk', 0.046), ('lemma', 0.045), ('nearest', 0.045), ('retrieval', 0.044), ('jaccard', 0.044), ('vectors', 0.044), ('variance', 0.044), ('antonio', 0.043), ('batches', 0.043), ('vk', 0.043), ('piotr', 0.042), ('hw', 0.042), ('dlsh', 0.041), ('dsblsh', 0.041), ('lshk', 0.041), ('neighbor', 0.039), ('isotropic', 0.037), ('ping', 0.037), ('angles', 0.035), ('binary', 0.034), ('sanjiv', 0.033), ('dame', 0.033), ('notre', 0.033), ('jun', 0.033), ('sgn', 0.033), ('corollary', 0.033), ('normal', 0.032), ('wi', 0.032), ('tsinghua', 0.031), ('kulis', 0.031), ('distance', 0.031), ('mse', 0.028), ('projections', 0.027), ('dome', 0.027), ('hwj', 0.027), ('kakkbk', 0.027), ('psk', 0.027), ('trevi', 0.027), ('manifold', 0.027), ('xi', 0.027), ('indyk', 0.026), ('rd', 0.026), ('uniformly', 0.025), ('neighbors', 0.024), ('vin', 0.024), ('orthogonalization', 0.024), ('arnd', 0.024), ('orthogonalized', 0.024), ('subsection', 0.023), ('xj', 0.023), ('brian', 0.023), ('distances', 0.023), ('subspace', 0.022), ('shuicheng', 0.022), ('conduct', 0.021), ('trevor', 0.021), ('kristen', 0.021), ('tian', 0.021), ('sampled', 0.021), ('china', 0.02), ('gets', 0.02), ('li', 0.02), ('ward', 0.02), ('rajeev', 0.02), ('san', 0.02), ('reduction', 0.019), ('wt', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

2 0.18250093 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

3 0.15840541 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

4 0.15520199 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

5 0.14421214 148 nips-2012-Hamming Distance Metric Learning

Author: Mohammad Norouzi, David M. Blei, Ruslan Salakhutdinov

Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes. 1

6 0.11417497 163 nips-2012-Isotropic Hashing

7 0.098395593 71 nips-2012-Co-Regularized Hashing for Multimodal Data

8 0.072854452 126 nips-2012-FastEx: Hash Clustering with Exponential Families

9 0.061721344 202 nips-2012-Locally Uniform Comparison Image Descriptor

10 0.058897652 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

11 0.051971391 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

12 0.050316703 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

13 0.048910394 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

14 0.047959417 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

15 0.046629746 279 nips-2012-Projection Retrieval for Classification

16 0.043881852 242 nips-2012-Non-linear Metric Learning

17 0.03990487 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

18 0.03828365 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

19 0.038256314 330 nips-2012-Supervised Learning with Similarity Functions

20 0.037797581 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.11), (1, 0.014), (2, -0.039), (3, -0.051), (4, 0.056), (5, -0.037), (6, -0.009), (7, 0.09), (8, 0.124), (9, 0.063), (10, 0.08), (11, -0.155), (12, 0.039), (13, 0.142), (14, -0.193), (15, 0.071), (16, 0.098), (17, -0.041), (18, -0.139), (19, -0.024), (20, 0.011), (21, 0.082), (22, 0.058), (23, -0.069), (24, -0.045), (25, 0.002), (26, 0.045), (27, -0.01), (28, -0.01), (29, -0.003), (30, 0.02), (31, 0.064), (32, -0.023), (33, -0.007), (34, -0.018), (35, -0.042), (36, -0.033), (37, -0.016), (38, -0.039), (39, 0.01), (40, -0.037), (41, -0.016), (42, -0.021), (43, 0.02), (44, 0.021), (45, 0.042), (46, -0.025), (47, 0.009), (48, -0.006), (49, -0.0)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91888171 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

2 0.87855148 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

3 0.86735576 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

4 0.85373533 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

5 0.82595074 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

6 0.74966013 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

7 0.63152176 148 nips-2012-Hamming Distance Metric Learning

8 0.39624712 202 nips-2012-Locally Uniform Comparison Image Descriptor

9 0.36231971 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

10 0.35191968 126 nips-2012-FastEx: Hash Clustering with Exponential Families

11 0.31534651 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

12 0.29954886 338 nips-2012-The Perturbed Variation

13 0.27847555 279 nips-2012-Projection Retrieval for Classification

14 0.27009243 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

15 0.26575926 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

16 0.26276419 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

17 0.24663123 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

18 0.24606061 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

19 0.24429964 221 nips-2012-Multi-Stage Multi-Task Feature Learning

20 0.24350142 267 nips-2012-Perceptron Learning of SAT


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.028), (16, 0.271), (21, 0.031), (38, 0.056), (39, 0.012), (42, 0.042), (43, 0.048), (54, 0.021), (74, 0.066), (76, 0.121), (80, 0.062), (92, 0.109)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71422625 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

2 0.61242825 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

Author: Jeremy Weiss, Sriraam Natarajan, David Page

Abstract: Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

3 0.59118712 222 nips-2012-Multi-Task Averaging

Author: Sergey Feldman, Maya Gupta, Bela Frigyik

Abstract: We present a multi-task learning approach to jointly estimate the means of multiple independent data sets. The proposed multi-task averaging (MTA) algorithm results in a convex combination of the single-task averages. We derive the optimal amount of regularization, and show that it can be effectively estimated. Simulations and real data experiments demonstrate that MTA outperforms both maximum likelihood and James-Stein estimators, and that our approach to estimating the amount of regularization rivals cross-validation in performance but is more computationally efficient. 1

4 0.57211721 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization

Author: Yuchen Zhang, Martin J. Wainwright, John C. Duchi

Abstract: We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as O(N^{-1} + (N/m)^{-2}). Whenever m ≤ √N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as O(N^{-1} + (N/m)^{-3}), and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on large-scale problems from the internet search domain. In particular, we show that our methods efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which consists of N ≈ 2.4 × 10^8 samples and d ≥ 700,000 dimensions. 1

5 0.56552303 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

6 0.56080627 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

7 0.55540884 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

8 0.55309516 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

9 0.55296236 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

10 0.53954095 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

11 0.53421623 257 nips-2012-One Permutation Hashing

12 0.53349298 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

13 0.53296304 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

14 0.53243816 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

15 0.52973968 197 nips-2012-Learning with Recursive Perceptual Representations

16 0.52873421 260 nips-2012-Online Sum-Product Computation Over Trees

17 0.5286684 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

18 0.52850568 163 nips-2012-Isotropic Hashing

19 0.52408797 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

20 0.52289325 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes