nips nips2012 nips2012-257 knowledge-graph by maker-knowledge-mining

257 nips-2012-One Permutation Hashing


Source: pdf

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.
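To make the scheme concrete, here is a minimal Python sketch of the binning step described in the abstract: permute the columns once, divide the permuted columns into k equal-width bins, and keep the smallest permuted nonzero location per bin. The function and variable names are ours, and the code assumes k divides the dimensionality D; it is an illustration, not the authors' implementation.

```python
import numpy as np

def one_permutation_hash(nonzero_cols, D, k, perm):
    """One permutation hashing of a binary vector given by its set of
    nonzero column indices: the permuted columns are divided into k
    equal-width bins and the smallest permuted location in each bin is
    kept (None marks an empty bin)."""
    bin_width = D // k                      # assumes k divides D for simplicity
    hashed = [None] * k
    for col in nonzero_cols:
        loc = int(perm[col])                # permuted location of this nonzero
        b = loc // bin_width                # bin that this location falls into
        if hashed[b] is None or loc < hashed[b]:
            hashed[b] = loc
    return hashed

rng = np.random.default_rng(0)
D, k = 16, 4
perm = rng.permutation(D)                   # one permutation, shared by all vectors
print(one_permutation_hash({1, 5, 9, 14}, D, k, perm))
```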

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. [sent-2, score-0.823]

2 The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. [sent-3, score-0.791]

3 This paper presents a simple solution called one permutation hashing. [sent-6, score-0.224]

4 Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. [sent-7, score-0.197]

5 The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. [sent-8, score-0.731]

6 Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. [sent-9, score-0.821]

7 1 Introduction Minwise hashing [4, 3] is a standard technique in the context of search, for efficiently computing set similarities. [sent-12, score-0.444]

8 Recently, b-bit minwise hashing [18, 19], which stores only the lowest b bits of each hashed value, has been applied to sublinear time near neighbor search [22] and learning [16], on large-scale high-dimensional binary data (e.g., text). [sent-13, score-1.297]

9 A drawback of minwise hashing is that it requires a costly preprocessing step, for conducting (e. [sent-16, score-0.883]

10 2 Minwise Hashing and b-Bit Minwise Hashing Minwise hashing was mainly designed for binary data. [sent-37, score-0.483]

11 The minwise hashing method was proposed for efficiently computing resemblances. [sent-44, score-0.791]

12 The hashed values are the two minimums of π(S1 ) and π(S2 ). [sent-47, score-0.196]

13 The probability that the two hashed values are equal is Pr(min(π(S1)) = min(π(S2))) = |S1 ∩ S2| / |S1 ∪ S2| = R (2). One can then estimate R from k independent permutations, π1, ..., πk. [sent-48, score-0.196]
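As a sanity check of Eq. (2), the following sketch estimates R by applying k independent permutations and counting how often the two minima collide. The sets, dimensionality, and names are illustrative only.

```python
import numpy as np

def minwise_estimate(S1, S2, D, k, seed=0):
    """Estimate R = |S1 ∩ S2| / |S1 ∪ S2| from k independent permutations."""
    rng = np.random.default_rng(seed)
    matches = 0
    for _ in range(k):
        perm = rng.permutation(D)                     # a fresh random permutation
        if min(perm[i] for i in S1) == min(perm[j] for j in S2):
            matches += 1
    return matches / k

S1, S2, D = {2, 4, 7, 13}, {0, 6, 13}, 16
print(len(S1 & S2) / len(S1 | S2))                    # exact R = 1/6
print(minwise_estimate(S1, S2, D, k=500))             # Monte Carlo estimate of R
```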

14 The method of b-bit minwise hashing [18, 19] provides a simple solution by storing only the lowest b bits of each hashed value, reducing the dimensionality of the (expanded) hashed data matrix to just 2^b × k. [sent-53, score-1.267]

15 [16] applied this idea to large-scale learning on the webspam dataset and demonstrated that using b = 8 and k = 200 to 500 could achieve very similar accuracies as using the original data. [sent-54, score-0.418]

16 3 The Cost of Preprocessing and Testing Clearly, the preprocessing of minwise hashing can be very costly. [sent-56, score-0.883]

17 In our experiments, loading the webspam dataset (350,000 samples, about 16 million features, and about 24GB in Libsvm/svmlight (text) format) used in [16] took about 1000 seconds when the data were stored in text format, and took about 150 seconds after we converted the data into binary. [sent-57, score-0.313]

18 Note that, compared to industrial applications [24], the webspam dataset is very small. [sent-59, score-0.287]

19 Intuitively, the standard practice of minwise hashing ought to be very “wasteful” in that all the nonzero elements in one set are scanned (permuted) but only the smallest one will be used. [sent-65, score-0.86]

20 We apply one permutation π on the sets and present π(S1 ), π(S2 ), and π(S3 ) as binary (0/1) vectors, where π(S1 ) = {2, 4, 7, 13}, π(S2 ) = {0, 6, 13}, and π(S3 ) = {0, 1, 10, 12}. [sent-73, score-0.263]

21 For now, we use ‘*’ for empty bins, which occur rarely unless the number of nonzeros is small compared to k. [sent-75, score-0.287]

22 As illustrated in Figure 1, the idea of one permutation hashing is simple. [sent-76, score-0.668]

23 In the example in Figure 1 (which concerns 3 sets), the sample selected from π(S1 ) is [2, 4, ∗, 13], where we use ’*’ to denote an empty bin, for the time being. [sent-79, score-0.141]

24 Since we only want to compare elements with the same bin number (so that we can obtain an inner product), we can actually re-index the elements of each bin to use the smallest possible representations. [sent-80, score-0.169]
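The following sketch reproduces this running example: with D = 16 and k = 4 bins of width 4, π(S1) = {2, 4, 7, 13} gives the sample [2, 4, *, 13], and re-indexing each stored location relative to the start of its bin gives [2, 0, *, 1]. The bin layout is inferred from the example; the helper names are ours.

```python
def bin_minima(permuted_set, D=16, k=4):
    """Smallest permuted location in each of k equal-width bins ('*' = empty)."""
    width = D // k
    sample = ['*'] * k
    for loc in permuted_set:
        b = loc // width
        if sample[b] == '*' or loc < sample[b]:
            sample[b] = loc
    return sample

def reindex(sample, D=16, k=4):
    """Re-index each stored location relative to the start of its bin."""
    width = D // k
    return [v if v == '*' else v - b * width for b, v in enumerate(sample)]

pi_S1 = {2, 4, 7, 13}
print(bin_minima(pi_S1))            # [2, 4, '*', 13]
print(reindex(bin_minima(pi_S1)))   # [2, 0, '*', 1]
```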

25 We will show that empty bins occur rarely unless the total number of nonzeros for some set is small compared to k, and we will present strategies on how to deal with empty bins should they occur. [sent-82, score-0.708]

26 Reducing (e.g.,) 500 permutations to just one permutation (or a few) is much more computationally efficient. [sent-86, score-0.31]

27 From the perspective of energy consumption, this scheme is desirable, especially considering that minwise hashing is deployed in the search industry. [sent-87, score-0.941]

28 One permutation hashing should be easier to implement, from the perspective of random number generation. [sent-94, score-0.668]

29 On the other hand, it would not be realistic to store a “permutation matrix” of size D × k if D = 10^9 and k = 500; instead, one usually has to resort to approximations such as universal hashing [5]. [sent-98, score-0.494]

30 Universal hashing often works well in practice although theoretically there are always worst cases. [sent-99, score-0.444]
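As an illustration of the kind of approximation meant here (not necessarily the exact construction used in [5]), a 2-universal hash of the form h(x) = ((a·x + b) mod p) mod D, with p a prime larger than the universe of x, can stand in for an explicit permutation when D is huge. The sketch below uses our own names and a placeholder prime.

```python
import random

def make_universal_hash(D, p=2_147_483_647, seed=0):
    """Return h(x) = ((a*x + b) mod p) mod D from the standard 2-universal
    family; p is a prime (here 2^31 - 1) assumed to exceed the range of x."""
    rng = random.Random(seed)
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % D

h = make_universal_hash(D=10**9)
print(h(12013), h(25964))   # pseudo-random "permuted" locations in [0, D)
```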

31 One permutation hashing is a better matrix sparsification scheme. [sent-100, score-0.668]

32 In terms of the original binary data matrix, the one permutation scheme simply makes many nonzero entries be zero, without further “damaging” the matrix. [sent-101, score-0.463]

33 Using the k-permutation scheme, we store, for each permutation and each row, only the first nonzero and make all the other nonzero entries be zero; and then we have to concatenate k such data matrices. [sent-102, score-0.335]

34 Basically, CRS continuously takes the bottom-k nonzeros after applying one permutation on the data, then it uses a simple “trick” to construct a random sample for each pair with the effective sample size determined at the estimation stage. [sent-106, score-0.353]

35 By taking the nonzeros continuously, however, the samples are no longer “aligned” and hence we can not write the estimator as an inner product in a unified fashion. [sent-107, score-0.151]

36 [16] commented that using CRS for linear learning does not produce as good results compared to using b-bit minwise hashing. [sent-108, score-0.347]

37 Interestingly, in the original “minwise hashing” paper [4] (we use quotes because the scheme was not called “minwise hashing” at that time), only one permutation was used and a sample was the first k nonzeros after the permutation. [sent-109, score-0.513]

38 Then they quickly moved to the k-permutation minwise hashing scheme [3]. [sent-110, score-0.893]

39 The preprocessing cost of very sparse random projections can be as small as merely doing one projection. [sent-114, score-0.165]

40 The variable-length hashing scheme is to some extent related to the Count-Min (CM) sketch [6] and the Vowpal Wabbit (VW) [21, 25] hashing algorithms. [sent-123, score-1.008]

41 2 Applications of Minwise Hashing on Efficient Search and Learning In this section, we will briefly review two important applications of the k-permutation b-bit minwise hashing: (i) sublinear time near neighbor search [22], and (ii) large-scale linear learning [16]. [sent-124, score-0.517]

42 1 Sublinear Time Near Neighbor Search The task of near neighbor search is to identify a set of data points which are “most similar” to a query data point. [sent-126, score-0.138]

43 The idea in [22] is to directly use the bits from b-bit minwise hashing to construct hash tables. [sent-131, score-0.917]

44 Specifically, we hash the data points using k random permutations and store each hash value using b bits. [sent-132, score-0.252]

45 In the testing phase, we apply the same k permutations to a query data point to generate a bk-bit signature and only search data points in the corresponding bucket. [sent-137, score-0.2]

46 Given n data points, we apply k = 2 permutations and store b = 2 bits of each hashed value to generate n (4-bit) signatures L times. [sent-143, score-0.406]

47 For data point 6 in Table 1 (left panel of Figure 2), the lowest b bits of its two hashed values are 00 and 00 and thus its signature is 0000 in binary; hence we place a pointer to data point 6 in bucket number 0. [sent-145, score-0.292]

48 We then only search for the near neighbors in the set {6, 15, 26, 79, 110, 143}, instead of the original set of n data points. [sent-149, score-0.15]
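A schematic of the table-building step under this setup (k = 2 permutations, b = 2 bits, hence 4-bit bucket indices). The hashed values below are placeholders chosen so that the listed points land in bucket 0, and the dictionary-of-lists layout is our own simplification of the hash-table construction in [22].

```python
from collections import defaultdict

def signature(hashed_values, b=2):
    """Concatenate the lowest b bits of each hashed value into one bucket index."""
    sig = 0
    for h in hashed_values:
        sig = (sig << b) | (h & ((1 << b) - 1))
    return sig

table = defaultdict(list)
# toy hashed values (one pair per data point, from k = 2 permutations);
# these numbers are made up so that each point lands in bucket 0
toy = {6: (4, 8), 15: (16, 12), 26: (8, 4)}
for point_id, hv in toy.items():
    table[signature(hv)].append(point_id)

query_hv = (20, 32)                       # query's lowest bits are 00, 00 -> bucket 0
candidates = table[signature(query_hv)]   # search only this bucket, not all n points
print(candidates)                         # [6, 15, 26]
```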

49 Suppose for one data vector, the hashed values are {12013, 25964, 20191}, whose binary digits are respectively {010111011101101, 110010101101100, 100111011011111}. [sent-156, score-0.263]

50 At run-time, the (b-bit) hashed data are expanded into a new feature vector of length 2^b × k = 12: {0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0}. [sent-158, score-0.244]
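A sketch reproducing this expansion: with b = 2 and k = 3, each hashed value keeps its lowest 2 bits and is one-hot coded into a block of length 2^b. The block-position convention below (2^b − 1 − v) is inferred from the example vector; the function name is ours.

```python
def expand_b_bit(hashed_values, b):
    """Expand b-bit hashed values into an indicator vector of length 2^b * k."""
    block = 1 << b
    out = []
    for h in hashed_values:
        v = h & (block - 1)              # keep only the lowest b bits
        one_hot = [0] * block
        one_hot[block - 1 - v] = 1       # position convention matching the example
        out.extend(one_hot)
    return out

print(expand_b_bit([12013, 25964, 20191], b=2))
# -> [0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0], a vector of length 2^b * k = 12
```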

51 Clearly, in both applications (near neighbor search and linear learning), the hashed data have to be “aligned” in that only the hashed data generated from the same permutation interact with each other. [sent-160, score-0.71]

52 Note that, with our one permutation scheme as in Figure 1, the hashed data are indeed aligned. [sent-161, score-0.522]

53 3 Theoretical Analysis of the One Permutation Scheme This section presents the probability analysis to provide a rigorous foundation for one permutation hashing as illustrated in Figure 1. [sent-162, score-0.668]

54 Since (1 − 1/k)^f ≈ e^{−f/k}, we know that the chance of empty bins is small when f ≫ k. [sent-175, score-0.3]

55 For practical applications, we would expect that f ≫ k (for most data pairs), otherwise hashing probably would not be too useful anyway. [sent-178, score-0.461]

56 This is why we do not expect empty bins will significantly impact (if at all) the performance in practical settings. [sent-179, score-0.298]

57 Note that k − Nemp > 0, because we assume the original data vectors are not completely empty (all-zero). [sent-181, score-0.199]

58 It is probably not surprising that our one permutation scheme (slightly) outperforms the original k-permutation scheme (at merely 1/k of the preprocessing cost), because one permutation hashing, which is “sampling-without-replacement”, provides a better strategy for matrix sparsification. [sent-184, score-0.867]

59 4 Strategies for Dealing with Empty Bins In general, we expect that empty bins should not occur often because E(Nemp)/k ≈ e^{−f/k}, which is very close to zero if f/k > 5. [sent-185, score-0.332]
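A quick numeric check of this approximation (the exact probability under a random permutation differs slightly, but (1 − 1/k)^f ≈ e^{−f/k} is the approximation invoked in the text):

```python
import math

for f, k in [(500, 500), (2500, 500), (10000, 500)]:
    exact = (1 - 1 / k) ** f      # Pr(a given bin receives none of the f nonzeros)
    approx = math.exp(-f / k)
    print(f, k, f"{exact:.3g}", f"{approx:.3g}")
# f = k = 500  -> about 0.37 (empty bins common only when f is not much larger than k)
# f = 5k       -> about 0.0067 (empty bins already rare)
# f = 20k      -> about 2e-9  (essentially no empty bins)
```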

60 ) If the goal of using minwise hashing is for data reduction, i. [sent-187, score-0.791]

61 Nevertheless, in applications where we need the estimators to be inner products, we need strategies to deal with empty bins in case they occur. [sent-190, score-0.303]

62 Figure 3 plots the histogram of the numbers of nonzeros in the webspam dataset, which has 350,000 samples. [sent-192, score-0.37]

63 Thus, we must deal with empty bins if we do not want to exclude those data points. [sent-198, score-0.41]

64 For example, if f = k = 500, then Nemp/k ≈ e^{−f/k} = e^{−1} ≈ 0.368. (Figure 3: Histogram of the numbers of nonzeros in the webspam dataset, 350,000 samples.) [sent-199, score-0.398]

65 The strategy we recommend for linear learning is zero coding, which is tightly coupled with the strategy of hashed data expansion [16] as reviewed in Sec. [sent-202, score-0.288]

66 This strategy, which is sparsity-preserving, essentially corresponds to the following modified estimator: R̂mat^(0) = Nmat / (√(k − Nemp^(1)) · √(k − Nemp^(2)))  (17), where Nemp^(1) = Σ_{j=1}^{k} I^(1)_{emp,j} and Nemp^(2) = Σ_{j=1}^{k} I^(2)_{emp,j} are the numbers of empty bins in π(S1) and π(S2), respectively. [sent-209, score-0.281]
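A sketch of the estimator in Eq. (17) as reconstructed above: given the per-bin hashed values of two vectors (with None marking an empty bin), count the matching, jointly non-empty bins and normalize by √(k − Nemp^(1)) · √(k − Nemp^(2)). The names and the toy input are ours.

```python
import math

def rmat_zero_coding(h1, h2):
    """Zero-coding estimator: h1 and h2 are length-k lists of per-bin hashed
    values, with None marking an empty bin."""
    k = len(h1)
    n_emp1 = sum(v is None for v in h1)
    n_emp2 = sum(v is None for v in h2)
    n_mat = sum(a is not None and b is not None and a == b
                for a, b in zip(h1, h2))
    return n_mat / (math.sqrt(k - n_emp1) * math.sqrt(k - n_emp2))

# toy k = 4 example with one empty bin in h1 and matches in bins 2 and 4
h1 = [2, 0, None, 1]
h2 = [0, 0, 2, 1]
print(rmat_zero_coding(h1, h2))   # 2 / (sqrt(3) * sqrt(4)) ≈ 0.577
```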

67 Basically, since each data vector is processed and coded separately, we actually do not know Nemp (the number of jointly empty bins) until we see both π(S1 ) and π(S2 ). [sent-211, score-0.17]

68 Once the preprocessing is no longer the bottleneck, it matters less whether we use 1 permutation or (e. [sent-222, score-0.316]

69 The chance of having empty bins decreases exponentially with increasing m. [sent-225, score-0.3]

70 2 reviewed the data-expansion strategy used by [16] for integrating b-bit minwise hashing with linear learning. [sent-229, score-0.837]

71 We will adopt a similar strategy with modifications for considering empty bins. [sent-230, score-0.17]

72 Suppose we apply our one permutation hashing scheme and use k = 4 bins. [sent-234, score-0.77]

73 For the first data vector, the hashed values are [12013, 25964, 20191, ∗] (i. [sent-235, score-0.196]

74 The normalization factor √(k − Nemp^(i)) varies, depending on the number of empty bins in the i-th vector. [sent-240, score-0.281]
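A sketch of how the data expansion combines with zero coding and this normalization: an empty bin contributes an all-zero block, and the whole expanded vector is scaled by 1/√(k − Nemp^(i)), so the inner product of two expanded vectors reproduces the (b-bit) analogue of the estimator above. The conventions and names are ours, assuming the same one-hot position convention as the earlier expansion example.

```python
import math

def expand_zero_coding(hashed_values, b):
    """b-bit expansion with zero coding: an empty bin (None) becomes an
    all-zero block and the whole vector is scaled by 1 / sqrt(k - Nemp)."""
    k = len(hashed_values)
    block = 1 << b
    n_emp = sum(v is None for v in hashed_values)
    scale = 1.0 / math.sqrt(k - n_emp)
    out = []
    for v in hashed_values:
        one_hot = [0.0] * block
        if v is not None:
            one_hot[block - 1 - (v & (block - 1))] = scale
        out.extend(one_hot)
    return out

x = expand_zero_coding([12013, 25964, 20191, None], b=8)
print(len(x), sum(val * val for val in x))   # 1024 features, squared norm ≈ 1.0
```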

75 5 Experimental Results on the Webspam Dataset The webspam dataset has 350,000 samples and 16,609,143 features. [sent-241, score-0.269]

76 We conduct extensive experiments on linear SVM and logistic regression, using our proposed one permutation hashing scheme with k ∈ {2^6, 2^7, 2^8, 2^9} and b ∈ {1, 2, 4, 6, 8}. [sent-244, score-0.836]

77 Since our purpose is to demonstrate the effectiveness of our proposed hashing scheme, we simply provide the results for a wide range of C values and assume that the best performance is achievable if we conduct cross-validations. [sent-247, score-0.466]

78 Clearly, when k = 512 (or even 256) and b = 8, b-bit one permutation hashing achieves similar test accuracies as using the original data. [sent-250, score-0.817]

79 Also, compared to the original k-permutation scheme as in [16], our one permutation scheme achieves similar (or even slightly better) accuracies. [sent-251, score-0.486]

80 Figure 4: Test accuracies of SVM (upper panels) and logistic regression (bottom panels), averaged over 50 repetitions. (Panels compare b = 1, 2, 4, 6, 8 with the original data, for both 1 Perm and k Perm; accuracy is plotted against C, e.g., logit with k = 512 on webspam.) [sent-252, score-0.153]

81 The accuracies of using the original data are plotted as dashed (red, if color is available) curves with “diamond” markers. [sent-253, score-0.149]

82 Compared with the original k-permutation minwise hashing (dashed and blue if color is available), the one permutation hashing scheme achieves similar accuracies, or even slightly better accuracies when k is large. [sent-255, score-1.71]

83 The empirical results on the webspam dataset are encouraging because they verify that our proposed one permutation hashing scheme performs as well as (or even slightly better than) the original k-permutation scheme, at merely 1/k of the original preprocessing cost. [sent-256, score-1.255]

84 , news20) where the empty bins will occur much more frequently. [sent-259, score-0.298]

85 Therefore, this is more like a contrived example and we use it just to verify that our one permutation scheme (with the zero coding strategy) still works very well even when we let k be as large as 4096 (i. [sent-262, score-0.361]

86 In fact, the one permutation scheme achieves noticeably better accuracies than the original k-permutation scheme. [sent-265, score-0.41]

87 We believe this is because the one permutation scheme is “sample-without-replacement” and provides a better matrix sparsification strategy without “contaminating” the original data matrix too much. [sent-266, score-0.413]

88 We experiment with k ∈ {2^5, 2^6, 2^7, 2^8, 2^9, 2^10, 2^11, 2^12} and b ∈ {1, 2, 4, 6, 8}, for both one permutation scheme and k-permutation scheme. [sent-267, score-0.326]

89 For smaller k (e.g., k ≤ 64), both the one permutation scheme and the original k-permutation scheme perform similarly. [sent-273, score-0.486]

90 For larger k values (especially as k ≥ 256), however, our one permutation scheme noticeably outperforms the k-permutation scheme. [sent-274, score-0.363]

91 Using the original data, the test accuracies are about 98%. [sent-275, score-0.149]

92 Our one permutation scheme with k ≥ 512 and b = 8 essentially achieves the original test accuracies, while the k-permutation scheme could only reach about 97. [sent-276, score-0.486]

93 The one permutation scheme noticeably outperforms the original k-permutation scheme especially when k is not small. [sent-279, score-0.523]

94 Figure 6: Test accuracies of logistic regression averaged over 100 repetitions. (Panel shown: logit with k = 4096; News20 accuracy plotted against C.) [sent-280, score-0.153]

95 The one permutation scheme noticeably outperforms the original k-permutation scheme especially when k is not small. [sent-281, score-0.523]

96 7 Conclusion A new hashing algorithm is developed for large-scale search and learning in massive binary data. [sent-282, score-0.556]

97 Compared with (e.g.,) k = 500 permutation minwise hashing (which is a standard procedure in the context of search), our method requires only one permutation and can achieve similar or even better accuracies at merely 1/k of the original preprocessing cost. [sent-285, score-1.292]

98 We expect that one permutation hashing (or its variant) will be adopted in practice. [sent-286, score-0.685]

99 Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. [sent-293, score-0.49]

100 b-bit minwise hashing in practice: Large-scale batch and online learning and using GPUs for fast preprocessing with simple hash functions. [sent-360, score-0.95]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nemp', 0.454), ('hashing', 0.444), ('minwise', 0.347), ('perm', 0.256), ('webspam', 0.241), ('permutation', 0.224), ('hashed', 0.196), ('empty', 0.141), ('bins', 0.14), ('nonzeros', 0.129), ('rmat', 0.128), ('logit', 0.114), ('ping', 0.105), ('scheme', 0.102), ('nmat', 0.099), ('svm', 0.093), ('preprocessing', 0.092), ('accuracies', 0.091), ('accuracy', 0.088), ('permutations', 0.086), ('hash', 0.067), ('bits', 0.059), ('bin', 0.059), ('original', 0.058), ('arnd', 0.05), ('search', 0.048), ('expanded', 0.048), ('neighbor', 0.046), ('near', 0.044), ('logistic', 0.044), ('crs', 0.043), ('nonzero', 0.04), ('binary', 0.039), ('signature', 0.038), ('anshumali', 0.038), ('noticeably', 0.037), ('merely', 0.036), ('christian', 0.036), ('shrivastava', 0.035), ('bucket', 0.033), ('signatures', 0.033), ('sublinear', 0.032), ('store', 0.032), ('concatenate', 0.031), ('panels', 0.031), ('bc', 0.03), ('processed', 0.029), ('smallest', 0.029), ('strategy', 0.029), ('digits', 0.028), ('dataset', 0.028), ('largeo', 0.028), ('owen', 0.028), ('testing', 0.028), ('sparsi', 0.028), ('kenneth', 0.028), ('vancouver', 0.028), ('phrase', 0.028), ('li', 0.027), ('permuted', 0.027), ('lowest', 0.025), ('dallas', 0.025), ('andrei', 0.025), ('broder', 0.025), ('manasse', 0.025), ('www', 0.025), ('massive', 0.025), ('basically', 0.025), ('evenly', 0.024), ('text', 0.023), ('inner', 0.022), ('conduct', 0.022), ('ar', 0.022), ('piotr', 0.022), ('stoc', 0.021), ('stored', 0.021), ('aligned', 0.02), ('pegasos', 0.02), ('permute', 0.02), ('lsh', 0.019), ('resemblance', 0.019), ('liblinear', 0.019), ('projections', 0.019), ('chance', 0.019), ('regression', 0.018), ('universal', 0.018), ('sketch', 0.018), ('format', 0.018), ('industrial', 0.018), ('min', 0.018), ('divide', 0.018), ('cost', 0.018), ('coding', 0.018), ('kdd', 0.018), ('mark', 0.017), ('stores', 0.017), ('occur', 0.017), ('zero', 0.017), ('expect', 0.017), ('reviewed', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

2 0.35666597 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

3 0.15520199 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

4 0.14141832 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

5 0.13038643 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

6 0.11838058 148 nips-2012-Hamming Distance Metric Learning

7 0.10264323 126 nips-2012-FastEx: Hash Clustering with Exponential Families

8 0.10070287 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

9 0.076341175 197 nips-2012-Learning with Recursive Perceptual Representations

10 0.068782918 202 nips-2012-Locally Uniform Comparison Image Descriptor

11 0.061876737 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

12 0.045411874 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

13 0.044343863 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

14 0.043991268 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

15 0.037110105 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

16 0.036226369 188 nips-2012-Learning from Distributions via Support Measure Machines

17 0.035653636 279 nips-2012-Projection Retrieval for Classification

18 0.034950368 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

19 0.034340542 128 nips-2012-Fast Resampling Weighted v-Statistics

20 0.033036407 176 nips-2012-Learning Image Descriptors with the Boosting-Trick


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.105), (1, 0.014), (2, -0.04), (3, -0.057), (4, 0.045), (5, -0.049), (6, 0.005), (7, 0.114), (8, 0.128), (9, 0.09), (10, 0.084), (11, -0.141), (12, 0.074), (13, 0.225), (14, -0.205), (15, 0.058), (16, 0.096), (17, -0.051), (18, -0.22), (19, -0.056), (20, 0.014), (21, 0.142), (22, 0.036), (23, -0.125), (24, -0.078), (25, 0.01), (26, 0.007), (27, -0.054), (28, -0.032), (29, -0.053), (30, -0.02), (31, 0.075), (32, -0.016), (33, 0.069), (34, -0.023), (35, -0.07), (36, -0.02), (37, 0.017), (38, -0.019), (39, -0.003), (40, 0.025), (41, -0.009), (42, -0.035), (43, -0.011), (44, -0.01), (45, -0.041), (46, -0.021), (47, -0.015), (48, -0.014), (49, -0.058)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93939692 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

2 0.88937205 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

3 0.8187955 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

4 0.80837452 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

5 0.79806137 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

6 0.62857908 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

7 0.54222941 148 nips-2012-Hamming Distance Metric Learning

8 0.3660275 126 nips-2012-FastEx: Hash Clustering with Exponential Families

9 0.28885651 202 nips-2012-Locally Uniform Comparison Image Descriptor

10 0.25660473 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

11 0.24750412 279 nips-2012-Projection Retrieval for Classification

12 0.24285711 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.23026481 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

14 0.22545546 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

15 0.22361991 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

16 0.21857101 267 nips-2012-Perceptron Learning of SAT

17 0.20373487 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

18 0.19778447 128 nips-2012-Fast Resampling Weighted v-Statistics

19 0.19684869 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

20 0.18747342 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.047), (17, 0.026), (21, 0.014), (38, 0.101), (42, 0.016), (43, 0.295), (44, 0.022), (54, 0.019), (55, 0.028), (74, 0.076), (76, 0.112), (80, 0.08), (92, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73503453 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

2 0.56287605 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

Author: Ben Calderhead, Mátyás A. Sustik

Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of 1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1

3 0.55320525 193 nips-2012-Learning to Align from Scratch

Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-miller

Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method. 1

4 0.54713356 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

Author: James T. Kwok, Ryan P. Adams

Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model. 1

5 0.54548943 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

6 0.54383689 168 nips-2012-Kernel Latent SVM for Visual Recognition

7 0.5432328 260 nips-2012-Online Sum-Product Computation Over Trees

8 0.54288936 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

9 0.54245734 148 nips-2012-Hamming Distance Metric Learning

10 0.5401246 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

11 0.53952938 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

12 0.53923869 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.53688484 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

14 0.53666449 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

15 0.53627312 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

16 0.53577214 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.53572142 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

18 0.53558373 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

19 0.53549892 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

20 0.53542721 210 nips-2012-Memorability of Image Regions