nips nips2012 nips2012-42 knowledge-graph by maker-knowledge-mining

42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search


Source: pdf

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. [sent-5, score-0.384]

2 The similarity of such feature vectors is commonly measured using the cosine of the angle between them. [sent-6, score-0.447]

3 In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. [sent-7, score-0.304]

4 In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. [sent-8, score-0.354]

5 Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). [sent-9, score-0.217]

6 Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. [sent-10, score-0.469]

7 Mapping the original high-dimensional data to similarity-preserving binary codes provides an attractive solution to both of these problems [21, 23]. [sent-14, score-0.293]

8 Several powerful techniques have been proposed recently to learn binary codes for large-scale nearest neighbor search and retrieval. [sent-15, score-0.362]

9 In this work, we investigate whether it is possible to achieve an improved binary embedding if the data vectors are known to contain only non-negative elements. [sent-17, score-0.177]

10 Furthermore, cosine of angle between vectors is typically used as a similarity measure for such data. [sent-19, score-0.447]

11 A popular binary coding method for cosine similarity is based on Locality Sensitive Hashing (LSH) [4], but it does not take advantage of the non-negative nature of histogram data. [sent-21, score-0.456]

12 However, it is appropriate only for Jaccard distance, and it does not result in binary codes. [sent-24, score-0.178]

13 Special clustering algorithms have been developed for data sampled on the unit hypersphere, but they also do not lead to binary codes [1]. [sent-25, score-0.340]

14 To the best of our knowledge, this paper describes the first work that specifically learns binary codes for non-negative data with cosine similarity. [sent-26, score-0.474]

15 We propose a novel angular quantization technique to learn binary codes from non-negative data, where the angle between two vectors is used as a similarity measure. [sent-27, score-0.996]

16 The proposed technique works by quantizing each data point to the vertex of the binary hypercube with which it has the smallest angle. [sent-29, score-0.334]

17 The number of these quantization centers or landmarks is exponential in the dimensionality of the data, yielding a low-distortion quantization of a point. [sent-30, score-0.816]

18 Note that it would be computationally infeasible to perform traditional nearest-neighbor quantization as in [1] with such a large number of centers. [sent-31, score-0.327]

19 Instead, we present a very efficient method to find the nearest landmark for a point, i.e., [sent-33, score-0.211]

20 the vertex of the binary hypercube with which it has the smallest angle. [sent-35, score-0.312]

21 Since the basic form of our quantization method does not take data distribution into account, we further propose a learning algorithm that linearly transforms the data before quantization to reduce the angular distortion. [sent-36, score-0.764]

22 We show experimentally that it significantly outperforms other state-of-the-art binary coding methods on both visual and textual data. [sent-37, score-0.194]

23 2 Angular Quantization-based Binary Codes: Our goal is to find a quantization scheme that maximally preserves the cosine similarity (angle) between vectors in the positive orthant of the unit hypersphere. [sent-38, score-0.723]

24 This section introduces the proposed angular quantization technique that directly yields binary codes of non-negative data. [sent-39, score-0.73]

25 We first propose a simplified data-independent algorithm which does not involve any learning, and then present a method to adapt the quantization scheme to the input data to learn robust codes. [sent-40, score-0.327]

26 We first address the problem of computing a d-bit binary code of an input vector $x_i$. [sent-43, score-0.263]

27 For angle-preserving quantization, we define a set of quantization centers or landmarks by projecting the vertices of the binary hypercube $\{0, 1\}^d$ onto the unit hypersphere. [sent-47, score-0.758]

28 This construction results in $2^d - 1$ landmark points for d-dimensional data. [sent-48, score-0.182]
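
As a concrete illustration of this construction, here is a minimal NumPy sketch of our own (not code from the paper): for d = 3, the landmarks are the seven nonzero hypercube vertices pushed onto the unit sphere, as in Fig. 1(a).

```python
import itertools
import numpy as np

# Enumerate the nonzero vertices of {0,1}^d and project them onto the unit
# sphere; for d = 3 this gives the 2^3 - 1 = 7 landmark directions.
d = 3
vertices = [np.array(v, dtype=float)
            for v in itertools.product([0, 1], repeat=d) if any(v)]
landmarks = [v / np.linalg.norm(v) for v in vertices]
print(len(landmarks))  # 7
```

Of course, this explicit enumeration is only feasible for tiny d; avoiding it is exactly what the efficient procedure of Algorithm 1 below is for.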

29 An illustration of the proposed quantization model is given in Fig. 1. [sent-49, score-0.327]

30 Given a point x on the hypersphere, one first finds its nearest landmark $v_i$, and the binary encoding for $x_i$ is simply given by the binary vertex $b_i$ corresponding to $v_i$. [sent-51, score-0.708]

31 One of the main characteristics of the proposed model is that the number of landmarks grows exponentially with d, and for many practical applications d can easily be in thousands or even more. [sent-52, score-0.162]

32 To obtain an efficient solution, we take advantage of the special structure of our set of landmarks, which are given by the projections of binary vectors onto the unit hypersphere. [sent-55, score-0.224]

33 The nearest landmark of a point x, or the binary vertex having the smallest angle with x, is given by $\hat{b} = \arg\max_{b} \frac{b^{T} x}{\|b\|_2}$ s.t. $b \in \{0,1\}^d$. [sent-56, score-0.652]

34 Since in terms of angle from a point, both $b_i$ and $v_i$ are equivalent, we will use the term landmark for either $b_i$ or $v_i$ depending on the context. [sent-64, score-0.616]

35 (b) Cosine of angle between binary vertices, plotted against m (log scale). [sent-70, score-0.292]

36 Figure 1: (a) An illustration of our quantization model in 3D. [sent-71, score-0.327]

37 Here $b_i$ is a vertex of the unit cube and $v_i$ is its projection on the unit sphere. [sent-72, score-0.379]

38 Points $v_i$ are used as the landmarks for quantization. [sent-73, score-0.162]

39 To find the binary code of a given data point x, we first find its nearest landmark point $v_i$ on the sphere, and the corresponding $b_i$ gives its binary code ($v_4$ and $b_4$ in this case). [sent-74, score-0.785]

40 (b) Bound on cosine of angle between a binary vertex $b_1$ with Hamming weight m, and another vertex $b_2$ at a Hamming distance r from $b_1$. [sent-75, score-0.69]

41 Algorithm 1: Finding the nearest binary landmark for a point on the unit hypersphere. [sent-77, score-0.4]

42 Output: b, binary vector having the smallest angle with x. [sent-79, score-0.344]

43 Form the binary vector $b_k$ whose elements are 1 at the k largest positions of x and 0 otherwise. [sent-92, score-0.214]
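
The steps above suggest the following minimal NumPy sketch of Algorithm 1 (our reconstruction; the function name and exact organization are ours): sort x once, then score each candidate $b_k$ by $b_k^{T} x / \sqrt{k}$ and keep the best.

```python
import numpy as np

def nearest_binary_landmark(x):
    """Return the binary vertex b maximizing b^T x / ||b||_2 (Algorithm 1 sketch).

    Sorting costs O(d log d); the scan over k is linear. Negative entries can
    never improve the score, so they are effectively ignored (cf. the note on
    negative values after the projection step later in the text)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    order = np.argsort(-x)                              # indices in descending order of x
    cumsum = np.cumsum(x[order])                        # b_k^T x for the top-k candidate
    scores = cumsum / np.sqrt(np.arange(1, d + 1))      # b_k^T x / ||b_k||_2
    k_best = int(np.argmax(scores)) + 1
    b = np.zeros(d, dtype=np.uint8)
    b[order[:k_best]] = 1                               # ones at the k_best largest entries
    return b
```

The key observation is that, for a fixed number of ones k, the best candidate always places its ones at the k largest entries of x, so only d candidates ever need to be scored.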

44 Now we study the properties of the proposed quantization model. [sent-122, score-0.327]

45 The following lemma helps to characterize the angular resolution of the quantization landmarks. [sent-123, score-0.457]

46 Lemma 2: Suppose b is an arbitrary binary vector with Hamming weight $\|b\|_1 = m$, where $\|\cdot\|_1$ is the L1 norm. [sent-124, score-0.187]

47 Then for all binary vectors $b'$ that lie at a Hamming radius r from b, the cosine of the angle between b and $b'$ is bounded within $\left[\sqrt{\tfrac{m-r}{m}},\ \sqrt{\tfrac{m}{m+r}}\right]$. [sent-125, score-0.508]

48 To find the lower bound on the cosine of the angle between b and $b'$, we want to find a $b'$ that minimizes $\frac{b^{T} b'}{\sqrt{b^{T} b}\,\sqrt{b'^{T} b'}}$ (equivalently, maximizes the angle). [sent-127, score-0.331]

49 The Hamming weight m of each binary vertex corresponds to its position in space. [sent-137, score-0.245]

50 The above lemma implies that for landmark points on the boundary, the Voronoi cells are relatively coarse, and cells become progressively denser as one moves towards the center. [sent-139, score-0.29]

51 Thus the proposed set of landmarks non-uniformly tessellates the surface of the positive orthant of the hypersphere. [sent-140, score-0.214]

52 It is clear that for relatively large m, the angle between different landmarks is very small, thus providing dense quantization even for large r. [sent-143, score-0.685]
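
A quick numeric check of this claim, using the bound of Lemma 2 as reconstructed above (the numbers below are our own arithmetic, not taken from the paper):

```python
import numpy as np

# Cosine bounds [sqrt((m-r)/m), sqrt(m/(m+r))] for vertices at Hamming
# distance r from a vertex of Hamming weight m.
for m, r in [(5, 2), (50, 2), (500, 2), (500, 50)]:
    lo, hi = np.sqrt((m - r) / m), np.sqrt(m / (m + r))
    print(f"m={m:3d}, r={r:2d}:  cos(angle) in [{lo:.3f}, {hi:.3f}]")
# Larger m keeps the cosine close to 1 even for sizeable r, i.e. the
# tessellation is much finer near high-Hamming-weight landmarks.
```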

53 To get good performance, the distribution of the data should be such that a majority of the points fall closer to landmarks with higher m. [sent-144, score-0.182]

54 2 Learning Data-dependent Binary Codes: We start by addressing the first issue of how to adapt the method to the given data to minimize the quantization error. [sent-148, score-0.327]

55 Similarly to the Iterative Quantization (ITQ) method of Gong and Lazebnik [7], we would like to align the data to a pre-defined set of quantization landmarks using a rotation, because rotating the data does not change the angles – and, therefore, the similarities – between the data points. [sent-149, score-0.532]

56 Suppose, for a data vector x, the sorted entries $x_{(1)}, \ldots, x_{(d)}$ follow Zipf’s law, i.e., [sent-152, score-0.212]

57 $x_{(k)} \propto 1/k^{s}$, where k is the index of the entries sorted in descending order, and s is the power parameter that controls how quickly the entries decay. [sent-157, score-0.32]

58 More germanely, for a fixed s, applying a random rotation R to x makes the distribution of the entries of the resulting vector $R^{T} x$ more uniform and raises the effective m. [sent-159, score-0.174]

59 In Fig. 2(a), we plot the sorted entries of x generated from Zipf’s law with s = 0. [sent-161, score-0.23]

60 Thus, even randomly rotating the data tends to lead to finer Voronoi cells and reduced quantization error. [sent-173, score-0.371]
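
The following toy experiment (our own sketch, reusing the nearest_binary_landmark function from above; not the paper's Figure 2 code) reproduces the qualitative effect: a Zipf-distributed vector snaps to a landmark with very small Hamming weight, while a randomly transformed copy snaps to one with much larger m.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 100, 1.0
x = 1.0 / np.arange(1, d + 1) ** s           # Zipf-law magnitudes, x_(k) ∝ 1/k^s
x = rng.permutation(x)
x /= np.linalg.norm(x)                       # place x on the unit hypersphere

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal transform, used as R
m_before = int(nearest_binary_landmark(x).sum())
m_after = int(nearest_binary_landmark(Q.T @ x).sum())
print(f"Hamming weight of nearest landmark: {m_before} (raw) vs {m_after} (transformed)")
```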

61 Next, it is natural to ask whether we can optimize the rotation of the data to increase the cosine similarities between data points and their corresponding binary landmarks. [sent-174, score-0.449]

62 We seek a d × d orthogonal transformation R such that the sum of cosine similarities of each transformed data point $R^{T} x_i$ and its corresponding binary landmark $b_i$ is maximized. [sent-175, score-0.689]

63 Then the objective function for our optimization problem is given by $Q(B, R) = \arg\max_{B,R} \sum_{i=1}^{n} \frac{b_i^{T} R^{T} x_i}{\|b_i\|_2}$ s.t. $b_i \in \{0,1\}^d$ and $R^{T} R = I$. [sent-177, score-0.245]

64 Note that after rotation, $R^{T} x_i$ may contain negative values, but this does not affect the quantization since the binarization technique described in Algorithm 1 effectively suppresses the negative values to 0. [sent-180, score-0.359]

65 Figure 2: Effect of rotation on Hamming weight m of the landmark corresponding to a particular vector (panels (a)–(d); horizontal axes show the sorted index k). [sent-196, score-0.909]

66 The above objective function still yields a d-bit binary code for d-dimensional data, while in many real-world applications, a low-dimensional binary code may be preferable. [sent-200, score-0.444]

67 To generate a c-bit code where c < d, we can learn a d × c projection matrix R with orthogonal columns by optimizing the following modified objective function: $Q(B, R) = \arg\max_{B,R} \sum_{i=1}^{n} \frac{b_i^{T} R^{T} x_i}{\|b_i\|_2 \, \|R^{T} x_i\|_2}$ s.t. $b_i \in \{0,1\}^c$, $R^{T} R = I_c$.  (3) [sent-201, score-0.401]

68 Note that to minimize the angle after a low-dimensional projection (as opposed to a rotation), the denominator of the objective function contains $\|R^{T} x_i\|_2$, since after projection $\|R^{T} x_i\|_2$ is in general no longer equal to 1. [sent-204, score-0.346]

69 We propose to relax it by optimizing the linear correlation instead of the angle: $Q(B, R) = \arg\max_{B,R} \sum_{i=1}^{n} \frac{b_i^{T} R^{T} x_i}{\|b_i\|_2}$ s.t. $b_i \in \{0,1\}^c$, $R^{T} R = I_c$. [sent-206, score-0.223]

70 In matrix form, $Q(B, R) = \arg\max_{B,R} \operatorname{Tr}(B^{T} R^{T} X)$ s.t. $R^{T} R = I_c$,  (5)  where Tr(·) is the trace operator, B is the c × n matrix with columns given by $b_i / \|b_i\|_2$, and X is the d × n matrix with columns given by $x_i$. [sent-215, score-0.336]

71 For a fixed R, (5) becomes separable in $x_i$, and we can solve for each $b_i$ separately. [sent-220, score-0.184]

72 Here, the individual sub-problem for each $x_i$ can be written as $\hat{b}_i = \arg\max_{b_i} \frac{b_i^{T} (R^{T} x_i)}{\|b_i\|_2}$.  (6) [sent-221, score-0.255]

73 Thus, given a point $y_i = R^{T} x_i$ in c-dimensional space, we want to find the vertex $b_i$ of the c-dimensional hypercube having the smallest angle with $y_i$. [sent-222, score-0.184]

74 To do this, we use Algorithm 1 to find $b_i$ for each $y_i$, and then normalize each $b_i$ back to the unit hypersphere: $b_i = b_i / \|b_i\|_2$. [sent-223, score-0.807]

75 The optimization procedure is initialized by first generating a random binary matrix by making each element 0 or 1 with probability 1/2, and then normalizing each column to unit norm. [sent-238, score-0.189]
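
Putting the pieces together, a minimal sketch of the alternating optimization might look as follows (our own reconstruction, reusing nearest_binary_landmark from above). The B-step solves the separable sub-problems (6) with Algorithm 1; for the R-step we use the standard orthogonal-Procrustes solution to maximizing Tr(B^T R^T X) under R^T R = I_c, since the paper's exact update rule is not spelled out in this extract.

```python
import numpy as np

def learn_aqbc_projection(X, c, n_iter=50, seed=0):
    """Alternate between the B-step (Algorithm 1 per point) and the R-step
    (orthogonal Procrustes) for the relaxed objective (5).

    X : d x n matrix of unit-norm, non-negative data columns.
    Returns (R, B) with R of shape (d, c), R^T R = I_c, and B of shape (c, n)
    holding the unit-normalized binary landmarks."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Initialization described in the text: random 0/1 matrix with each entry
    # equal to 1 with probability 1/2, columns normalized to unit norm.
    B = rng.integers(0, 2, size=(c, n)).astype(float)
    B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)
    R, _ = np.linalg.qr(rng.standard_normal((d, c)))   # random orthonormal start
    for _ in range(n_iter):
        # R-step: maximize Tr(B^T R^T X) = Tr(R^T X B^T) over R^T R = I_c.
        U, _, Vt = np.linalg.svd(X @ B.T, full_matrices=False)
        R = U @ Vt
        # B-step: (6) is separable; run Algorithm 1 on each column of R^T X.
        Y = R.T @ X
        B = np.stack([nearest_binary_landmark(y) for y in Y.T], axis=1).astype(float)
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)
    return R, B
```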

76 4 Computation of Cosine Similarity between Binary Codes: Most existing similarity-preserving binary coding methods measure the similarity between pairs of binary vectors using the Hamming distance, which is extremely efficient to compute by bitwise XOR followed by bit count (popcount). [sent-243, score-0.482]

77 By contrast, the appropriate similarity measure for our approach is the cosine of the angle θ between two binary vectors b and $b'$: $\cos(\theta) = \frac{b^{T} b'}{\|b\|_2 \, \|b'\|_2}$. [sent-244, score-0.589]

78 Therefore, even though the cosine similarity is marginally slower than Hamming distance, it is still very fast compared to computing similarity of the original data vectors. [sent-247, score-0.343]
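
As a sketch of how this can be computed with the same bit tricks (our own example; `int.bit_count` requires Python 3.10+, otherwise `bin(x).count('1')` works):

```python
def binary_cosine(code1: int, code2: int) -> float:
    """Cosine similarity between two binary codes packed into integers:
    popcount(b AND b') / sqrt(popcount(b) * popcount(b'))."""
    inter = (code1 & code2).bit_count()
    m1, m2 = code1.bit_count(), code2.bit_count()
    return inter / (m1 * m2) ** 0.5 if m1 and m2 else 0.0

print(binary_cosine(0b101100, 0b100110))   # 2 / sqrt(3 * 3) ≈ 0.667
```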

79 Due to this, Euclidean distance directly corresponds to the cosine similarity as $\mathrm{dist}^2 = 2 - 2\,\mathrm{sim}$. [sent-257, score-0.298]
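
For unit-norm vectors this identity is immediate (our derivation):

$$\|u - v\|_2^2 \;=\; \|u\|_2^2 + \|v\|_2^2 - 2\,u^{\top}v \;=\; 2 - 2\cos\theta \quad\text{when } \|u\|_2 = \|v\|_2 = 1.$$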

80 For each query, we define the ground truth neighbors as all the points within the radius determined by the average distance to the 50th nearest neighbor in the dataset, and plot precision-recall curves of database points ordered by decreasing similarity of their binary codes with the query. [sent-260, score-0.621]
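
Our reading of this protocol, as a short sketch for a single query (the function name and array layout are our own assumptions, not the authors' evaluation code):

```python
import numpy as np

def precision_recall_curve(code_sim, orig_dist, radius):
    """code_sim  : similarity of each database code to the query's code.
    orig_dist : distance of each database point to the query in the
                original feature space.
    radius    : ground-truth radius (average distance to the 50th nearest
                neighbor over the dataset)."""
    relevant = orig_dist <= radius                 # ground-truth neighbors
    order = np.argsort(-code_sim)                  # rank by decreasing code similarity
    hits = np.cumsum(relevant[order])
    precision = hits / np.arange(1, len(order) + 1)
    recall = hits / max(int(relevant.sum()), 1)
    return precision, recall
```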

81 Since our AQBC method is unsupervised, we compare with several state-of-the-art unsupervised binary coding methods: Locality Sensitive Hashing (LSH) [4], Spectral Hashing [26], Iterative Quantization (ITQ) [7], Shift-invariant Kernel LSH (SKLSH) [20], and Spherical Hashing (SPH) [9]. [sent-262, score-0.217]

82 It is interesting to verify how much we gain by using the learned data-dependent quantization instead of the data-independent naive version (Sec. [sent-329, score-0.358]

83 Since the naive version can only learn a d-bit code (1000 bits in this case), its performance (AQBC naive) is shown only in Fig. [sent-332, score-0.187]

84 The performance is much worse than that of the learned codes, which clearly shows that adapting quantization to the data distribution is important in practice. [sent-334, score-0.327]

85 This is because for fewer bits, the Hamming weight (m) of the binary codes tends to be small, resulting in larger distortion error, as discussed in Sec. [sent-341, score-0.318]

86 Because the text features are the sparsest and have the highest dimensionality, we would like to verify whether learning the projection R helps in choosing landmarks with larger m as conjectured in Sec. [sent-348, score-0.261]

87 The average empirical distribution over sorted vector elements for this data is shown in Fig. [sent-351, score-0.164]

88 It is clear that vector elements have a rapidly decaying distribution, and the quantization leads to codes with low m, implying higher quantization error. [sent-354, score-0.845]

89 Table 1 compares the binary code generation time and retrieval speed for different methods. [sent-361, score-0.258]

90 Our method involves linear projection and quantization using Algorithm 1, while ITQ and LSH only involve linear projections and thresholding. [sent-364, score-0.382]

91 The results show that the quantization step (Algorithm 1) of our method is fast, adding very little to the coding time. [sent-367, score-0.379]

92 (Figure 6 panels; horizontal axis: sorted index (k)) [sent-417, score-0.318]

93 Figure 6: Effect of projection on Hamming weight m for 20 Newsgroups data. [sent-429, score-0.239]

94 (a) Distribution of sorted vector entries, (b) scaled cumulative function, (c) distribution over vector elements after learned projection, (d) scaled cumulative function for the projected data. [sent-430, score-0.37]

95 (a) Average binary code generation time per query (milliseconds) on 5000-dimensional LLC features. [sent-454, score-0.243]

96 For the proposed AQBC method, the first number is projection time and the second is quantization time. [sent-455, score-0.382]

97 In Table 1(b), computation of cosine similarity is slightly slower than that of Hamming distance, but both are orders of magnitude faster than Euclidean distance. [sent-460, score-0.262]

98 4 Discussion: In this work, we have introduced a novel method for generating binary codes for non-negative frequency/count data. [sent-461, score-0.293]

99 Retrieval results on high-dimensional image and text datasets have demonstrated that the proposed codes accurately approximate neighbors in the original feature space according to cosine similarity. [sent-462, score-0.449]

100 To improve the semantic precision of retrieval, our earlier ITQ method [7] could take advantage of a supervised linear projection learned from labeled data with the help of canonical correlation analysis. [sent-466, score-0.182]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('aqbc', 0.446), ('quantization', 0.327), ('sph', 0.26), ('itq', 0.257), ('sklsh', 0.241), ('lsh', 0.199), ('cosine', 0.181), ('landmark', 0.162), ('landmarks', 0.162), ('rt', 0.157), ('bi', 0.152), ('codes', 0.151), ('angle', 0.15), ('binary', 0.142), ('hashing', 0.129), ('sh', 0.128), ('sorted', 0.124), ('hamming', 0.124), ('angular', 0.11), ('bits', 0.087), ('rotation', 0.086), ('similarity', 0.081), ('vertex', 0.078), ('bow', 0.071), ('code', 0.069), ('entries', 0.068), ('bt', 0.066), ('zipf', 0.06), ('hypercube', 0.06), ('cvpr', 0.059), ('precision', 0.058), ('hypersphere', 0.057), ('popcount', 0.056), ('projection', 0.055), ('coding', 0.052), ('orthant', 0.052), ('nearest', 0.049), ('cumulative', 0.047), ('retrieval', 0.047), ('unit', 0.047), ('semantic', 0.046), ('scaled', 0.046), ('text', 0.044), ('database', 0.042), ('gong', 0.042), ('sun', 0.041), ('newsgroups', 0.04), ('image', 0.04), ('arg', 0.039), ('law', 0.038), ('lazebnik', 0.037), ('rtx', 0.037), ('distance', 0.036), ('index', 0.035), ('vectors', 0.035), ('neighbors', 0.033), ('imagenet', 0.033), ('locality', 0.033), ('xi', 0.032), ('bk', 0.032), ('smallest', 0.032), ('query', 0.032), ('kulis', 0.031), ('euclidean', 0.031), ('naive', 0.031), ('bitwise', 0.03), ('shrivastava', 0.03), ('recall', 0.029), ('retrieving', 0.028), ('curves', 0.027), ('iis', 0.027), ('voronoi', 0.027), ('sort', 0.027), ('llc', 0.026), ('rotated', 0.026), ('weight', 0.025), ('kumar', 0.025), ('descending', 0.025), ('denser', 0.024), ('maxk', 0.024), ('dense', 0.024), ('rotating', 0.023), ('checking', 0.023), ('unsupervised', 0.023), ('supervised', 0.023), ('nc', 0.023), ('relatively', 0.022), ('dataset', 0.022), ('tr', 0.022), ('works', 0.022), ('objective', 0.022), ('cells', 0.021), ('lemma', 0.02), ('elements', 0.02), ('ones', 0.02), ('similarities', 0.02), ('vertices', 0.02), ('points', 0.02), ('vector', 0.02), ('neighbor', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

2 0.23427066 148 nips-2012-Hamming Distance Metric Learning

Author: Mohammad Norouzi, David M. Blei, Ruslan Salakhutdinov

Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes. 1

3 0.21115224 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

4 0.15840541 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

5 0.1499335 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

6 0.12039782 330 nips-2012-Supervised Learning with Similarity Functions

7 0.1130484 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

8 0.11199249 313 nips-2012-Sketch-Based Linear Value Function Approximation

9 0.10278602 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

10 0.10070287 257 nips-2012-One Permutation Hashing

11 0.097395569 71 nips-2012-Co-Regularized Hashing for Multimodal Data

12 0.086237065 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

13 0.080918334 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

14 0.077384047 202 nips-2012-Locally Uniform Comparison Image Descriptor

15 0.076217115 227 nips-2012-Multiclass Learning with Simplex Coding

16 0.071204066 126 nips-2012-FastEx: Hash Clustering with Exponential Families

17 0.066165894 179 nips-2012-Learning Manifolds with K-Means and K-Flats

18 0.064856969 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

19 0.061177131 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

20 0.061000176 361 nips-2012-Volume Regularization for Binary Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, 0.037), (2, -0.07), (3, -0.052), (4, 0.087), (5, -0.061), (6, -0.01), (7, 0.11), (8, 0.124), (9, 0.066), (10, 0.11), (11, -0.135), (12, 0.057), (13, 0.142), (14, -0.14), (15, 0.111), (16, 0.058), (17, -0.036), (18, -0.157), (19, -0.0), (20, 0.035), (21, 0.098), (22, 0.007), (23, -0.036), (24, -0.031), (25, 0.038), (26, 0.009), (27, 0.02), (28, 0.001), (29, -0.003), (30, 0.045), (31, -0.011), (32, -0.031), (33, -0.039), (34, 0.01), (35, -0.0), (36, -0.049), (37, -0.022), (38, 0.018), (39, 0.047), (40, -0.103), (41, 0.052), (42, 0.022), (43, -0.123), (44, -0.044), (45, 0.083), (46, -0.018), (47, 0.005), (48, 0.018), (49, 0.152)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9315697 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

2 0.80936909 329 nips-2012-Super-Bit Locality-Sensitive Hashing

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, π/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

3 0.78039187 163 nips-2012-Isotropic Hashing

Author: Weihao Kong, Wu-jun Li

Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1

4 0.74349529 71 nips-2012-Co-Regularized Hashing for Multimodal Data

Author: Yi Zhen, Dit-Yan Yeung

Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1

5 0.71678108 257 nips-2012-One Permutation Hashing

Author: Ping Li, Art Owen, Cun-hui Zhang

Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.

6 0.70921415 148 nips-2012-Hamming Distance Metric Learning

7 0.65878421 313 nips-2012-Sketch-Based Linear Value Function Approximation

8 0.43089086 202 nips-2012-Locally Uniform Comparison Image Descriptor

9 0.41781455 338 nips-2012-The Perturbed Variation

10 0.41260394 330 nips-2012-Supervised Learning with Similarity Functions

11 0.40586132 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

12 0.37184125 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

13 0.37108955 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

14 0.36727276 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

15 0.36286417 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

16 0.35780439 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability

17 0.35556355 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

18 0.3456549 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking

19 0.3452028 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

20 0.33563596 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (17, 0.017), (21, 0.015), (38, 0.093), (42, 0.022), (44, 0.021), (53, 0.015), (54, 0.023), (55, 0.023), (74, 0.088), (76, 0.088), (80, 0.068), (92, 0.417)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90237522 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

same-paper 2 0.86558867 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

3 0.85762143 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

Author: Joachim Giesen, Jens Mueller, Soeren Laue, Sascha Swiercy

Abstract: We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the √ parameter can always be approximated with accuracy ε > 0 by a set of size O(1/ ε). A √ lower bound of size Ω(1/ ε) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an √ approximate path of size O(1/ ε). Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. 1

4 0.84861529 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton

Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1

5 0.81403649 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

6 0.7960366 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

7 0.65067756 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

8 0.61632729 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

9 0.61557305 329 nips-2012-Super-Bit Locality-Sensitive Hashing

10 0.60402942 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

11 0.58945769 260 nips-2012-Online Sum-Product Computation Over Trees

12 0.58161449 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

13 0.57698691 94 nips-2012-Delay Compensation with Dynamical Synapses

14 0.56964916 65 nips-2012-Cardinality Restricted Boltzmann Machines

15 0.56954694 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

16 0.56544554 102 nips-2012-Distributed Non-Stochastic Experts

17 0.5646987 197 nips-2012-Learning with Recursive Perceptual Representations

18 0.56286561 148 nips-2012-Hamming Distance Metric Learning

19 0.56019962 163 nips-2012-Isotropic Hashing

20 0.55185175 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines