nips nips2012 nips2012-71 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yi Zhen, Dit-Yan Yeung
Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Hashing-based methods provide a very promising approach to large-scale similarity search. [sent-3, score-0.044]
2 To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. [sent-4, score-0.672]
3 In this paper, we study hash function learning in the context of multimodal data. [sent-5, score-0.432]
4 We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. [sent-6, score-0.455]
5 The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. [sent-7, score-1.153]
6 We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. [sent-8, score-0.432]
7 Nearest neighbor search, also known as similarity search, plays a fundamental role in many important applications, including document retrieval, object recognition, and near-duplicate detection. [sent-12, score-0.06]
8 Among the methods proposed thus far for nearest neighbor search [1], hashing-based methods [2, 3] have attracted considerable interest in recent years. [sent-13, score-0.056]
9 The major advantage of hashing-based methods is that they index data using binary hash codes which enjoy not only low storage requirements but also high computational efficiency. [sent-14, score-0.352]
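As an aside on why binary codes are cheap to store and compare: packed codes can be matched with XOR and bit counting. The sketch below is an illustrative NumPy snippet of my own, not code from the paper.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n_items, n_bits) 0/1 matrix into bytes for compact storage."""
    return np.packbits(bits.astype(np.uint8), axis=1)

def hamming_distances(query_packed, db_packed):
    """Hamming distances between one packed query code and all packed database codes."""
    xor = np.bitwise_xor(query_packed[None, :], db_packed)   # differing bytes
    return np.unpackbits(xor, axis=1).sum(axis=1)            # count differing bits

# toy usage: 32-bit codes for 5 database items and 1 query
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(5, 32))
q = rng.integers(0, 2, size=(1, 32))
print(hamming_distances(pack_codes(q)[0], pack_codes(db)))
```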
10 To preserve similarity in the data, a family of algorithms called locality sensitive hashing (LSH) [4, 5] has been developed over the past decade. [sent-15, score-0.246]
11 The basic idea of LSH is to hash the data into bins so that the collision probability reflects data similarity. [sent-16, score-0.307]
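For concreteness, a minimal sign-random-projection LSH sketch is given below; it illustrates the data-independent hashing idea discussed here and is not the method proposed in this paper. All names are mine.

```python
import numpy as np

def srp_hash(X, n_bits, seed=0):
    """Sign-random-projection LSH: one random hyperplane per bit, independent of the data."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))
    return (X @ W > 0).astype(np.uint8)

# two nearby points tend to agree on most bits; the agreement rate reflects their angle
x = np.array([[1.0, 0.2, 0.0]])
y = np.array([[0.9, 0.3, 0.1]])
print((srp_hash(x, 64) == srp_hash(y, 64)).mean())
```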
12 However, in practice LSH algorithms often generate long hash codes in order to achieve acceptable performance because the theoretical guarantee only holds asymptotically. [sent-18, score-0.352]
13 This shortcoming can be attributed largely to their data-independent nature, which prevents the hash codes from capturing the data characteristics accurately. [sent-19, score-0.322]
14 Besides, in many applications, neighbors cannot be defined easily using some generic distance or similarity measures. [sent-20, score-0.067]
15 As such, a new research trend has emerged over the past few years by learning the hash functions from data automatically. [sent-21, score-0.35]
16 In the sequel, we refer to this new trend as hash function learning (HFL). [sent-22, score-0.325]
17 Boosting, as one of the most popular machine learning approaches, was first applied to learning hash functions for pose estimation [6]. [sent-23, score-0.332]
18 These two early HFL methods have been successfully applied to content-based image retrieval in which large-scale data sets are commonly encountered [8]. [sent-25, score-0.106]
19 Spectral hashing (SH) [9] treats HFL as a special case of manifold learning and uses an efficient algorithm based on eigenfunctions. [sent-27, score-0.202]
20 One shortcoming of spectral hashing is in its assumption, which requires that the data be uniformly distributed. [sent-28, score-0.217]
21 To overcome this limitation, several methods have been proposed, including binary reconstructive embeddings [10], shift-invariant kernel hashing [11], distribution matching [12], optimized kernel hashing [13], and minimal loss hashing [14]. [sent-29, score-0.655]
22 Recently, some semi-supervised hashing models have 1 been developed to combine both feature similarity and semantic similarity for HFL [15, 16, 17, 18]. [sent-30, score-0.308]
23 Nevertheless, they can only be applied to a single type of data, called unimodal data, which refers to data from a single modality such as image, text, or audio. [sent-34, score-0.055]
24 Nowadays, it is common to find similarity search applications that involve multimodal data. [sent-35, score-0.185]
25 For example, given an image of a tourist attraction as query, one would like to retrieve some textual documents that provide more detailed information about the place of interest. [sent-36, score-0.132]
26 Because data from different modalities reside in different feature spaces, performing multimodal similarity search will be made much easier and faster if the multimodal data can be mapped into a common Hamming space. [sent-37, score-0.356]
27 However, it is challenging to do so because data from different modalities generally have very different representations. [sent-38, score-0.046]
28 As far as we know, there exist only two multimodal HFL methods. [sent-39, score-0.125]
29 Bronstein et al. [20] made the first attempt to learn linear hash functions using eigendecomposition and boosting, while Kumar et al. [sent-41, score-0.348]
30 [21] extended spectral hashing to the multiview setting and proposed a cross-view hashing model. [sent-42, score-0.42]
31 Moreover, they consider applications for shape retrieval, image alignment, and people search, which are quite different from the multimodal retrieval applications of interest here. [sent-44, score-0.247]
32 In this paper, we propose a novel multimodal HFL method, called Co-Regularized Hashing (CRH), based on a boosted co-regularization framework. [sent-45, score-0.148]
33 For each bit of the hash codes, CRH learns a group of hash functions, one for each modality, by minimizing a novel loss function. [sent-46, score-0.687]
34 Although the loss function is non-convex, it is in a special form which can be expressed as a difference of convex functions. [sent-47, score-0.047]
35 After learning the hash functions for one bit, CRH proceeds to learn more bits via a boosting procedure such that the bias introduced by the hash functions can be sequentially minimized. [sent-50, score-0.743]
36 , {x_i ∈ X}_{i=1}^{I} for a set of I images from some feature space X and {y_j ∈ Y}_{j=1}^{J} for a set of J textual documents from another feature space Y. [sent-59, score-0.04]
37 We further assume that each pair has a label s_n = 1 if x_{a_n} and y_{b_n} are similar and s_n = 0 otherwise. [sent-64, score-0.409]
38 The notion of inter-modality similarity varies from application to application. [sent-65, score-0.044]
39 For example, if an image includes a tiger and a textual document is a research paper on tigers, they should be labeled as similar. [sent-66, score-0.129]
40 On the other hand, it is highly unlikely to label the image as similar to a textual document on basketball. [sent-67, score-0.129]
41 Our goal is to achieve HFL by learning w_x and w_y from the multimodal data. [sent-69, score-0.73]
42 For simplicity of our presentation, we focus on the bimodal case here and leave the discussion on extension to more than two modalities to Section 2. [sent-70, score-0.046]
43 (with respect to) w_x and w_y: O = (1/I) Σ_{i=1}^{I} ℓ_i^x + (1/J) Σ_{j=1}^{J} ℓ_j^y + γ Σ_{n=1}^{N} ω_n ℓ_n^* + (λ_x/2)‖w_x‖^2 + (λ_y/2)‖w_y‖^2, (1) where ℓ_i^x and ℓ_j^y are intra-modality loss terms for modalities X and Y, respectively. [sent-75, score-1.287]
44 In this work, we define them as: ℓ_i^x = [1 − f(x_i)(w_x^T x_i)]_+ = [1 − |w_x^T x_i|]_+ and ℓ_j^y = [1 − g(y_j)(w_y^T y_j)]_+ = [1 − |w_y^T y_j|]_+, where [a]_+ is equal to a if a ≥ 0 and 0 otherwise. [sent-76, score-0.172]
45 We note that the intra-modality loss terms are similar to the hinge loss in the (linear) support vector machine but have quite different meaning. [sent-77, score-0.062]
46 Conceptually, we want the projected values to be far away from 0 and hence expect the hash functions learned to have good generalization ability [16]. [sent-78, score-0.332]
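As a small illustration of the intra-modality term, the sketch below computes the average of [1 − |w^T x_i|]_+ over one modality; the function name and NumPy formulation are my own.

```python
import numpy as np

def intra_modality_loss(w, X):
    """Average of [1 - |w^T x_i|]_+ over the points of one modality.

    The loss is zero once the projected value has magnitude at least 1,
    pushing projections away from the hashing threshold at 0.
    """
    proj = X @ w
    return np.maximum(0.0, 1.0 - np.abs(proj)).mean()

rng = np.random.default_rng(0)
print(intra_modality_loss(rng.standard_normal(8), rng.standard_normal((100, 8))))
```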
47 For the inter-modality loss term ℓ_n^*, we associate with each point pair a weight ω_n, with Σ_{n=1}^{N} ω_n = 1, to normalize the loss as well as compute the bias of the hash functions. [sent-79, score-0.369]
48 In this paper, we define ℓ_n^* as ℓ_n^* = s_n d_n^2 + (1 − s_n) τ(d_n), where d_n = w_x^T x_{a_n} − w_y^T y_{b_n} and τ(d) is called the smoothly clipped inverted squared deviation (SCISD) function. [sent-80, score-1.163]
49 , s_n = 1, have small distance after projection, and the dissimilar ones, i. [sent-83, score-0.106]
50 With these two kinds of loss terms, we expect that the learned hash functions can enjoy the large-margin property while effectively preserving the inter-modality similarity. [sent-86, score-0.387]
51 The SCISD function penalizes projection vectors that result in small distance between dissimilar points after projection. [sent-89, score-0.06]
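To make the inter-modality term concrete, the sketch below implements s_n d_n^2 + (1 − s_n) τ(d_n) with τ passed in as a callable. The piecewise scisd_placeholder is an assumed stand-in with parameters a and λ, since this extract does not reproduce the paper's exact SCISD formula.

```python
import numpy as np

def scisd_placeholder(d, a=3.0, lam=1.0):
    """Assumed stand-in for the SCISD penalty (a > 1): large when |d| is small,
    decaying smoothly, and exactly zero once |d| >= a * lam. Not the paper's exact formula."""
    d = np.abs(d)
    return np.where(d <= lam,
                    -0.5 * d**2 + 0.5 * a * lam**2,
                    np.where(d <= a * lam,
                             (d**2 - 2 * a * lam * d + a**2 * lam**2) / (2 * (a - 1)),
                             0.0))

def inter_modality_loss(wx, wy, Xa, Yb, s, weights, tau=scisd_placeholder):
    """Weighted sum over pairs of s_n * d_n^2 + (1 - s_n) * tau(d_n)."""
    d = Xa @ wx - Yb @ wy          # d_n = wx^T x_{a_n} - wy^T y_{b_n}
    return np.dot(weights, s * d**2 + (1 - s) * tau(d))
```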
52 Take w_x for example, we remove the irrelevant terms and get the following objective: (1/I) Σ_{i=1}^{I} ℓ_i^x + (λ_x/2)‖w_x‖^2 + γ Σ_{n=1}^{N} ω_n ℓ_n^*, (2) where ℓ_i^x = 0 if |w_x^T x_i| ≥ 1, ℓ_i^x = 1 − w_x^T x_i if 0 ≤ w_x^T x_i < 1, and ℓ_i^x = 1 + w_x^T x_i if −1 < w_x^T x_i < 0. [sent-100, score-1.919]
53 It is easy to realize that the objective function (2) can be expressed as a difference of two convex functions in different cases. [sent-101, score-0.061]
54 As a consequence, we can use CCCP to solve the nonconvex optimization problem iteratively with each iteration minimizing a convex upper bound of the original objective function. [sent-102, score-0.051]
55 At the t-th iteration, CCCP minimizes the following convex upper bound of f_0(x) − g_0(x) at location x^{(t)}: f_0(x) − [g_0(x^{(t)}) + ∂_x g_0(x^{(t)})(x − x^{(t)})], where ∂_x g_0(x^{(t)}) is the first derivative of g_0(x) at x^{(t)}. [sent-105, score-0.052]
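A generic CCCP loop, written only to illustrate the linearize-then-minimize pattern described above (the inner convex solver is left abstract and the names are mine):

```python
def cccp(w0, grad_g0, solve_surrogate, n_iter=20):
    """Generic CCCP loop for minimizing f0(w) - g0(w), with f0 and g0 convex.

    solve_surrogate(w_t, g_lin) must return a minimizer of the convex upper bound
    f0(w) - [g0(w_t) + g_lin @ (w - w_t)], where g_lin = grad_g0(w_t).
    """
    w = w0
    for _ in range(n_iter):
        g_lin = grad_g0(w)              # linearize the subtracted convex part at w^(t)
        w = solve_surrogate(w, g_lin)   # minimize the resulting convex surrogate
    return w
```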
56 w_x: O_x = γ Σ_{n=1}^{N} ω_n [ s_n d_n^2 + (1 − s_n) ζ_n ] + (λ_x/2)‖w_x‖^2 + (1/I) Σ_{i=1}^{I} ℓ_i^x, (3) where ζ_n = τ_1(d_n) − τ_2(d_n^{(t)}) − (∂τ_2/∂d_n)(d_n^{(t)}) x_{a_n}^T (w_x − w_x^{(t)}), d_n^{(t)} = (w_x^{(t)})^T x_{a_n} − w_y^T y_{b_n}, and w_x^{(t)} is the value of w_x at the t-th iteration. [sent-111, score-2.564]
57 Specifically, we randomly select k points from each modality and l point pairs to evaluate the sub-gradient at each iteration. [sent-114, score-0.081]
58 (The corresponding term for modality X is 0 if |w_x^T x_i| ≥ 1 and sgn(w_x^T x_i) x_i if |w_x^T x_i| < 1.) Similarly, the objective function for the optimization problem w.r.t. [sent-119, score-0.409]
59 The corresponding sub-gradient is given by ∂O_y/∂w_y = −2γ Σ_{n=1}^{N} ω_n s_n d_n y_{b_n} − γ Σ_{n=1}^{N} ω_n μ_n^y + λ_y w_y − (1/J) Σ_{j=1}^{J} π_j^y, where μ_n^y = (1 − s_n) ( ∂τ_1/∂d_n − (∂τ_2/∂d_n)(d_n^{(t)}) ) y_{b_n}, and π_j^y = 0 if |w_y^T y_j| ≥ 1, π_j^y = sgn(w_y^T y_j) y_j if |w_y^T y_j| < 1. [sent-123, score-0.172]
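A rough sketch of one stochastic sub-gradient step on w_y in the spirit of the expression above, with the τ derivatives passed in as callables; all function and argument names are assumptions of mine, and rescaling of the sampled sums is omitted for brevity.

```python
import numpy as np

def wy_subgradient_step(wy, wx, Y, Xa, Yb, s, weights, dtau1, dtau2_at_t, d_t,
                        lam_y, gamma, lr, k=100, l=100, rng=None):
    """One stochastic sub-gradient step on w_y, sampling k points and l pairs.

    dtau1(d) and dtau2_at_t(d_t) are derivatives of the two convex parts of the SCISD
    penalty; d_t holds d_n evaluated at the previous CCCP iterate.
    """
    rng = rng or np.random.default_rng()
    j = rng.choice(len(Y), size=min(k, len(Y)), replace=False)   # sampled points of modality Y
    n = rng.choice(len(s), size=min(l, len(s)), replace=False)   # sampled inter-modality pairs

    d = Xa[n] @ wx - Yb[n] @ wy                                  # current d_n on the sample
    mu = (1 - s[n]) * (dtau1(d) - dtau2_at_t(d_t[n]))            # coefficient of y_{b_n} in mu_n^y
    grad = (-2 * gamma * (weights[n] * s[n] * d) @ Yb[n]
            - gamma * (weights[n] * mu) @ Yb[n]
            + lam_y * wy)

    proj = Y[j] @ wy
    pi = np.where(np.abs(proj) < 1, np.sign(proj), 0.0)          # hinge-like intra-modality part
    grad -= (pi @ Y[j]) / len(j)
    return wy - lr * grad
```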
60 3 Algorithm So far we have only discussed how to learn the hash functions for one bit of the hash codes. [sent-125, score-0.681]
61 To learn the hash functions for multiple bits, one could repeat the same procedure and treat the learning for each bit independently. [sent-126, score-0.374]
62 In other words, to learn compact hash codes, we should coordinate the learning of hash functions for different bits. [sent-128, score-0.654]
63 Intuitively, this approach allows learning of the hash functions in later stages to be aware of the bias introduced by their antecedents. [sent-130, score-0.332]
64 [Algorithm 1] Input: X, Y – multimodal data; Θ – inter-modality point pairs; K – code length; λ_x, λ_y, γ – regularization parameters; a, λ – parameters for the SCISD function. Output: w_x^{(k)}, k = 1, ..., K – projection vectors for X, [sent-133, score-0.8]
65 and w_y^{(k)}, k = 1, ..., K – projection vectors for Y. Procedure: (1) Initialize ω_n^{(1)} = 1/N, ∀n ∈ {1, 2, ..., N}; in each round, compute the error of the current hash functions as ε_k = Σ_{n=1}^{N} ω_n^{(k)} I[s_n ≠ h_n], where I[a] = 1 if a is true and I[a] = 0 otherwise, [sent-139, score-0.047]
66 and h_n = 1 if f(x_{a_n}) = g(y_{b_n}), h_n = 0 if f(x_{a_n}) ≠ g(y_{b_n}); (2) for k = 1 to K do: repeat { optimize Equation (3) to get w_x^{(k)}; optimize Equation (5) to get w_y^{(k)} } until convergence. [sent-143, score-0.605]
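The boosting loop can be sketched as follows; the per-bit solver is abstracted away, and the AdaBoost-style re-weighting rule is an assumption on my part since this extract does not give the exact update formula.

```python
import numpy as np

def crh_boosting(solve_bit, Xa, Yb, s, K):
    """Learn K bits sequentially; solve_bit(omega) returns (wx_k, wy_k) for the weighted pairs."""
    N = len(s)
    omega = np.full(N, 1.0 / N)                    # (1) initialize pair weights to 1/N
    Wx, Wy = [], []
    for _ in range(K):
        wx_k, wy_k = solve_bit(omega)              # alternate Eqs. (3) and (5) until convergence
        h = ((Xa @ wx_k > 0) == (Yb @ wy_k > 0))   # h_n: do the two one-bit codes agree?
        err = omega @ (s != h)                     # weighted error of the current bit
        beta = max(err, 1e-12) / max(1.0 - err, 1e-12)   # assumed AdaBoost-style factor
        omega = omega * np.where(s == h, beta, 1.0)      # keep weight on mishandled pairs
        omega /= omega.sum()
        Wx.append(wx_k)
        Wy.append(wy_k)
    return np.array(Wx), np.array(Wy)
```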
67 First, we note that it is easy to extend CRH to learn nonlinear hash functions via the kernel trick [26]. [sent-158, score-0.332]
68 Specifically, according to the generalized representer theorem [27], we can represent the projection vectors w_x and w_y as w_x = Σ_{i=1}^{I} α_i φ_x(x_i) and w_y = Σ_{j=1}^{J} β_j φ_y(y_j), where φ_x(·) and φ_y(·) are kernel-induced feature maps for modalities X and Y, respectively. [sent-159, score-0.48]
69 Then the objective function (1) can be expressed in kernel form and kernel-based hash functions can be learned by minimizing a new but very similar objective function. [sent-160, score-0.372]
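In kernel form, evaluating a hash function on a query then reduces to kernel evaluations against the training points; a brief sketch under the representation above (the coefficient vector alpha and the kernel choice are illustrative assumptions):

```python
import numpy as np

def kernel_hash(x_query, X_train, alpha, kernel):
    """h(x) = sgn( sum_i alpha_i * k(x_i, x) ), using w_x = sum_i alpha_i phi_x(x_i)."""
    k_vec = np.array([kernel(x_i, x_query) for x_i in X_train])
    return 1 if float(k_vec @ alpha) > 0 else 0

# example kernel: Gaussian RBF
rbf = lambda a, b, sigma=1.0: float(np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))
```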
70 Taking a new modality Z for example, we need to incorporate into Equation (1) the following terms: loss and regularization terms for Z, and all pairwise loss terms involving Z and other modalities, e. [sent-162, score-0.102]
71 5 Discussions CRH is closely related to a recent multimodal metric learning method called Multiview Neighborhood Preserving Projections (Multi-NPP) [23], because CRH uses a loss function for inter-modality point pairs which is similar to Multi-NPP. [sent-167, score-0.18]
72 However, CRH is a general framework and other loss functions for inter-modality point pairs can also be adopted. [sent-168, score-0.08]
73 Second, in addition to the inter-modality loss term, the objective function in CRH includes two intra-modality loss terms for large margin HFL while Multi-NPP only has a loss term for the inter-modality point pairs. [sent-171, score-0.113]
74 Third, CRH uses boosting to sequentially learn the hash functions but Multi-NPP does not take this aspect into consideration. [sent-172, score-0.377]
75 As discussed briefly in [23], one may first use Multi-NPP to map multimodal data into a common real space and then apply any unimodal HFL method for multimodal hashing. [sent-173, score-0.288]
76 First, both stages can introduce information loss, which impairs the quality of the learned hash functions. [sent-175, score-0.363]
77 1 Experimental Settings In our experiments, we compare CRH with two state-of-the-art multimodal hashing methods, namely, Cross-Modal Similarity Sensitive Hashing (CMSSH) [20] and Cross-View Hashing (CVH) [21], for two cross-modal retrieval tasks: (1) image query vs. [sent-179, score-0.541]
78 The goal of each retrieval task is to find from the text (image) database the nearest neighbors for the image (text) query. [sent-182, score-0.334]
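Operationally, cross-modal retrieval hashes the query with the projections of its own modality and ranks the database codes, produced with the other modality's projections, by Hamming distance; a small sketch with names of my own:

```python
import numpy as np

def encode(X, W):
    """K-bit codes: one row of W per bit, sign-thresholded at 0 (W has shape (K, dim))."""
    return (X @ W.T > 0).astype(np.uint8)

def cross_modal_search(x_query, Wx, Y_db, Wy, topn=10):
    """Image query vs. text database: rank text items by Hamming distance to the query code."""
    q = encode(x_query[None, :], Wx)[0]
    db = encode(Y_db, Wy)
    dist = (db != q).sum(axis=1)       # Hamming distance per database item
    return np.argsort(dist)[:topn]
```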
79 We use two benchmark data sets which are, to the best of our knowledge, the largest fully paired and labeled multimodal data sets. [sent-183, score-0.125]
80 We further divide each data set into a database set and a query set. [sent-184, score-0.209]
81 To train the models, we randomly select a group of documents from the database set to form the training set. [sent-185, score-0.14]
82 The mean average precision (mAP) is used as the performance measure. [sent-189, score-0.041]
83 The mAP is then computed by averaging the AP values over all the queries in the query set. [sent-191, score-0.108]
84 Besides, we also report the precision and recall within a fixed Hamming radius. [sent-194, score-0.075]
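For reference, a compact implementation of the two reported measures, written from their standard definitions rather than from the authors' evaluation code (details such as truncating mAP at a fixed number of retrieved points may differ):

```python
import numpy as np

def average_precision(relevant, ranked):
    """AP of one query: mean precision taken at each rank position holding a relevant item;
    mAP averages this value over all queries in the query set."""
    hits, precisions = 0, []
    for rank, idx in enumerate(ranked, start=1):
        if relevant[idx]:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def precision_within_radius(query_code, db_codes, relevant, radius=2):
    """Precision over database items whose Hamming distance to the query is <= radius."""
    dist = (db_codes != query_code).sum(axis=1)
    retrieved = dist <= radius
    return float(relevant[retrieved].mean()) if retrieved.any() else 0.0
```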
85 In each pair, the text is an article describing some events or people and the image is closely related to the article. (We used the implementations generously provided by the authors.) [sent-203, score-0.156]
86 The images are represented by 128-dimensional SIFT [28] feature vectors, while the text articles are represented by the probability distributions over 10 topics learned by a latent Dirichlet allocation (LDA) model [29]. [sent-209, score-0.083]
87 Moreover, we use 80% of the data as the database set and the remaining 20% to form the query set. [sent-212, score-0.209]
88 We note that CMSSH ignores the intra-modality relational information and CVH simply treats each bit independently. [sent-215, score-0.042]
89 We then plot the precision-recall curves and recall curves for all three methods in the remaining subfigures. [sent-258, score-0.064]
90 3 Results on Flickr The Flickr data set consists of 186,577 image-tag pairs pruned from the NUS data set [30] by keeping the pairs that belong to one of the 10 largest classes. [sent-310, score-0.048]
91 Each pair is annotated by at least one of 10 semantic labels, and two points are defined as neighbors if they share at least one label. [sent-313, score-0.058]
92 We use 99% of the data as the database set and the remaining 1% to form the query set. [sent-314, score-0.209]
93 For the task of image query vs. text database, CRH performs comparably to CMSSH, which is better than CVH. [sent-322, score-0.083]
94 The precision-recall curves and recall curves are shown in the remaining subfigures. [sent-365, score-0.064]
95 [Figure 2: Results on Flickr; panel (f) shows the recall curve vs. the number of retrieved points.] 4 Conclusions In this paper, we have presented a novel method for multimodal hash function learning based on a boosted co-regularization framework. [sent-420, score-0.486]
96 Comparative studies based on two benchmark data sets show that CRH outperforms two state-of-the-art multimodal hashing methods. [sent-422, score-0.327]
97 To take this work further, we would like to conduct theoretical analysis of CRH and apply it to some other tasks such as multimodal medical image alignment. [sent-423, score-0.198]
98 Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. [sent-436, score-0.242]
99 Multiple feature hashing for real-time large scale near-duplicate video retrieval. [sent-481, score-0.202]
100 NUS-WIDE: A real-world web image database from National University of Singapore. [sent-528, score-0.174]
wordName wordTfidf (topN-words)
[('crh', 0.544), ('hash', 0.307), ('wx', 0.304), ('wy', 0.301), ('cvh', 0.285), ('cmssh', 0.272), ('hashing', 0.202), ('hfl', 0.155), ('dn', 0.149), ('ybn', 0.129), ('multimodal', 0.125), ('xan', 0.116), ('query', 0.108), ('database', 0.101), ('cccp', 0.095), ('text', 0.083), ('sn', 0.082), ('image', 0.073), ('yj', 0.067), ('wiki', 0.065), ('retrieved', 0.056), ('scisd', 0.052), ('sgn', 0.047), ('modalities', 0.046), ('codes', 0.045), ('similarity', 0.044), ('bit', 0.042), ('precision', 0.041), ('flickr', 0.04), ('textual', 0.04), ('modality', 0.04), ('hamming', 0.037), ('tth', 0.036), ('lsh', 0.035), ('recall', 0.034), ('bits', 0.034), ('retrieval', 0.033), ('bronstein', 0.032), ('curve', 0.031), ('loss', 0.031), ('hong', 0.03), ('piotr', 0.03), ('boosting', 0.029), ('hn', 0.028), ('sub', 0.027), ('ox', 0.026), ('oy', 0.026), ('richang', 0.026), ('rob', 0.026), ('functions', 0.025), ('preserving', 0.024), ('dissimilar', 0.024), ('pairs', 0.024), ('neighbors', 0.023), ('kumar', 0.023), ('boosted', 0.023), ('map', 0.023), ('nearest', 0.021), ('sanjiv', 0.021), ('radius', 0.02), ('objective', 0.02), ('trevor', 0.02), ('shakhnarovich', 0.02), ('gregory', 0.02), ('training', 0.02), ('projection', 0.019), ('ap', 0.019), ('documents', 0.019), ('xi', 0.019), ('cvpr', 0.019), ('neighbor', 0.019), ('semantic', 0.018), ('pegasos', 0.018), ('reconstructive', 0.018), ('trend', 0.018), ('yair', 0.017), ('brie', 0.017), ('representer', 0.017), ('indyk', 0.017), ('points', 0.017), ('document', 0.016), ('david', 0.016), ('luo', 0.016), ('multiview', 0.016), ('antonio', 0.016), ('sequentially', 0.016), ('search', 0.016), ('convex', 0.016), ('eigendecomposition', 0.016), ('kong', 0.016), ('jun', 0.016), ('besides', 0.016), ('unimodal', 0.015), ('curves', 0.015), ('compact', 0.015), ('varying', 0.015), ('code', 0.015), ('nonconvex', 0.015), ('wei', 0.015), ('shortcoming', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 71 nips-2012-Co-Regularized Hashing for Multimodal Data
Author: Yi Zhen, Dit-Yan Yeung
Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1
2 0.24714471 313 nips-2012-Sketch-Based Linear Value Function Approximation
Author: Marc Bellemare, Joel Veness, Michael Bowling
Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1
3 0.20459287 148 nips-2012-Hamming Distance Metric Learning
Author: Mohammad Norouzi, David M. Blei, Ruslan Salakhutdinov
Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes. 1
4 0.15490973 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
Author: Siddharth Gopal, Yiming Yang, Bing Bai, Alexandru Niculescu-mizil
Abstract: A challenging problem in hierarchical classification is to leverage the hierarchical relations among classes for improving classification performance. An even greater challenge is to do so in a manner that is computationally feasible for large scale problems. This paper proposes a set of Bayesian methods to model hierarchical dependencies among class labels using multivariate logistic regression. Specifically, the parent-child relationships are modeled by placing a hierarchical prior over the children nodes centered around the parameters of their parents; thereby encouraging classes nearby in the hierarchy to share similar model parameters. We present variational algorithms for tractable posterior inference in these models, and provide a parallel implementation that can comfortably handle largescale problems with hundreds of thousands of dimensions and tens of thousands of classes. We run a comparative evaluation on multiple large-scale benchmark datasets that highlights the scalability of our approach and shows improved performance over the other state-of-the-art hierarchical methods. 1
5 0.15430732 126 nips-2012-FastEx: Hash Clustering with Exponential Families
Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy
Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1
6 0.13038643 257 nips-2012-One Permutation Hashing
7 0.099758364 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
8 0.098395593 329 nips-2012-Super-Bit Locality-Sensitive Hashing
9 0.097395569 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
10 0.097380348 163 nips-2012-Isotropic Hashing
11 0.061685808 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
12 0.054278713 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
13 0.043451164 202 nips-2012-Locally Uniform Comparison Image Descriptor
14 0.037462488 234 nips-2012-Multiresolution analysis on the symmetric group
15 0.036345676 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
16 0.03583923 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification
17 0.035167895 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
18 0.034733701 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection
19 0.033808675 197 nips-2012-Learning with Recursive Perceptual Representations
20 0.03364509 279 nips-2012-Projection Retrieval for Classification
topicId topicWeight
[(0, 0.107), (1, 0.022), (2, -0.061), (3, -0.051), (4, 0.039), (5, -0.075), (6, -0.009), (7, 0.102), (8, 0.13), (9, 0.064), (10, 0.109), (11, -0.121), (12, 0.081), (13, 0.23), (14, -0.212), (15, 0.076), (16, 0.065), (17, -0.058), (18, -0.175), (19, -0.041), (20, 0.002), (21, 0.1), (22, 0.076), (23, -0.059), (24, -0.033), (25, 0.006), (26, 0.018), (27, -0.031), (28, -0.031), (29, -0.033), (30, -0.007), (31, -0.036), (32, -0.04), (33, 0.002), (34, 0.018), (35, -0.034), (36, -0.006), (37, -0.067), (38, -0.03), (39, -0.005), (40, 0.059), (41, 0.027), (42, 0.048), (43, 0.025), (44, 0.008), (45, -0.076), (46, -0.04), (47, 0.1), (48, 0.041), (49, -0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.92604446 71 nips-2012-Co-Regularized Hashing for Multimodal Data
Author: Yi Zhen, Dit-Yan Yeung
Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1
2 0.82039344 313 nips-2012-Sketch-Based Linear Value Function Approximation
Author: Marc Bellemare, Joel Veness, Michael Bowling
Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1
3 0.81828523 257 nips-2012-One Permutation Hashing
Author: Ping Li, Art Owen, Cun-hui Zhang
Abstract: Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time nearneighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 permutations on the data. This paper presents a simple solution called one permutation hashing. Conceptually, given a binary data matrix, we permute the columns once and divide the permuted columns evenly into k bins; and we store, for each data vector, the smallest nonzero location in each bin. The probability analysis illustrates that this one permutation scheme should perform similarly to the original (k-permutation) minwise hashing. Our experiments with training SVM and logistic regression confirm that one permutation hashing can achieve similar (or even better) accuracies compared to the k-permutation scheme. See more details in arXiv:1208.1259.
4 0.75550807 329 nips-2012-Super-Bit Locality-Sensitive Hashing
Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian
Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1
5 0.7470234 163 nips-2012-Isotropic Hashing
Author: Weihao Kong, Wu-jun Li
Abstract: Most existing hashing methods adopt some projection functions to project the original data into several dimensions of real values, and then each of these projected dimensions is quantized into one bit (zero or one) by thresholding. Typically, the variances of different projected dimensions are different for existing projection functions such as principal component analysis (PCA). Using the same number of bits for different projected dimensions is unreasonable because larger-variance dimensions will carry more information. Although this viewpoint has been widely accepted by many researchers, it is still not verified by either theory or experiment because no methods have been proposed to find a projection with equal variances for different dimensions. In this paper, we propose a novel method, called isotropic hashing (IsoHash), to learn projection functions which can produce projected dimensions with isotropic variances (equal variances). Experimental results on real data sets show that IsoHash can outperform its counterpart with different variances for different dimensions, which verifies the viewpoint that projections with isotropic variances will be better than those with anisotropic variances. 1
6 0.64049113 148 nips-2012-Hamming Distance Metric Learning
7 0.63346332 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
8 0.41232136 126 nips-2012-FastEx: Hash Clustering with Exponential Families
9 0.31486419 202 nips-2012-Locally Uniform Comparison Image Descriptor
10 0.30492771 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
11 0.2990233 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
12 0.29655328 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
13 0.24831982 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
14 0.23911028 267 nips-2012-Perceptron Learning of SAT
15 0.21615887 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies
16 0.20786655 155 nips-2012-Human memory search as a random walk in a semantic network
17 0.20095034 146 nips-2012-Graphical Gaussian Vector for Image Categorization
18 0.19910231 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification
19 0.19684881 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding
20 0.19543292 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
topicId topicWeight
[(0, 0.031), (16, 0.011), (17, 0.012), (21, 0.017), (30, 0.347), (38, 0.096), (39, 0.013), (42, 0.026), (44, 0.021), (54, 0.021), (55, 0.036), (74, 0.06), (76, 0.084), (80, 0.057), (92, 0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.66889471 71 nips-2012-Co-Regularized Hashing for Multimodal Data
Author: Yi Zhen, Dit-Yan Yeung
Abstract: Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted coregularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets. 1
2 0.54675311 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem
Author: Henrik Ohlsson, Allen Yang, Roy Dong, Shankar Sastry
Abstract: While compressive sensing (CS) has been one of the most vibrant research fields in the past few years, most development only applies to linear models. This limits its application in many areas where CS could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique – CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation. 1
3 0.49394315 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation
Author: Kumar Sricharan, Alfred O. Hero
Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1
4 0.43566164 193 nips-2012-Learning to Align from Scratch
Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-miller
Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method. 1
5 0.4335885 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications
Author: Amit Daniely, Sivan Sabato, Shai S. Shwartz
Abstract: We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension d, and in particular from the class of halfspaces over Rd . We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in contrast to most previous uses of VC theory, which only deal with estimation error. 1
6 0.43335816 148 nips-2012-Hamming Distance Metric Learning
7 0.43261695 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves
8 0.43095785 260 nips-2012-Online Sum-Product Computation Over Trees
9 0.42761889 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
10 0.42594996 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
11 0.42540276 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
12 0.42484847 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
13 0.42416292 65 nips-2012-Cardinality Restricted Boltzmann Machines
14 0.42347187 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
15 0.42275849 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
16 0.42204976 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
17 0.42177448 197 nips-2012-Learning with Recursive Perceptual Representations
18 0.42141062 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
19 0.4206301 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding
20 0.42006615 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation