nips nips2011 nips2011-157 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of the Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the state-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract High dimensional similarity search in large scale databases becomes an important challenge due to the advent of the Internet. [sent-15, score-0.411]
2 While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. [sent-18, score-0.833]
3 Consequently such an embedding has to be used with linear scan or another search algorithm. [sent-19, score-0.471]
4 Hence learning to hash does not directly address the search efficiency issue. [sent-20, score-0.576]
5 Our approach takes both search quality and computational cost into consideration. [sent-22, score-0.404]
6 Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. [sent-23, score-0.778]
7 The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. [sent-24, score-0.976]
8 Experimental results show that our approach significantly outperforms the state-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees). [sent-25, score-0.711]
9 1 Introduction The design of efficient algorithms for large scale similarity search (such as nearest neighbor search) has been a central problem in computer science. [sent-26, score-0.374]
10 However, existing approaches for large scale search in high dimensions rely mainly on algorithmic constructions that are either data independent or weakly dependent. [sent-29, score-0.366]
11 We call this problem learning to search, and this paper demonstrates that supervised learning can lead to improved search efficiency over algorithms that are not optimized using supervised information. [sent-31, score-0.458]
12 To leverage machine learning techniques, we need to consider a scalable search structure with parameters optimizable using labeled data. [sent-35, score-0.369]
13 The data structure considered in this paper is motivated by the success of the vocabulary tree method in image retrieval [18, 27, 15], which has been adopted in modern image search engines to find near-duplicate images. [sent-36, score-0.782]
14 This representation can then be integrated into an inverted index based text search engine for efficient large scale retrieval. [sent-40, score-0.471]
15 Note that k-means trees can be used for high dimensional data, while the classical kd-trees [1, 3, 22] are limited to dimensions of no more than a few hundred. [sent-42, score-0.268]
16 In this paper, we also adopt a tree-structured representation, and propose a learning algorithm to construct the trees using supervised data. [sent-43, score-0.334]
17 First, k-means trees use only an unsupervised clustering algorithm, which is not optimized for search purposes; as we will show in the experiments, by employing supervised information, our learning to search approach can achieve significantly better performance. [sent-45, score-0.952]
18 The learning to search framework proposed in this paper is based on a formulation of search as a supervised learning problem that jointly optimizes two key factors of search: retrieval quality and computational cost. [sent-48, score-0.874]
19 Specifically, we learn a set of selection functions in the form of a tree ensemble, as motivated by the aforementioned kd-trees and k-means trees approaches. [sent-49, score-0.358]
20 However, unlike the traditional methods that are based only on unsupervised information, our trees are learned under the supervision of pairwise similarity information, and are optimized for the defined search criteria, i. [sent-50, score-0.657]
21 It is worth comparing the influential Locality Sensitive Hashing (LSH) [6, 2] approach with our learning to search approach. [sent-55, score-0.31]
22 An inverted index structure can be constructed based on the hashing results [6], which facilitates efficient search. [sent-57, score-0.481]
23 While interesting theoretical results can be obtained for LSH, as we shall see with the experiments, in practice its performance is inferior to the data-dependent search structures optimized via the learning to search approach of this paper. [sent-59, score-0.698]
24 However, the motivation of the hashing problem is fundamentally different from that of the search problem considered in this paper. [sent-61, score-0.63]
25 Specifically, the goal of learning to hash is to embed data into compact binary codes so that the Hamming distance between two codes reflects their original similarity. [sent-62, score-0.534]
26 In order to perform efficient Hamming distance search using the embedded representation, an additional efficient algorithmic structure is still needed. [sent-63, score-0.392]
27 (How to come up with such an efficient algorithm is an issue usually ignored by learning to hash algorithms. [sent-64, score-0.266]
28 ) The compact hash codes were traditionally believed to achieve low search latency by employing either linear scan, hash table lookup, or more sophisticated search mechanisms. [sent-65, score-1.288]
29 As we shall see in our experiments, however, linear scan on the Hamming space is not a feasible solution for large scale search problems. [sent-66, score-0.422]
30 Moreover, if another search data structure is implemented on top of the hash code, the optimality of the embedding is likely to be lost, which usually yields a suboptimal solution inferior to directly optimizing a search criterion. [sent-67, score-0.935]
31 Given a database X = {x1 , . . . , xn } and a query q, the search problem is to return the top ranked items from the database that are most similar to the query. [sent-71, score-0.459]
32 In large-scale search applications, the database size n can be in the billions or larger. [sent-73, score-0.37]
33 On the other hand, in order to achieve accurate search results, a complicated ranking function s(q, x) is indispensable. [sent-75, score-0.372]
34 Modern search engines handle this problem by first employing a non-negative selection function T (q, x) that selects a small set of candidates Xq = {x : T (q, x) > 0, x ∈ X } with most of the top ranked items (T (q, x) = 0 means “not selected”). [sent-76, score-0.502]
35 This is called the candidate selection stage, which is followed by a reranking stage where a more costly ranking function s(q, x) is evaluated on Xq . [sent-77, score-0.306]
36 • The retrieval quality of Xq , measured by the total similarity Σ_{x∈Xq} s(q, x), should be large. [sent-87, score-0.269]
37 Therefore, our objective is to retrieve a set of items that maximizes the ranking quality while lowering the computational cost (keeping the candidate set as small as possible). [sent-88, score-0.301]
38 In addition, to achieve search efficiency, the selection stage employs the inverted index structure used in standard text search engines to handle web-scale datasets. [sent-89, score-0.955]
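To make the two-stage pipeline concrete, here is a minimal Python sketch of candidate selection over an inverted index followed by reranking; the index layout, the choice of selection keys, and the ranking callback s are simplified placeholders of ours, not the paper's implementation.

from collections import defaultdict

def build_inverted_index(keys_per_item):
    # Map each selection key (e.g. a tree leaf id or hash bucket) to the ids of items that fall into it.
    index = defaultdict(list)
    for item_id, keys in enumerate(keys_per_item):
        for k in keys:
            index[k].append(item_id)
    return index

def two_stage_search(query_keys, index, database, query, s, top_k=10):
    # Stage 1: candidate selection -- keep items sharing at least one key with the query (T(q, x) > 0).
    candidates = set()
    for k in query_keys:
        candidates.update(index.get(k, []))
    # Stage 2: reranking -- evaluate the costly ranking function s(q, x) only on the small candidate set.
    ranked = sorted(candidates, key=lambda i: s(query, database[i]), reverse=True)
    return ranked[:top_k]

With, say, cosine similarity as s and tree leaves as keys, this reproduces the candidate-selection-plus-reranking flow described above.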
39 3 Learning to Search This section presents the proposed learning to search framework. [sent-90, score-0.31]
40 We present the general formulation first, followed by a specific algorithm based on boosted search trees. [sent-91, score-0.52]
41 The learning to search framework considers the search problem as a machine learning problem that finds the optimal selection function T as follows: max_T Q(T ) subject to C(T ) ≤ C0 , (3) where C0 is the upper bound on the computational cost. [sent-96, score-0.744]
42 Let xi and xj be two arbitrary samples in the dataset, and let sij = s(xi , xj ) ∈ {1, 0} indicate if they are “similar” or “dissimilar”. [sent-99, score-0.721]
43 Problem in (4) becomes: max_T J(T ) = max_T [ Σ_{i,j} sij 1(T (xi , xj ) > 0) − λ Σ_{i,j} 1(T (xi , xj ) > 0) ] = max_T Σ_{i,j} zij 1(T (xi , xj ) > 0), (5) where zij = 1−λ for similar pairs and zij = −λ for dissimilar pairs. (6) [sent-100, score-1.139]
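As a concrete reading of (5)-(6), the following short Python sketch computes the pair weights zij and evaluates J for a given binary selection pattern; the array-based interface is our own simplification, not code from the paper.

import numpy as np

def pair_weights(sim, lam):
    # z_ij = 1 - lambda for similar pairs (s_ij = 1) and -lambda for dissimilar pairs (s_ij = 0).
    sim = np.asarray(sim, dtype=float)
    return np.where(sim == 1, 1.0 - lam, -lam)

def objective_J(selected, sim, lam):
    # J(T) = sum over pairs of z_ij * 1(T(x_i, x_j) > 0); `selected` is a boolean array encoding 1(T > 0).
    z = pair_weights(sim, lam)
    return float(np.sum(z[np.asarray(selected, dtype=bool)]))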
44 We also drop the subscripts of zij and regard z as a random variable conditioned on xi and xj . [sent-108, score-0.478]
45 We define the ensemble selection function as a weighted sum of a set of base selection functions: T (xi , xj ) = Σ_{m=1}^{M} cm · tm (xi , xj ). (8) [sent-109, score-0.547]
46 Suppose we have learned M base functions, and we are about to learn the (M + 1)-th selection function, denoted as t(xi , xj ) with weight given by c. [sent-110, score-0.341]
47 In many application scenarios, each base selection function t(xi , xj ) takes only binary values 1 or 0. [sent-112, score-0.341]
48 Thus, we may want to minimize L(t, c) by choosing the optimal value of t(xi , xj ) for any given pair (xi , xj ). [sent-113, score-0.44]
49 L(t, c) = Ew [e^{−zc}] = e^{−(1−λ)c} · Pw [sij = 1 | xi , xj ] + e^{λc} · Pw [sij = 0 | xi , xj ]. (11) [sent-115, score-0.44]
50 Taking the derivative of L with respect to c, we arrive at the optimal solution for c: c = log ( (1 − λ) Pw [t(xi , xj ) = 1, sij = 1 | xi , xj ] / (λ Pw [t(xi , xj ) = 1, sij = 0 | xi , xj ]) ). [sent-117, score-0.589]
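The boosting step implied by (11) and the closed form above can be sketched in a few lines of Python; this is an AdaBoost-style reading under our own assumptions (in particular the exact constant in c, which was partly garbled in this extraction and should be checked against the original derivation).

import numpy as np

def optimal_weight(w, selected, sim, lam, eps=1e-12):
    # c = log((1 - lam) * P_w[t = 1, s = 1] / (lam * P_w[t = 1, s = 0])), with P_w the normalized pair weights.
    w = np.asarray(w, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    sim = np.asarray(sim)
    p = w / w.sum()
    p_sel_sim = p[selected & (sim == 1)].sum()
    p_sel_dis = p[selected & (sim == 0)].sum()
    return float(np.log(((1.0 - lam) * p_sel_sim + eps) / (lam * p_sel_dis + eps)))

def update_pair_weights(w, selected, sim, lam, c):
    # Boosting-style reweighting: w_ij <- w_ij * exp(-z_ij * c * t(x_i, x_j)), then renormalize.
    w = np.asarray(w, dtype=float)
    selected = np.asarray(selected, dtype=float)
    z = np.where(np.asarray(sim) == 1, 1.0 - lam, -lam)
    w_new = w * np.exp(-z * c * selected)
    return w_new / w_new.sum()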
51 This is particularly because the binary-valued base selection functions t(xi , xj ) have to be selected from limited function families to ensure learnability (finite model complexity) and, more importantly, efficiency. [sent-121, score-0.341]
52 Specifically, t(xi , xj ) = 1 if xi and xj get hashed into the same bucket of the inverted table, and 0 otherwise. [sent-124, score-0.695]
53 This paper considers trees (which we call "search trees") as an approximation to the optimal selection functions; quick inverted table lookup then follows naturally. [sent-125, score-0.488]
54 Consider a search tree with L leaf nodes {ℓ1 , · · · , ℓL }. [sent-129, score-0.48]
55 The selection function given by this tree is defined as t(xi , xj ) = Σ_{k=1}^{L} t(xi , xj ; ℓk ), (16) where t(xi , xj ; ℓk ) ∈ {0, 1} indicates whether both xi and xj reach the same leaf node ℓk . [sent-130, score-1.304]
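In code, (16) simply asks whether two points share a leaf, which is also what makes the inverted-table view possible; a minimal sketch of ours, assuming leaf assignments have already been computed by the learned tree:

from collections import defaultdict

def tree_selection(leaf_i, leaf_j):
    # t(x_i, x_j) = 1 iff x_i and x_j reach the same leaf of the search tree.
    return int(leaf_i == leaf_j)

def leaves_to_buckets(leaf_assignments):
    # Group item ids by leaf id -- exactly the inverted-table view used for fast lookup.
    buckets = defaultdict(list)
    for item_id, leaf in enumerate(leaf_assignments):
        buckets[leaf].append(item_id)
    return buckets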
56 Similar to (5), the objective function for a search tree can be written as: max_t J = max_t Σ_{i,j} wij zij t(xi , xj ) = max_t Σ_{k=1}^{L} Σ_{i,j} wij zij t(xi , xj ; ℓk ) = Σ_{k=1}^{L} J^k , (17) where J^k = Σ_{i,j} wij zij t(xi , xj ; ℓk ) is a partial objective function for the k-th leaf node, and wij is given by (10). [sent-131, score-0.441]
57 The appealing additive property of the objective function J makes it tractable to analyze each split when the search tree grows. [sent-132, score-1.617]
58 The splitting criterion is given by: max_{||p̃||=1} J^{k(1)} + J^{k(2)} = max_{||p̃||=1} Σ_{i,j} wij zij 1(p̃⊤ x̃i · p̃⊤ x̃j > 0) ≈ max_{||p̃||=1} Σ_{i,j} wij zij [p̃⊤ x̃i x̃j⊤ p̃] = max_{||p̃||=1} p̃⊤ X̃ M X̃⊤ p̃, (18) where Mij = wij zij , and X̃ is the stack of all augmented samples at node ℓk . [sent-138, score-1.387]
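Under the relaxation in (18), the split direction is the leading eigenvector of the (symmetrized) matrix X̃ M X̃⊤; below is a hedged Python sketch of one node split, where the pair list, weights, and the augmentation convention for x̃ are assumptions of ours rather than the paper's exact procedure.

import numpy as np

def split_direction(X_aug, pair_idx, w, z):
    # Return the unit vector p maximizing p^T (X~ M X~^T) p with M_ij = w_ij * z_ij over the listed pairs.
    X_aug = np.asarray(X_aug, dtype=float)
    d = X_aug.shape[1]
    A = np.zeros((d, d))
    for (i, j), wij, zij in zip(pair_idx, w, z):
        outer = np.outer(X_aug[i], X_aug[j])
        A += wij * zij * 0.5 * (outer + outer.T)   # accumulate the symmetric part
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -1]                          # eigenvector of the largest eigenvalue

def split_node(X_aug, p):
    # Binary split of the samples at this node: child 1 if p^T x > 0, child 0 otherwise.
    return (np.asarray(X_aug, dtype=float) @ p > 0).astype(int)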
59 The search tree construction algorithm is listed in Algorithm 2. [sent-141, score-0.389]
60 The selection of the stump functions is similar to that in traditional decision trees: on the given leaf node, a set of stump functions are attempted and the one that maximizes (17) is selected if the objective function increases. [sent-143, score-0.293]
61 3.4 Boosted Search Forest In summary, we present a Boosted Search Forest (BSF) algorithm for the learning to search problem. [sent-145, score-0.31]
62 In the learning stage, this algorithm follows the boosting framework described in Algorithm 1 to learn an ensemble of selection functions; each base selection function, in the form of a search tree, is learned with Algorithm 2. [sent-146, score-0.581]
63 We then build inverted indices by passing all data points through the learned search trees. [sent-147, score-0.433]
64 In analogy to text search, each leaf node corresponds to an “index word” in the vocabulary and the data points reaching this leaf node are the “documents” associated with this “index word”. [sent-148, score-0.347]
65 In the candidate selection stage, instead of exhaustively evaluating T (q, x) for all x ∈ X , we only need to traverse the search trees and retrieve all items that collide with the query example in at least one tree. [sent-149, score-0.781]
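Query-time retrieval with the learned forest then reduces to one inverted-index lookup per tree; a minimal sketch, where the step that maps a point to its leaf is assumed to be provided by the learned stump tests:

from collections import defaultdict

def build_forest_index(leaves_per_item):
    # leaves_per_item[i][m] is the leaf reached by item i in tree m; build one inverted index per tree.
    n_trees = len(leaves_per_item[0])
    indices = [defaultdict(list) for _ in range(n_trees)]
    for item_id, leaves in enumerate(leaves_per_item):
        for m, leaf in enumerate(leaves):
            indices[m][leaf].append(item_id)
    return indices

def retrieve_candidates(query_leaves, indices):
    # Union of all items that collide with the query in at least one tree.
    candidates = set()
    for m, leaf in enumerate(query_leaves):
        candidates.update(indices[m].get(leaf, []))
    return candidates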
66 4 Experiments We evaluate the Boosted Search Forest (BSF) algorithm on several image search tasks. [sent-151, score-0.359]
67 Although a more general similarity measure can be used, for simplicity we set s(xi , xj ) ∈ {0, 1} according to whether xj is within the top K nearest neighbors (K-NN) of xi on the designated metric space. [sent-152, score-0.636]
68 We compare the performance of BSF to the two most popular algorithms for high dimensional image search: k-means trees and LSH. [sent-154, score-0.292]
69 We also compare to a representative method in the learning to hash community: spectral hashing, although this algorithm was designed for Hamming embedding instead of search. [sent-155, score-0.413]
70 Here linear scan is adopted on top of spectral hashing for search, because its more efficient alternatives are either directly compared (such as LSH) or can easily fail as noticed in [24]. [sent-156, score-0.53]
71 Our experiment shows that exhaustive linear scan is not scalable, especially with long hash codes needed for better retrieval accuracy (see Table 1). [sent-157, score-0.595]
72 First, LSH was reported to be superior to kd-trees [21] and spectral hashing was reported to outperform RBM and BoostSCC [26]. [sent-160, score-0.418]
73 Third, since this paper focuses on learning to search, not learning to hash (Hamming embedding) or learning distance metrics that consider different goals, it is not essential to compare with more recent work on those topics such as [8, 11, 24, 7]. [sent-162, score-0.266]
74 We then randomly select around 6000 images as queries, and use the remaining (∼150K) images as the search database. [sent-170, score-0.484]
75 In image search, we are interested in the overall quality of the set of candidate images returned by a search algorithm. [sent-171, score-0.661]
76 This notion coincides with our formulation of the search problem in (4), which aims at maximizing retrieval quality while maintaining a relatively low computational cost (for the reranking stage). [sent-172, score-0.613]
77 The number of returned images clearly reflects the computational cost, and the retrieval quality is measured by the recall of retrieved images, i. [sent-173, score-0.457]
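The evaluation protocol (recall of the returned candidate set, plotted against its size) can be written in a few lines; this is our reading of the metric, not the authors' released code.

def recall_of_candidates(retrieved, ground_truth):
    # Fraction of a query's true nearest neighbors that appear in the returned candidate set.
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    if not ground_truth:
        return 0.0
    return len(retrieved & ground_truth) / len(ground_truth)

def average_recall(all_retrieved, all_ground_truth):
    # Average per-query recall, paired with the average number of returned images (the cost axis).
    recalls = [recall_of_candidates(r, g) for r, g in zip(all_retrieved, all_ground_truth)]
    avg_returned = sum(len(r) for r in all_retrieved) / len(all_retrieved)
    return sum(recalls) / len(recalls), avg_returned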
78 Figure 1(a) shows the performance comparison with two search algorithms: k-means trees and LSH. [sent-177, score-0.516]
79 Since our boosted search forest consists of tree ensembles, for a fair comparison we also construct an equivalent number of k-means trees (with random initializations) and multiple sets of LSH codes. [sent-178, score-0.949]
80 The better performance is due to our learning to search formulation, which simultaneously maximizes recall while minimizing the size of the returned candidate set. [sent-180, score-0.529]
81 It is still interesting to compare to spectral hashing, although it is not a search algorithm. [sent-183, score-0.408]
82 Since our approach requires more trees when the number of returns increases, we implement spectral hashing with varying bits: 32-bit, 96-bit, and 200-bit. [sent-184, score-0.624]
83 As illustrated in Figure 1(b), our approach significantly outperforms spectral hashing under all configurations. [sent-185, score-0.418]
84 Although the search forest does not have an explicit concept of bits, we can measure it from an information-theoretic point of view, by counting every binary branching in the trees as one bit. [sent-186, score-0.686]
85 With the same number of bits, spectral hashing only achieves a recall rate around 60%. [sent-189, score-0.46]
86 We randomly sample one million images from the 80 Million Tiny Images dataset [23] as the search database, and 5000 additional images as queries. [sent-192, score-0.511]
87 1% recall rate and LSH and spectral hashing are even worse. [sent-211, score-0.46]
88 Note that using more bits in spectral hashing can even hurt performance on this dataset. [sent-212, score-0.523]
89 4.3 Search Speed All three aforementioned search algorithms (boosted search trees, k-means trees, and LSH) can naturally utilize inverted index structures to facilitate very efficient search. [sent-214, score-0.809]
90 In particular, both our boosted search trees and k-means trees use the leaf nodes as the keys to index a list of data points in the database, while LSH uses multiple independently generated bits to form the indexing key. [sent-215, score-1.218]
91 On the other hand, in order to perform search with compact hamming codes generated by a learning to hash method (e. [sent-217, score-0.767]
92 spectral hashing), one has to either use a linear scan approach or a hash table lookup technique that finds the samples within a radius-1 Hamming ball (or more complex methods like LSH). [sent-219, score-0.537]
93 Although much more efficient, the hash table lookup approach is likely to fail as the dimension of the hash code grows to a few dozen, as observed in [24]. [sent-220, score-0.593]
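For reference, the radius-1 Hamming-ball lookup mentioned above can be sketched as follows; it probes b + 1 buckets per query, and one way to see why it degrades once the code length b reaches a few dozen bits is that the probed buckets become almost empty. Names and the table layout here are ours.

from collections import defaultdict

def build_hash_table(codes):
    # codes[i] is an integer whose low b bits are the hash code of item i.
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table

def hamming_ball_lookup(query_code, table, b):
    # Return all items whose code is within Hamming distance 1 of the query code (b + 1 probes).
    hits = list(table.get(query_code, []))
    for bit in range(b):
        hits.extend(table.get(query_code ^ (1 << bit), []))
    return hits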
94 When the hash codes grow to 512 bits (which is not unusual for high-dimensional image/video data), the query time is almost 20 seconds. [sent-226, score-0.496]
95 In contrast, our boosted search forest with 32 16-layer trees (∼512 bits) responds in less than 0. [sent-228, score-0.87]
96 5 Conclusion This paper introduces a learning to search framework for scalable similarity search in high dimensions. [sent-231, score-0.743]
97 Unlike previous methods, our algorithm learns a boosted search forest by jointly optimizing search quality against computational efficiency, under the supervision of pair-wise similarity labels. [sent-232, score-1.13]
98 With a natural integration of the inverted index search structure, our method can handle web-scale datasets efficiently. [sent-233, score-0.471]
99 Experiments show that our approach leads to better retrieval accuracy than the state-of-the-art search methods such as locality sensitive hashing and k-means trees. [sent-234, score-0.804]
100 Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. [sent-240, score-0.362]
wordName wordTfidf (topN-words)
[('hashing', 0.32), ('search', 0.31), ('lsh', 0.273), ('hash', 0.266), ('bsf', 0.254), ('xj', 0.22), ('boosted', 0.21), ('trees', 0.206), ('sij', 0.149), ('forest', 0.144), ('retrieval', 0.14), ('xi', 0.132), ('xq', 0.131), ('zij', 0.126), ('wij', 0.124), ('inverted', 0.123), ('scan', 0.112), ('pw', 0.108), ('bits', 0.105), ('spectral', 0.098), ('returned', 0.098), ('sh', 0.094), ('leaf', 0.091), ('images', 0.087), ('hamming', 0.082), ('ew', 0.08), ('tree', 0.079), ('codes', 0.077), ('selection', 0.073), ('reranking', 0.069), ('vocabulary', 0.067), ('quality', 0.065), ('similarity', 0.064), ('ranking', 0.062), ('lookup', 0.061), ('database', 0.06), ('scalable', 0.059), ('tm', 0.056), ('indexing', 0.052), ('candidate', 0.052), ('engines', 0.051), ('stump', 0.051), ('stage', 0.05), ('optimized', 0.05), ('tiny', 0.049), ('image', 0.049), ('embedding', 0.049), ('supervised', 0.049), ('node', 0.049), ('query', 0.048), ('base', 0.048), ('split', 0.047), ('nn', 0.047), ('multimedia', 0.046), ('czt', 0.042), ('enqueue', 0.042), ('lscom', 0.042), ('ontology', 0.042), ('recall', 0.042), ('ensemble', 0.042), ('items', 0.041), ('dissimilar', 0.041), ('kulis', 0.039), ('millions', 0.038), ('index', 0.038), ('retrieves', 0.037), ('huazhong', 0.037), ('queue', 0.037), ('dimensional', 0.037), ('modern', 0.037), ('cvpr', 0.037), ('cm', 0.035), ('boosting', 0.035), ('locality', 0.034), ('compact', 0.032), ('billion', 0.032), ('synthesized', 0.032), ('ciency', 0.031), ('uiuc', 0.03), ('cost', 0.029), ('relied', 0.029), ('vldb', 0.029), ('indyk', 0.029), ('queries', 0.028), ('structures', 0.028), ('child', 0.028), ('maximizes', 0.027), ('supervision', 0.027), ('cao', 0.027), ('constructions', 0.027), ('million', 0.027), ('employing', 0.027), ('max', 0.026), ('concept', 0.026), ('evaluating', 0.026), ('ef', 0.026), ('retrieved', 0.025), ('retrieve', 0.025), ('dimensions', 0.025), ('considers', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
2 0.18310744 111 nips-2011-Hashing Algorithms for Large-Scale Learning
Author: Ping Li, Anshumali Shrivastava, Joshua L. Moore, Arnd C. König
Abstract: Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory. We compare b-bit minwise hashing with the Count-Min (CM) and Vowpal Wabbit (VW) algorithms, which have essentially the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is significantly more accurate (at the same storage cost) than VW (and random projections) for binary data.
3 0.15795797 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
4 0.15717067 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon
Abstract: We introduce P I C O D ES: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. P I C O D ES of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.
5 0.1294194 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
6 0.12771335 231 nips-2011-Randomized Algorithms for Comparison-based Search
7 0.11006974 229 nips-2011-Query-Aware MCMC
8 0.10626985 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
9 0.10556523 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
10 0.094591752 274 nips-2011-Structure Learning for Optimization
11 0.092434406 165 nips-2011-Matrix Completion for Multi-label Image Classification
12 0.092041582 209 nips-2011-Orthogonal Matching Pursuit with Replacement
13 0.087593131 186 nips-2011-Noise Thresholds for Spectral Clustering
14 0.085471526 276 nips-2011-Structured sparse coding via lateral inhibition
15 0.084233314 22 nips-2011-Active Ranking using Pairwise Comparisons
16 0.082847923 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
17 0.082289517 171 nips-2011-Metric Learning with Multiple Kernels
18 0.081995517 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
19 0.081992447 54 nips-2011-Co-regularized Multi-view Spectral Clustering
20 0.078619555 30 nips-2011-Algorithms for Hyper-Parameter Optimization
topicId topicWeight
[(0, 0.21), (1, 0.086), (2, -0.129), (3, 0.045), (4, 0.012), (5, -0.023), (6, -0.032), (7, -0.068), (8, 0.015), (9, -0.144), (10, -0.055), (11, 0.129), (12, -0.106), (13, 0.026), (14, 0.075), (15, 0.05), (16, -0.065), (17, 0.126), (18, -0.107), (19, -0.067), (20, 0.026), (21, 0.052), (22, 0.136), (23, -0.058), (24, -0.044), (25, 0.039), (26, 0.144), (27, 0.007), (28, 0.109), (29, 0.056), (30, -0.028), (31, -0.07), (32, 0.076), (33, -0.145), (34, -0.048), (35, -0.046), (36, 0.091), (37, -0.077), (38, 0.021), (39, 0.018), (40, -0.066), (41, 0.041), (42, 0.094), (43, 0.052), (44, -0.028), (45, -0.033), (46, -0.171), (47, -0.002), (48, -0.084), (49, -0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.96270967 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
2 0.75656974 111 nips-2011-Hashing Algorithms for Large-Scale Learning
Author: Ping Li, Anshumali Shrivastava, Joshua L. Moore, Arnd C. König
Abstract: Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory. We compare b-bit minwise hashing with the Count-Min (CM) and Vowpal Wabbit (VW) algorithms, which have essentially the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is significantly more accurate (at the same storage cost) than VW (and random projections) for binary data.
3 0.62341952 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
4 0.57520872 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon
Abstract: We introduce P I C O D ES: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. P I C O D ES of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.
5 0.56695503 274 nips-2011-Structure Learning for Optimization
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
6 0.51401877 64 nips-2011-Convergent Bounds on the Euclidean Distance
7 0.47781321 168 nips-2011-Maximum Margin Multi-Instance Learning
8 0.46970117 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
9 0.46719831 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
10 0.46396071 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
11 0.46087196 95 nips-2011-Fast and Accurate k-means For Large Datasets
12 0.45947173 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
13 0.45322254 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
14 0.4509472 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
15 0.44745228 263 nips-2011-Sparse Manifold Clustering and Embedding
16 0.43446407 231 nips-2011-Randomized Algorithms for Comparison-based Search
17 0.43445364 30 nips-2011-Algorithms for Hyper-Parameter Optimization
18 0.43395066 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
19 0.43131068 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
20 0.43049678 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
topicId topicWeight
[(0, 0.04), (4, 0.042), (20, 0.053), (26, 0.028), (31, 0.05), (33, 0.09), (34, 0.189), (43, 0.039), (45, 0.115), (57, 0.125), (65, 0.013), (74, 0.05), (83, 0.028), (84, 0.013), (99, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.81758851 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
2 0.78296655 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation
Author: Mehdi Keramati, Boris S. Gutkin
Abstract: Reinforcement learning models address animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that animals’ behavioral objective is to minimize deviations of some key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological-stability prove to be identical. The proposed theory shows behavioral adaptation to both internal and external states in a disciplined way. We further show that the proposed framework allows for a unified explanation of some behavioral pattern like motivational sensitivity of different associative learning mechanism, anticipatory responses, interaction among competing motivational systems, and risk aversion.
3 0.75597358 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
4 0.7295981 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk
Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1
5 0.72417331 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
6 0.70337182 171 nips-2011-Metric Learning with Multiple Kernels
7 0.68664318 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
8 0.6833263 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
9 0.68195498 111 nips-2011-Hashing Algorithms for Large-Scale Learning
10 0.67818916 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
11 0.67459846 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
12 0.67165792 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
13 0.67078108 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
14 0.66868043 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
15 0.66706342 227 nips-2011-Pylon Model for Semantic Segmentation
16 0.66485643 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
17 0.66485339 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
18 0.66474491 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
19 0.66470093 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
20 0.66361755 165 nips-2011-Matrix Completion for Multi-label Image Classification