cvpr cvpr2013 cvpr2013-163 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times— four orders of magnitude— when compared withperforming the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. [sent-3, score-0.381]
2 We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. [sent-4, score-0.712]
3 To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. [sent-5, score-0.29]
4 While mean average precision over the full set of 100,000 object classes is around 0. [sent-7, score-0.189]
5 Introduction Many object detection and recognition systems are constrained by the computation time required to perform a large number of convolutions across a dense array of image windows with a bank of filters. [sent-12, score-0.227]
6 This problem can be mitigated somewhat by using cascades and coarse-to-fine architectures to deploy filters conditionally based upon previously computed thresholded responses, or by sampling image windows with a sparse grid or only at locations determined by an interest-point detector. [sent-13, score-0.324]
7 The former method is inherently greedy and can require traversing a tree of filters that is both deep and broad in order to scale to a large num† Corresponding author. [sent-14, score-0.226]
8 The latter method suffers from low-level salience being a poor guide to precise identification of locations to sample and thus requires that filters be invariant to small positional changes, thereby reducing their discrimination. [sent-16, score-0.306]
9 We exploit locality-sensitive hashing [10] to replace the dot product in the kernel operator of the convolution with a multi-band hash-table lookup. [sent-17, score-0.479]
10 This enables us to determine the filters with highest responses in O(1) time independent of the number of filters. [sent-18, score-0.323]
11 Our approach is not restricted to any particular method or dataset; rather it provides the basis for scaling the number of traditionally compute-intensive signal processing operators from the hundreds or at most thousands applied in current practice to millions. [sent-20, score-0.185]
12 All filters are effectively sampled in one pass over the image with a single probe per location/image window. [sent-21, score-0.226]
13 A descriptor generated from the image window is divided into bands, and each band is hashed in the table associated with that band to retrieve a list of filters potentially responding at that location. [sent-22, score-0.672]
14 The indications from all bands are combined to come up with a set of proposals for the filters with the largest responses, and exact responses are computed for these filters only. [sent-23, score-0.725]
15 We demonstrate the efficacy of the approach by scaling object detection to one hundred thousand object classes employing millions of filters representing objects and their constituent parts across a wide range of poses and scales. [sent-24, score-0.61]
16 We allow filters to operate on diverse features and achieve additional benefits by embedding patches of the resulting feature map in an ordinal space using a highdimensional sparse feature descriptor [21]. [sent-25, score-0.496]
17 By operating in this ordinal space, our similarity measure becomes rank correlation instead of the linear correlation implied by the dot product in the kernel of a traditional convolution. [sent-26, score-0.394]
18 By performing this nonlinear embed- ding in a high-dimensional space, we are able to simplify 111888111422 the hashing process and apply large-scale linear solvers in training object models. [sent-28, score-0.341]
19 Furthermore, this hashing scheme can be implemented exactly in the integer domain, a desirable property for achieving high performance. [sent-29, score-0.241]
20 We believe the present work will accelerate the state of the art in object detection by increasing the number of visual categories by an order of magnitude or more while simultaneously reducing run times by a comparable factor. [sent-36, score-0.212]
21 Related Work Traditionally, object detection is reduced to binary classification and a sliding window classifier is applied at all positions, scales and orientations of an image [20, 4, 8, 19, 22]. [sent-39, score-0.209]
22 Most existing work on improving object detection complexity focuses on reducing L. [sent-41, score-0.166]
23 Very recently, [13, 17] consider scaling category-level detection by learning a set of shared basis parts in the form of steerable filters or sparselets. [sent-48, score-0.433]
24 Using these learned basis parts, approximate part responses can be computed for a large number of classes using a sparse matrix-vector product. [sent-49, score-0.265]
25 While sparse products reduce the dependence of filter convolutions on the number of classes, the overall complexity is still linear in C and it remains to be seen if these approaches can scale to hundreds of thousands of categories. [sent-51, score-0.337]
26 It also remains to be seen just how large the set of basis filters has to grow in order to support a large number of classes. [sent-52, score-0.268]
27 Overall, in contrast to most previous work, our approach reduces object detection complexity by targeting its dependence on the number of object classes C. [sent-55, score-0.274]
28 Our hashingbased framework is the first approach to reduce object detection complexity from O(LC) to O(L). [sent-56, score-0.184]
29 Many other object detection models can be adapted to use our approach, including multiclass cascades [14], recursive compositional models [22], and variants of convolutional neural networks [11]. [sent-64, score-0.19]
30 robust to variations in individual filter values (top row). [sent-143, score-0.157]
31 Ordinal measures of similarity capture filter response differences not reflected in linear measures (dot product) of similarity (bottom row). [sent-144, score-0.157]
32 Rather than operate in the space of HOG features, we encode each filter-sized window of the HOG pyramid as a high-dimensional sparse binary descriptor called a winner-take-all (WTA) hash [21]. [sent-146, score-0.726]
33 We restrict our attention to a subfamily of the hash functions introduced in [21] such that each WTA hash is defined by a sequence of N permutations of the elements in the fixed-sized window that we use in computing filter responses in the HOG pyramid. [sent-147, score-1.472]
34 Each permutation consists of a list of indices into the vector obtained by flattening such a window of HOG coefficients; we need only retain the first K indices for each of the N permutations to implement a WTA hash function. [sent-148, score-0.855]
35 5], and the permutations [1, 4, 3, 2] and [2, 1, 4, 3], the resulting WTA descriptor is [0110] : the first K indices of each permutation, [1, 4, 3] and [2, 1, 4] select [0. [sent-157, score-0.167]
36 Note that since the only operation involved in the entire process is comparison, the hashing scheme can (a) be implemented completely with integer arithmetic and (b) can be efficiently coded without branching, and thus without branch prediction penalties. [sent-164, score-0.32]
37 Each WTA hash function defines an ordinal embedding and an associated rank-correlation similarity measure, which offers a degree of invariance with respect to perturbations in numeric values [21] and is well suited as a basis for locality-sensitive hashing. [sent-165, score-0.805]
38 Intuitively, the information stored in a WTA hash allows one to reconstruct a partial ordering of the coefficients in the hashed vector. [sent-167, score-0.652]
39 The top row of Figure 1 highlights how partial-order statistics can be invariant to deformations and perturbations of individual filter values. [sent-168, score-0.237]
40 In an image processing domain, all three filters in the top row of Figure 1 are qualitatively similar edge filters — this qualitative property follows from all three filters obeying the same ordinal statistic. [sent-169, score-0.866]
41 Likewise, linear measures of similarity can be insensitive to qualitative differences easily captured by an ordinal measure of similarity. [sent-170, score-0.188]
42 For instance, in the bottom row of Figure 1, filter F is qualitatively more similar to filter G than it is to filter H. [sent-171, score-0.471]
43 An ordinal measure of similarity based on partial-order statistics would correctly identify F to be more similar to G than H. [sent-173, score-0.188]
44 A filter is a matrix specifying weights for subwindows of a HOG pyramid. [sent-177, score-0.194]
45 The score of a filter with respect to a subwindow of a HOG pyra- mid is the dot product of the weight vector and the features comprising the subwindow. [sent-178, score-0.499]
46 In our implementation, each model comprises three mixture components representing three common aspect ratios for the bounding box of an object instance. [sent-181, score-0.175]
47 The first step is to compute an image pyramid and the associated HOG pyramid for the target image. [sent-187, score-0.18]
48 An illustration of the key steps in detecting objects and training the models: Training takes as input examples comprising filtersized subwindows of a HOG pyramid as in [8] and produces as output a model that includes a set of filter-sized weight vectors called part filters. [sent-190, score-0.343]
49 We compute the WTA hash for each weight vector, decompose the resulting descriptor into M bands consisting of W spans of length K, construct M hash tables and store each band in its respective table along with the index P of the corresponding part filter. [sent-191, score-1.404]
50 During detection we compute the WTA hash of each filter-sized window of the target HOG pyramid, break the hash into bands, look up each band in its corresponding hash table, and count of how many times each part turns up in a table entry. [sent-192, score-1.901]
51 We use these counts to identify candidate filters for which we compute the exact dot products. [sent-193, score-0.355]
52 In scoring a window at a specified level in the HOG pyramid, they combine the responses of the root filter within the selected window at that level and the responses of the part filters in an appropriately adjusted window at a level one octave higher than the specified level. [sent-195, score-0.988]
53 The contribution of a given part filter to the score of a subwindow is obtained by convolving the filter with the selected subwindow of the HOG pyramid, multiplying it by the deformation Gaussian mask and taking the maximum value. [sent-196, score-0.563]
54 Instead of a response corresponding to the dot product of the filter weights and a filter-sized block of the HOG pyramid, we compute the WTA hash of that same filtersized block of the feature map. [sent-197, score-0.921]
55 We employ locality sensitive hashing [10] to recover the R largest responses corresponding to what the dot products would have been had we actually computed them. [sent-198, score-0.507]
56 During detection we employ hashing as a proxy for explicitly computing the dot product of the convolution subwindow with each of the weight vectors for all the part filters in all object models without actually enumerating the filters. [sent-199, score-0.961]
57 To do so, we employ a multi-band LSH-style hash table to reconstruct the dot products that have a significant number of hash matches [15]. [sent-200, score-1.227]
58 Consider the (N ∗ K)-dimensional binary vectors produced by the WTA hash function. [sent-201, score-0.529]
59 Create one hash table for each band to accommodate the hashed subvectors of length W ∗ K. [sent-206, score-0.721]
60 Let w be the vector of weight coefficients for a part filter and u be the binary vector corresponding to the WTA hash of w. [sent-207, score-0.754]
61 Let Bm denote the mth hash table and um the mth (W ∗ K) span of weight coefficients in u. [sent-208, score-0.65]
62 The hash bin returned from hashing um in Bm is a list of filter indices, which we use to update a histogram of filter match scores. [sent-209, score-1.149]
63 After we select the filters that are above threshold, we continue with Felzenszwalb et al. [sent-210, score-0.226]
64 ’s method, spreading nowsparse filter activations into a more dense map using the part deformation scores. [sent-211, score-0.272]
65 A limitation ofour hashing approach is that all part filters must be the same size. [sent-213, score-0.506]
66 Because the object’s bounding box has arbitrary aspect ratio, we cannot use a corresponding root filter. [sent-214, score-0.173]
67 Since hashing is inexpensive, however, we mitigate the lack of a root filter by tiling the bounding box with part filters and obtaining activations for all of the tiles as we slide the bounding box over the scaled array of features. [sent-215, score-0.984]
68 Additionally, the filter match score generated through hashing is a lower bound on the filter response at that patch (due to hash grouping in bands). [sent-216, score-1.114]
69 Once a candidate part is found, therefore, we compute the actual dot product of the underlying image features with the associated filter and use this in evaluating the location-based deformation. [sent-217, score-0.374]
70 This result is combined with those of other parts for the candidate object to obtain the object’s score, and we use this combined score to suppress all but the best detection in a region. [sent-218, score-0.187]
71 Experimental Results In the following, we use the term baseline algorithm or just baseline to refer to the detection algorithm described in [6] utilizing models generated by the training method from the same paper. [sent-223, score-0.199]
72 By performing several experiments, we compare the performance of the baseline algorithm with the hashing-based detection algorithm described in the previous section, utilizing models generated by the training method from [6] but sans root filters as mentioned earlier. [sent-224, score-0.437]
73 Second, we show that the hashing-based algorithm can scale object detection to hundreds of thousands of object classes and provide insight into the trade-offs involving accuracy, memory and computation time. [sent-226, score-0.399]
74 Finally, we show the results of training 100,000 object classes using our system and running the hashing-based algorithm on the resulting models on a single machine. [sent-227, score-0.187]
75 Note that we report the results for the base system of [6], since we do not use bounding box prediction or context rescoring which were reported to provide further improvements. [sent-247, score-0.194]
76 For the hashing parameters in this experiment, we use K = 4 and W = 4 for 8-bit hash keys and M = 3000 hash tables. [sent-248, score-1.343]
77 The lack of a root filter in our implementation is responsible for much of the deficit, especially the person category which includes many examples too small to be captured by the high-resolution part filters. [sent-253, score-0.254]
78 We are exploring options for restoring the root filter that would reduce this deficit with a small added computational cost. [sent-254, score-0.272]
79 Accuracy Versus Speed and Memory In the previous section we showed that our hashingbased detector performs comparably to the baseline exhaustive detector using the best set of hash parameters. [sent-257, score-0.78]
80 In this section we illustrate how the performance, along with the speed and memory requirements of the detector, change with the hash parameters. [sent-258, score-0.579]
81 For each case we choose W such that we obtain comparable hash keys of 16 or 18 bits. [sent-260, score-0.573]
82 The memory required is a function of M, the number of hash tables. [sent-264, score-0.579]
83 For K = 16, performance saturates at 5 KB per filter, which translates to 5 GB for storing 100,000 classes with 10 filters each. [sent-265, score-0.352]
84 This demonstrates we can store filters for up to 100,000 classes on a single modestly-equipped workstation. [sent-266, score-0.313]
85 Effect of hashing parameters on the accuracy, speed and memory required by the system. [sent-295, score-0.291]
86 In subsequent iterations, we first compute a bounding box by running the detectors on each training image. [sent-305, score-0.205]
87 If the detection score is higher than the score for the bounding box from the previous iteration we use the newly computed bounding box. [sent-306, score-0.305]
88 11 over 100,000 object classes, which is a significant result when considering the fact that the current state-of-the-art for object detection spmtei Amd-Peup62t0,. [sent-312, score-0.187]
89 Top few detections on the validation set for a representative set of categories ordered by detection score. [sent-326, score-0.161]
90 We set a threshold at 80% precision on the validation set and sampled a set of 545 common objects whose detectors fired on a statistically significant number of images out of a set of 12 million random thumbnails from YouTube for which we had no ground truth. [sent-330, score-0.226]
91 Revisiting VOC 2007 with 10,000 Detectors In this section we examine the prediction accuracy of the hashing-based detector as the number of unique objects in the system systematically increases. [sent-339, score-0.17]
92 While it is theoretically possible to maintain the prediction accuracy of the object detector by increasing the number mAP Figure 6. [sent-342, score-0.18]
93 Increasing the number of objects gracefully degrades prediction accuracy on PASCAL VOC 2007 for a fixed computational budget. [sent-345, score-0.209]
94 To explore how a hashing-based object detector scales to a large number of objects, we instead measure the prediction accuracy as we increase the number of unique objects for a fixed computational budget. [sent-347, score-0.23]
95 We systematically perform an exhaustive search of ∼1000 sets of hashing model parameters 1. [sent-348, score-0.272]
96 Figure 7 plots the best achievable mAP for the subset of 20 PASCAL objects for a given time window across a range of object labels. [sent-351, score-0.192]
97 Second, as the number of objects increases, the prediction accuracy on the subset of 20 PASCAL VOC 2007 objects gracefully decreases. [sent-359, score-0.23]
98 This is to be expected as the frequency of errant hash collisions increases, and the entropy ofthe hash table reaches capacity. [sent-360, score-1.058]
99 This set of model parameters demonstrates that a hashing-based object detector can be scaled up to a large number of objects while achieving reasonable balance between prediction accuracy and evaluation throughput. [sent-362, score-0.23]
100 Conclusions Our key contribution is a scalable approach to object detection that replaces linear convolution with ordinal convolution by using an efficient LSH scheme. [sent-365, score-0.435]
wordName wordTfidf (topN-words)
[('hash', 0.529), ('wta', 0.275), ('hashing', 0.241), ('filters', 0.226), ('ordinal', 0.188), ('filter', 0.157), ('bands', 0.137), ('hog', 0.136), ('dot', 0.129), ('voc', 0.104), ('pascal', 0.103), ('band', 0.098), ('responses', 0.097), ('dpm', 0.097), ('hashed', 0.094), ('subwindow', 0.09), ('classes', 0.087), ('conference', 0.086), ('window', 0.082), ('prediction', 0.079), ('permutations', 0.078), ('pyramid', 0.076), ('yagnik', 0.076), ('detection', 0.067), ('convolutions', 0.065), ('bounding', 0.063), ('cascades', 0.063), ('convolution', 0.06), ('object', 0.06), ('root', 0.058), ('deficit', 0.057), ('filtersized', 0.057), ('freebase', 0.057), ('hashes', 0.057), ('hashingbased', 0.057), ('felzenszwalb', 0.054), ('box', 0.052), ('gracefully', 0.051), ('segal', 0.051), ('shlens', 0.051), ('memory', 0.05), ('detectors', 0.05), ('objects', 0.05), ('indices', 0.05), ('product', 0.049), ('validation', 0.048), ('sparselet', 0.047), ('pages', 0.046), ('perturbations', 0.046), ('baseline', 0.046), ('categories', 0.046), ('keys', 0.044), ('comprising', 0.044), ('map', 0.043), ('thousands', 0.043), ('basis', 0.042), ('precision', 0.042), ('detector', 0.041), ('salience', 0.041), ('products', 0.04), ('training', 0.04), ('part', 0.039), ('saturates', 0.039), ('reducing', 0.039), ('google', 0.039), ('descriptor', 0.039), ('proposals', 0.039), ('complement', 0.038), ('steerable', 0.038), ('operators', 0.038), ('subwindows', 0.037), ('million', 0.036), ('windows', 0.035), ('list', 0.035), ('comparably', 0.035), ('deformable', 0.034), ('deformations', 0.034), ('specified', 0.034), ('international', 0.034), ('activations', 0.033), ('spans', 0.033), ('precisions', 0.032), ('hundreds', 0.032), ('permutation', 0.031), ('mth', 0.031), ('exhaustive', 0.031), ('bm', 0.031), ('parts', 0.03), ('um', 0.03), ('scaling', 0.03), ('score', 0.03), ('entities', 0.03), ('degradation', 0.029), ('coefficients', 0.029), ('degrades', 0.029), ('pattern', 0.029), ('overhead', 0.028), ('operating', 0.028), ('target', 0.028), ('vocabularies', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
Author: Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times— four orders of magnitude— when compared withperforming the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
2 0.35312709 63 cvpr-2013-Binary Code Ranking with Weighted Hamming Distance
Author: Lei Zhang, Yongdong Zhang, Jinhu Tang, Ke Lu, Qi Tian
Abstract: Binary hashing has been widely used for efficient similarity search due to its query and storage efficiency. In most existing binary hashing methods, the high-dimensional data are embedded into Hamming space and the distance or similarity of two points are approximated by the Hamming distance between their binary codes. The Hamming distance calculation is efficient, however, in practice, there are often lots of results sharing the same Hamming distance to a query, which makes this distance measure ambiguous and poses a critical issue for similarity search where ranking is important. In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes of hashing methods. By assigning different bit-level weights to different hash bits, the returned binary codes are ranked at a finer-grained binary code level. We give an algorithm to learn the data-adaptive and query-sensitive weight for each hash bit. Evaluations on two large-scale image data sets demonstrate the efficacy of our weighted Hamming distance for binary code ranking.
3 0.34617952 223 cvpr-2013-Inductive Hashing on Manifolds
Author: Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton van_den_Hengel, Zhenmin Tang
Abstract: Learning based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes that preserve the Euclidean distance in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original highdimensional data. The complexity of these models, and the problems with out-of-sample data, havepreviously rendered them unsuitable for application to large-scale embedding, however. In this work, we consider how to learn compact binary embeddings on their intrinsic manifolds. In order to address the above-mentioned difficulties, we describe an efficient, inductive solution to the out-of-sample data problem, and a process by which non-parametric manifold learning may be used as the basis of a hashing method. Our proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. We particularly show that hashing on the basis of t-SNE [29] outperforms state-of-the-art hashing methods on large-scale benchmark datasets, and is very effective for image classification with very short code lengths.
4 0.23513021 87 cvpr-2013-Compressed Hashing
Author: Yue Lin, Rong Jin, Deng Cai, Shuicheng Yan, Xuelong Li
Abstract: Recent studies have shown that hashing methods are effective for high dimensional nearest neighbor search. A common problem shared by many existing hashing methods is that in order to achieve a satisfied performance, a large number of hash tables (i.e., long codewords) are required. To address this challenge, in this paper we propose a novel approach called Compressed Hashing by exploring the techniques of sparse coding and compressed sensing. In particular, we introduce a sparse coding scheme, based on the approximation theory of integral operator, that generate sparse representation for high dimensional vectors. We then project sparse codes into a low dimensional space by effectively exploring the Restricted Isometry Property (RIP), a key property in compressed sensing theory. Both of the theoretical analysis and the empirical studies on two large data sets show that the proposed approach is more effective than the state-of-the-art hashing algorithms.
5 0.22730094 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
Author: Xi Li, Chunhua Shen, Anthony Dick, Anton van_den_Hengel
Abstract: A key problem in visual tracking is to represent the appearance of an object in a way that is robust to visual changes. To attain this robustness, increasingly complex models are used to capture appearance variations. However, such models can be difficult to maintain accurately and efficiently. In this paper, we propose a visual tracker in which objects are represented by compact and discriminative binary codes. This representation can be processed very efficiently, and is capable of effectively fusing information from multiple cues. An incremental discriminative learner is then used to construct an appearance model that optimally separates the object from its surrounds. Furthermore, we design a hypergraph propagation method to capture the contextual information on samples, which further improves the tracking accuracy. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracker.
6 0.20780377 236 cvpr-2013-K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes
7 0.20096718 255 cvpr-2013-Learning Separable Filters
8 0.18168354 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
9 0.15879785 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
10 0.1353281 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
11 0.13207932 364 cvpr-2013-Robust Object Co-detection
12 0.13068283 346 cvpr-2013-Real-Time No-Reference Image Quality Assessment Based on Filter Learning
13 0.12132993 69 cvpr-2013-Boosting Binary Keypoint Descriptors
14 0.12068973 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
15 0.11968312 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
16 0.1185801 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
17 0.11808269 325 cvpr-2013-Part Discovery from Partial Correspondence
18 0.11728679 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
19 0.1133854 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
20 0.11332169 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
topicId topicWeight
[(0, 0.251), (1, -0.078), (2, -0.009), (3, -0.001), (4, 0.116), (5, 0.06), (6, 0.036), (7, -0.08), (8, -0.184), (9, -0.085), (10, -0.305), (11, -0.077), (12, 0.097), (13, 0.042), (14, 0.058), (15, 0.1), (16, 0.05), (17, 0.026), (18, -0.029), (19, 0.133), (20, -0.17), (21, 0.019), (22, 0.015), (23, -0.015), (24, 0.024), (25, -0.04), (26, -0.003), (27, -0.037), (28, -0.02), (29, -0.007), (30, -0.134), (31, -0.104), (32, -0.005), (33, -0.059), (34, 0.086), (35, 0.109), (36, 0.013), (37, 0.019), (38, -0.082), (39, 0.047), (40, -0.032), (41, -0.081), (42, -0.087), (43, -0.051), (44, -0.113), (45, -0.091), (46, 0.1), (47, 0.011), (48, 0.014), (49, -0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.88073713 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
Author: Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times— four orders of magnitude— when compared withperforming the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
2 0.75807446 63 cvpr-2013-Binary Code Ranking with Weighted Hamming Distance
Author: Lei Zhang, Yongdong Zhang, Jinhu Tang, Ke Lu, Qi Tian
Abstract: Binary hashing has been widely used for efficient similarity search due to its query and storage efficiency. In most existing binary hashing methods, the high-dimensional data are embedded into Hamming space and the distance or similarity of two points are approximated by the Hamming distance between their binary codes. The Hamming distance calculation is efficient, however, in practice, there are often lots of results sharing the same Hamming distance to a query, which makes this distance measure ambiguous and poses a critical issue for similarity search where ranking is important. In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes of hashing methods. By assigning different bit-level weights to different hash bits, the returned binary codes are ranked at a finer-grained binary code level. We give an algorithm to learn the data-adaptive and query-sensitive weight for each hash bit. Evaluations on two large-scale image data sets demonstrate the efficacy of our weighted Hamming distance for binary code ranking.
3 0.75522441 223 cvpr-2013-Inductive Hashing on Manifolds
Author: Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton van_den_Hengel, Zhenmin Tang
Abstract: Learning based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes that preserve the Euclidean distance in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original highdimensional data. The complexity of these models, and the problems with out-of-sample data, havepreviously rendered them unsuitable for application to large-scale embedding, however. In this work, we consider how to learn compact binary embeddings on their intrinsic manifolds. In order to address the above-mentioned difficulties, we describe an efficient, inductive solution to the out-of-sample data problem, and a process by which non-parametric manifold learning may be used as the basis of a hashing method. Our proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. We particularly show that hashing on the basis of t-SNE [29] outperforms state-of-the-art hashing methods on large-scale benchmark datasets, and is very effective for image classification with very short code lengths.
4 0.73602057 87 cvpr-2013-Compressed Hashing
Author: Yue Lin, Rong Jin, Deng Cai, Shuicheng Yan, Xuelong Li
Abstract: Recent studies have shown that hashing methods are effective for high dimensional nearest neighbor search. A common problem shared by many existing hashing methods is that in order to achieve a satisfied performance, a large number of hash tables (i.e., long codewords) are required. To address this challenge, in this paper we propose a novel approach called Compressed Hashing by exploring the techniques of sparse coding and compressed sensing. In particular, we introduce a sparse coding scheme, based on the approximation theory of integral operator, that generate sparse representation for high dimensional vectors. We then project sparse codes into a low dimensional space by effectively exploring the Restricted Isometry Property (RIP), a key property in compressed sensing theory. Both of the theoretical analysis and the empirical studies on two large data sets show that the proposed approach is more effective than the state-of-the-art hashing algorithms.
5 0.61499363 236 cvpr-2013-K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes
Author: Kaiming He, Fang Wen, Jian Sun
Abstract: In computer vision there has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. The hashing functions play roles in both quantizing the vector space and generating similarity-preserving codes. Most existing hashing methods use hyper-planes (or kernelized hyper-planes) to quantize and encode. In this paper, we present a hashing method adopting the k-means quantization. We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells. The distance between the cells is approximated by the Hamming distance of the cell indices. We further generalize our algorithm to a product space for learning longer codes. Experiments show our method, named as K-means Hashing (KMH), outperforms various state-of-the-art hashing encoding methods.
6 0.58822215 255 cvpr-2013-Learning Separable Filters
7 0.57910639 69 cvpr-2013-Boosting Binary Keypoint Descriptors
8 0.54917449 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
9 0.54458582 346 cvpr-2013-Real-Time No-Reference Image Quality Assessment Based on Filter Learning
10 0.53231114 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
11 0.53190279 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
12 0.51099259 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
13 0.50530714 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
14 0.49632078 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
15 0.49487692 383 cvpr-2013-Seeking the Strongest Rigid Detector
16 0.49480271 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
17 0.47737753 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
18 0.47446701 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
19 0.47213724 364 cvpr-2013-Robust Object Co-detection
20 0.45840579 417 cvpr-2013-Subcategory-Aware Object Classification
topicId topicWeight
[(10, 0.137), (16, 0.018), (20, 0.13), (26, 0.057), (28, 0.018), (33, 0.288), (36, 0.019), (57, 0.014), (67, 0.09), (69, 0.053), (80, 0.015), (87, 0.077)]
simIndex simValue paperId paperTitle
1 0.94339484 447 cvpr-2013-Underwater Camera Calibration Using Wavelength Triangulation
Author: Timothy Yau, Minglun Gong, Yee-Hong Yang
Abstract: In underwater imagery, the image formation process includes refractions that occur when light passes from water into the camera housing, typically through a flat glass port. We extend the existing work on physical refraction models by considering the dispersion of light, and derive new constraints on the model parameters for use in calibration. This leads to a novel calibration method that achieves improved accuracy compared to existing work. We describe how to construct a novel calibration device for our method and evaluate the accuracy of the method through synthetic and real experiments.
2 0.9290337 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
3 0.9236958 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van_der_Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation ofour structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
same-paper 4 0.92341542 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
Author: Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times— four orders of magnitude— when compared withperforming the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
5 0.92251468 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu
Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowingparts or collections ofparts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.
6 0.9213329 325 cvpr-2013-Part Discovery from Partial Correspondence
7 0.92062432 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
8 0.92000121 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
9 0.91949302 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
10 0.91823626 79 cvpr-2013-Cartesian K-Means
11 0.91801804 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
12 0.91790003 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
13 0.91739523 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
14 0.91737741 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
15 0.91633499 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
16 0.91607594 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
17 0.91600353 314 cvpr-2013-Online Object Tracking: A Benchmark
18 0.91578788 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
19 0.91550916 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
20 0.91541159 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation