cvpr cvpr2013 cvpr2013-204 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaofeng Ren, Deva Ramanan
Abstract: Object detection has seen huge progress in recent years, much thanks to the heavily-engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detection, Histograms of Sparse Codes (HSC). We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g. from a HOG-based detector. By learning and using local representations that are much more expressive than gradients, we demonstrate large improvements over the state of the art on the PASCAL benchmark for both root-only and part-based models.
Reference: text
sentIndex sentText sentNum sentScore
1 We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. [sent-6, score-0.627]
2 We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. [sent-7, score-0.144]
3 To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g. from a HOG-based detector. [sent-8, score-0.42]
4 There has been huge progress in object detection in recent years, much thanks to the celebrated Histograms of Oriented Gradients (HOG) features [8, 13]. [sent-14, score-0.133]
5 We develop Histograms-of-Sparse-Codes (HSC), which represents local patches through learned sparse codes instead of gradients and outperforms HOG by a large margin in state-of-the-art sliding window detection. [sent-95, score-0.504]
6 There is evidence that local features are most crucial for detection [23], and we may already be saturating the capacity of HOG [36]. [sent-99, score-0.131]
7 In the wake of recent advances in feature learning [16, 1] and its successes in many vision problems such as recognition [19] and grouping [26], it is promising to consider employing local features automatically learned from data. [sent-101, score-0.222]
8 However, feature learning for detection is a challenging problem, which has seen only limited successes so far [7, 9], partly because of the massive number of windows one needs to scan. [sent-102, score-0.15]
9 In this work, we show that indeed a local representation can be effectively learned for object detection, and the learned rich features outperform HOG by a large margin as demonstrated on the PASCAL and INRIA benchmarks. [sent-104, score-0.231]
10 We compute per-pixel sparse codes using dictionaries learned through K-SVD, and aggregate them into “histograms” of sparse codes (HSC) in the spirit of HOG. [sent-105, score-0.627]
11 For a fair comparison, we keep to the HOG-driven scanning window framework as much as possible, with identical settings for mixtures, parts, and training procedure. [sent-106, score-0.164]
12 To enable efficient training (especially for part-based models), we use a supervised training strategy: instead of iterating over latent root and part locations in the semi-convex setting of DPM [13], we assume these locations are given and fixed (computed with a HOG-based detector). [sent-107, score-0.198]
13 We also apply dimension reduction using learned models to effectively compress the high dimensional sparse code representations. [sent-108, score-0.322]
14 We validate the benefits of richer representation through the use of increasingly large dictionary sizes and patch sizes. [sent-117, score-0.416]
15 To the best of our knowledge, our work is the first to show that dictionary-based features can replace and significantly outperform HOG for general object detection. [sent-118, score-0.133]
16 Related Works Object detection: Many contemporary approaches for object detection have converged on the paradigm of linear SVMs trained on HOG features [8, 13, 5, 21], as evidenced by benchmark evaluations such as PASCAL [12]. [sent-120, score-0.133]
17 Most approaches have explored model structure, either through nonparametric mixtures or exemplars [21, 10], compositional grammar structure [15], supervised correspondences [5, 2], and low-dimensional projections [29, 25]. [sent-121, score-0.113]
18 A sampling of such descriptors includes local binary patterns [17], integral channel features of gradients and color cues [11], RGB covariance features [28], and multiscale spatial pyramids [4]. [sent-126, score-0.194]
19 We extensively compare to this approach and find that we are able to learn much richer structures on larger patches through sparse coding and achieve substantial improvements over HOG. [sent-131, score-0.374]
20 Sparse coding is a popular way of learning feature representation [1, 24], commonly used in image classification settings [33, 6] but also explored for detection [18]. [sent-133, score-0.204]
21 More recent uses of sparse coding are toward the pixel level, learning patch representations to replace SIFT features [3]. [sent-134, score-0.309]
22 Such patch representations can be applied to other problems such as contour detection [26]. [sent-135, score-0.138]
23 Feature Learning for Object Detection Histograms of Oriented Gradients (HOG) are highly specialized features engineered for object detection, extremely popular and used in virtually every object detection system. [sent-137, score-0.239]
24 How to build a richer local representation that outperforms HOG is a key challenge for detection, which remains open despite efforts of designing features [17], learning them [9], or combining multiple features [30]. [sent-140, score-0.22]
25 We seek to replace HOG with features automatically learned from data. [sent-141, score-0.134]
26 In this section we will develop Histograms-of-Sparse-Codes (HSC), which resembles HOG but is based on well-developed sparse coding techniques that represent each local patch using a sparse set of codewords. [sent-143, score-0.284]
27 Once per-pixel sparse codes are computed, we aggregate the codes into “histograms” on regular cells and use them to replace HOG in the standard Deformable Parts Model [13]. [sent-145, score-0.482]
28 Local Representation via Sparse Coding We use K-SVD [1] for dictionary learning, a standard unsupervised dictionary learning algorithm that generalizes K-means. Figure 2: Dictionaries learned through K-SVD for three patch sizes. [sent-148, score-0.624]
29 As patch size and dictionary size grow, increasingly complex patterns are represented in the dictionary. [sent-149, score-0.402]
30 Given a set of image patches $Y = [y_1, \cdots, y_n]$, K-SVD jointly finds a dictionary $D = [d_1, \cdots, d_m]$ and an associated sparse code matrix $X = [x_1, \cdots, x_n]$ by minimizing the reconstruction error $\min_{D,X} \|Y - DX\|_F^2$ s.t. $\forall i,\ \|x_i\|_0 \le K$; $\forall j,\ \|d_j\|_2 = 1$ (1). [sent-152, score-0.396]
31 Given the dictionary D, computing the codes X can be efficiently solved using the greedy Orthogonal Matching Pursuit (OMP) [24]. [sent-164, score-0.379]
32 Given the codes X, the dictionary D is updated sequentially by singular value decomposition. [sent-165, score-0.379]
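The sequential dictionary update described above can be sketched as a rank-1 SVD per atom. This is a minimal illustration of the generic K-SVD atom update, not the authors' code; the function name `ksvd_update_atom` and the array layout (patches as columns of Y, atoms as columns of D) are assumptions made for this example.

```python
import numpy as np

def ksvd_update_atom(Y, D, X, j):
    """Update atom j of D (and row j of X) in place with a rank-1 SVD.

    Y: (d, n) patches, D: (d, m) dictionary, X: (m, n) float sparse codes.
    Hypothetical helper for illustration only.
    """
    users = np.flatnonzero(X[j, :])   # patches whose code uses atom j
    if users.size == 0:
        return                        # unused atom: left unchanged in this sketch
    # Residual of those patches with atom j's own contribution added back.
    E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                 # new unit-norm atom
    X[j, users] = s[0] * Vt[0, :]     # new coefficients on the same support
```

Because the best rank-1 approximation of the residual replaces the old atom and its coefficients, the reconstruction error cannot increase, and the sparsity pattern of X is preserved.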
33 Once the dictionary D is learned, we again use Orthogonal Matching Pursuit to compute sparse codes at every pixel in an image pyramid. [sent-167, score-0.459]
34 Examples of the dictionaries learned are shown in Fig. [sent-169, score-0.124]
35 As the patch size and dictionary size grow, more and more interesting structures are discovered (such as corners, thin lines, line endings, and high-frequency gratings). [sent-172, score-0.376]
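The greedy coding step used here, Orthogonal Matching Pursuit, can be sketched for a single patch as follows. This is a textbook version under the assumption of unit-norm atoms stored as columns of D; the function name `omp` and its signature are illustrative, not taken from the paper.

```python
import numpy as np

def omp(D, y, K):
    """Greedy sparse code of y over dictionary D with at most K non-zeros.

    D: (d, m) dictionary with unit-norm columns (assumed), y: (d,) patch.
    """
    x = np.zeros(D.shape[1])
    support = []
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(K):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break                      # no new atom helps; stop early
        support.append(j)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x
```

In a detector this routine would be run on the patch centered at every pixel of every pyramid level, so a batched implementation would be preferable in practice.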
36 Aggregation into Histograms of Sparse Codes The sliding window framework of object detection divides an image into regular cells (8x8 pixels) and computes a feature vector of each cell, to be used in a convolution-based window scanning. [sent-175, score-0.235]
37 Let X be the sparse code computed at a pixel, whose dimension equals the dictionary size. [sent-177, score-0.415]
38 The result is a (semi-)dense feature vector F on each cell averaging codes in a 16x16 neighborhood, which we call Histograms of Sparse Codes (HSC). [sent-179, score-0.181]
39 Finally, we apply a power transform on each element of F, $\bar{F} = F^{\alpha}$ (2), as is sometimes done in recognition settings [26]. [sent-181, score-0.112]
40 The power transform makes the distribution of F’s values more uniform and increases the discriminative power of F. [sent-182, score-0.138]
41 That is, each codeword $i$ in the dictionary now has three values in the HSC: $[\,|x_i|,\ \max(x_i, 0),\ \max(-x_i, 0)\,]$ (3) It is worth noting that there are very few ad-hoc design choices in these HSC features. [sent-188, score-0.269]
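The aggregation, the three-valued expansion in (3) and the power transform in (2) can be combined in a short sketch. Several points are assumptions for illustration: the function name `hsc_features`, the default `alpha=0.25` (a placeholder value), the channel ordering, and the use of hard per-cell averaging in place of the smoother 16x16 neighborhood averaging described in the text.

```python
import numpy as np

def hsc_features(codes, cell=8, alpha=0.25):
    """Turn per-pixel sparse codes (H, W, m) into per-cell features.

    Returns an array of shape (H // cell, W // cell, 3 * m).
    alpha is a placeholder exponent, not a value taken from the text.
    """
    H, W, m = codes.shape
    # Three channels per codeword, as in (3): |x|, max(x, 0), max(-x, 0).
    # Channels are grouped by type here, not interleaved per codeword.
    expanded = np.concatenate(
        [np.abs(codes), np.maximum(codes, 0), np.maximum(-codes, 0)], axis=2)
    hc, wc = H // cell, W // cell
    trimmed = expanded[:hc * cell, :wc * cell]
    # Hard averaging inside each cell (a simplification of soft binning).
    F = trimmed.reshape(hc, cell, wc, cell, 3 * m).mean(axis=(1, 3))
    # Power transform, as in (2); all entries of F are non-negative.
    return F ** alpha
```

With a dictionary of 100 codewords this produces 300 values per cell, which matches the feature dimension quoted for the root-only experiments.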
42 This illustrates the power of learning richer features on larger patches, which captures more information than gradients and has less need for manually designed transforms. [sent-190, score-0.275]
43 Moreover, it is straightforward to change the settings, such as dictionary size, patch size or sparsity level, allowing the HSC features to adapt to the needs of different problems. [sent-191, score-0.41]
44 HSC features capture oriented edges using learned patterns, and can better localize them in each cell (the edges can be off-center). [sent-194, score-0.143]
45 Moreover, HSC features can represent richer patterns such as corners (the girl’s feet) or parallel lines (both horizontal and vertical in the negative image). [sent-195, score-0.154]
46 implementation of the standard sliding window detection framework, following the DPM model of [13]. [sent-200, score-0.127]
47 The computational cost is linear in the feature dimension of φ(I). [sent-211, score-0.111]
48 The learning procedure needs to iterate over training the model and assigning latent variables in the positive images, resulting in an elaborate and slow process, sometimes fragile due to the non-convex nature of the formulation. [sent-216, score-0.118]
49 For general object detection, it is difficult to obtain extensive human labels, and we instead use the state-of-the-art HOG-based detection system [14], where the outputs of their final detectors are used as “groundtruth”. [sent-220, score-0.121]
50 By fixing the latent variables in the part-based model, we make a fair and direct comparison of detection using HSC vs HOG features. [sent-221, score-0.17]
51 With latent variables fixed, learning the detection model can be defined as a convex quadratic program $\arg\min_{\beta,\ \xi_n \ge 0}$ s.t. [sent-222, score-0.124]
52 This allows us to train our supervised models much faster than the latent hard-negative mining approach of [14], making it feasible to work with high dimensional appearance features in part-based models. [sent-227, score-0.16]
53 Dimension Reduction using Learned Models For root-only experiments on PASCAL, we use a dictionary of 100 codes over 5x5 patches, resulting in a 300-dimensional feature vector, an order of magnitude higher than HOG. [sent-230, score-0.41]
54 We find it convenient to reduce the dimension down when training full part-based models. [sent-231, score-0.116]
55 However, unsupervised dimension reduction, such as principal component analysis (PCA) on the data, tends not to work well for either gradient features or sparse codes. [sent-232, score-0.23]
56 One way of doing proper dimension reduction in the SVM setting would be to consider joint optimization such as in the bilinear model of [25], but it requires an expensive iterative algorithm. [sent-233, score-0.172]
57 We find a simple way of doing supervised dimension reduction making use of models we have learned for the root-only case. [sent-234, score-0.247]
58 Let us write each learned filter $w_{im}$ as an $N \times n_f$ matrix $W_{im}$, where $N = n_x n_y$ (the number of spatial cells in a part filter) and $n_f$ is the size of our HSC feature F. [sent-235, score-0.25]
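One plausible reading of this model-based reduction is to stack the filter matrices row-wise and keep the top right singular vectors as a projection applied to every cell. The sketch below follows that reading; the stacking scheme and the names `projection_from_filters` and `reduce_features` are assumptions, since the text here only defines the per-filter matrices.

```python
import numpy as np

def projection_from_filters(filters, k):
    """Build an (n_f, k) projection from learned filters.

    filters: list of (N_i, n_f) arrays, one per learned filter (assumed layout).
    """
    W = np.vstack(filters)                        # one row per filter cell
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:k].T                               # top-k right singular vectors

def reduce_features(F, P):
    """Project per-cell features of shape (..., n_f) down to (..., k)."""
    return F @ P
```

The motivation for using the models instead of the data is that directions the learned filters never respond to can be dropped without changing detection scores, whereas PCA on the data keeps high-variance directions regardless of whether they are discriminative.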
59 For INRIA, we use root-only models and evaluate the HSC settings such as dictionary size, sparsity level, patch size, and power transform. [sent-242, score-0.431]
60 For PASCAL2007, we use both root-only and part-based models with supervised training, measure the improvements of HSC over HOG for the 20 classes, and compare to the state-of-the-art DPM system [14] which uses the same model but with additional tweaks (such as symmetry). [sent-243, score-0.129]
61 This dataset is an ideal setting for studying local features and comparing to HOG, as it is what HOG was designed and optimized for, and training is straightforward (there is no need for mixture or latent positions for positive examples). [sent-247, score-0.141]
62 This is an intriguing question and illustrates the difference between reconstructing signals (what sparse coding techniques are designed for) and extracting meaningful structures for recognition. [sent-253, score-0.163]
63 4(a) shows the average precision on INRIA when we change the sparsity level along with the dictionary size using 5x5 patches. [sent-255, score-0.331]
64 We observe that when the dictionary size is small, a patch cannot be well represented with a single codeword, and K > 1 (at least 2) seems to help. [sent-256, score-0.319]
65 However, when the dictionary size grows and includes more structures in its codes, the K = 1 curve catches up, and performs very well. [sent-257, score-0.278]
66 Therefore we use K = 1 in all the following experiments, which makes the HSC features behave indeed like histograms using a sparse code dictionary. [sent-258, score-0.194]
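With K = 1 the greedy pursuit needs no iteration: each patch is assigned the single atom with the largest absolute correlation, and that correlation is its signed coefficient. A minimal batched sketch, assuming unit-norm atoms as columns of D and floating-point inputs (the function name `code_k1` is illustrative):

```python
import numpy as np

def code_k1(D, Y):
    """Sparse codes with exactly one non-zero entry per patch.

    D: (d, m) dictionary with unit-norm columns (assumed), Y: (d, n) patches.
    Returns X of shape (m, n).
    """
    corr = D.T @ Y                                # (m, n) atom/patch correlations
    best = np.argmax(np.abs(corr), axis=0)        # winning atom for each patch
    cols = np.arange(Y.shape[1])
    X = np.zeros_like(corr)
    X[best, cols] = corr[best, cols]              # keep the signed coefficient
    return X
```

This is what makes the features behave like histograms: every pixel votes for exactly one codeword, weighted by the magnitude of its coefficient.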
67 Next we investigate whether our HSC features can capture richer structures using larger patches. [sent-260, score-0.187]
68 4(b) shows the average precision as we change both the patch size and the dictionary size. [sent-262, score-0.355]
69 It is encouraging to see that indeed the average precision greatly increases as we use larger patches (along with larger dictionary size). [sent-263, score-0.368]
70 While 3x3 codes barely show an edge over [sent-264, score-0.158]
71 Figure 4: Investigating the use of sparse codes on INRIA. [sent-272, score-0.238]
72 (a) Average precision (AP) of sparsity level vs dictionary size; sparsity=1 works well when the dictionary is large. [sent-273, score-0.599]
73 (b) Patch size vs dictionary size; larger patches do code richer information but require larger dictionaries. [sent-274, score-0.551]
74 (d) Power transform significantly improves the discriminative power of the sparse code histograms. [sent-276, score-0.195]
75 HOG, 5x5 and 7x7 codes work much better, and the trend continues beyond 200 codewords. [sent-277, score-0.158]
76 The ability to code and make use of larger patches shows the merits of our feature design and K-SVD learning compared to the spherical k-medoids clustering in [9], which had considerable trouble with larger patches and observed decreases in accuracy going beyond the small size 3x3. [sent-279, score-0.354]
77 With K = 1, one can also use K-means to learn a dictionary (after normalizing the magnitude of each patch). [sent-281, score-0.221]
78 4(c) compares the detection accuracy with K-SVD vs K-means dictionaries on 5x5 patches. [sent-283, score-0.206]
79 K-SVD dictionaries have a clear advantage over K-means, probably because the reconstruction coefficient in sparse coding allows for a single codeword to model more appearances including the change of sign. [sent-284, score-0.248]
80 We use dictionary size 100 for 3x3 patches, 150 for 5x5, and 300 for 7x7. [sent-295, score-0.246]
81 HSC-based detectors outperform HOG, especially with larger patch sizes, and are competitive with the state-of-the-art DPM system (with parts). [sent-302, score-0.156]
82 We use a K-SVD dictionary of size 100 over 5x5 patches. [sent-311, score-0.246]
83 With the expansion to half-wave rectified codes, the feature dimension is 300. [sent-312, score-0.136]
84 Our system does not handle the symmetry of filters explicitly; instead, we flip the positive images and double the size of the training pool. [sent-313, score-0.128]
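The flipping step can be sketched as a left-right mirror of each positive image together with its bounding box. The helper names and the pixel-inclusive (x1, y1, x2, y2) box convention are assumptions for this example; the text does not specify how annotations are stored.

```python
import numpy as np

def flip_example(image, box):
    """Mirror an image left-right together with its bounding box.

    box is (x1, y1, x2, y2) in pixel-inclusive coordinates (assumed convention).
    """
    width = image.shape[1]
    x1, y1, x2, y2 = box
    return image[:, ::-1], (width - 1 - x2, y1, width - 1 - x1, y2)

def augment_with_flips(examples):
    """Return the original (image, box) pairs followed by their mirrored copies."""
    examples = list(examples)
    return examples + [flip_example(img, box) for img, box in examples]
```

Doubling the pool this way lets a single asymmetric filter see both orientations of an object without adding an explicit symmetry constraint to the model.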
85 Table 2(a) shows the average precision evaluation of our root-only models comparing the HSC features with HOG. [sent-315, score-0.108]
86 4, we learn a projection of HSC features to a lower dimension (universally applied to all cells) by utilizing models learned in the root-only case, and integrate it into feature extraction. [sent-328, score-0.208]
87 Figure 5 (panels: INRIA, PASCAL): Comparing the effectiveness of dimension reduction: SVD-data is the standard way of unsupervised dimension reduction, computing SVD on data; SVD-model computes SVD on learned root filters. [sent-335, score-0.324]
88 We use feature dimension 100 (reduced from 300) for our part-based models. [sent-338, score-0.111]
89 To facilitate efficient training of multiple classes as in PASCAL, we precompute the feature pyramids and cache them. [sent-349, score-0.13]
90 (a) Part-based models, with dimension reduction Table 2: Results on the PASCAL2007 dataset. [sent-422, score-0.13]
91 HSC and HOG results are from our supervised training system using identical settings and directly comparable. [sent-423, score-0.199]
92 Figure 6: A few examples of HOG (left) vs HSC (right) based detection (root-only), showing top three candidates (in the order of red, green, blue). [sent-425, score-0.137]
93 Discussions In this work we demonstrated that dictionary-based features, learned from data in an unsupervised manner, can replace and outperform the hand-crafted HOG features for general object detection. [sent-430, score-0.409]
94 Our studies show that large structures in large patches, when captured in a large dictionary, generally improve object detection, calling for future work on designing and learning even richer features. [sent-434, score-0.194]
95 The sparse representation we use in the current HSC features is simple relative to what exists in the feature learning literature. [sent-435, score-0.179]
96 There are a variety of more sophisticated schemes for coding, pooling and codebook learning that could potentially boost detection performance, and we believe this is a crucial direction toward solving the challenging detection problem under real-world conditions. [sent-436, score-0.18]
97 K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. [sent-442, score-0.171]
98 Text detection and character recognition in scene images with unsupervised feature learning. [sent-484, score-0.124]
99 How important are deformable parts in the deformable parts model? [sent-503, score-0.142]
100 Linear spatial pyramid matching using sparse coding for image classification. [sent-665, score-0.131]
wordName wordTfidf (topN-words)
[('hsc', 0.765), ('hog', 0.262), ('dictionary', 0.221), ('codes', 0.158), ('dpm', 0.11), ('inria', 0.096), ('richer', 0.088), ('sparse', 0.08), ('dimension', 0.08), ('patch', 0.073), ('vs', 0.072), ('pascal', 0.072), ('svd', 0.07), ('dictionaries', 0.069), ('detection', 0.065), ('gradients', 0.063), ('supervised', 0.062), ('patches', 0.061), ('power', 0.057), ('learned', 0.055), ('wim', 0.053), ('coding', 0.051), ('reduction', 0.05), ('sparsity', 0.049), ('codeword', 0.048), ('engineered', 0.046), ('pursuit', 0.044), ('neural', 0.043), ('nf', 0.043), ('bilinear', 0.042), ('features', 0.042), ('parts', 0.04), ('classes', 0.04), ('advances', 0.04), ('pages', 0.04), ('identical', 0.04), ('wmi', 0.038), ('cim', 0.038), ('histograms', 0.038), ('replace', 0.037), ('improvements', 0.037), ('filters', 0.037), ('training', 0.036), ('precision', 0.036), ('binning', 0.036), ('trouble', 0.036), ('virtually', 0.034), ('xi', 0.034), ('increasingly', 0.034), ('code', 0.034), ('caching', 0.034), ('omp', 0.034), ('latent', 0.033), ('sliding', 0.033), ('ap', 0.032), ('dikmen', 0.032), ('structures', 0.032), ('root', 0.031), ('settings', 0.031), ('feature', 0.031), ('deformable', 0.031), ('system', 0.03), ('person', 0.03), ('trainval', 0.03), ('comparing', 0.03), ('window', 0.029), ('outperform', 0.028), ('unsupervised', 0.028), ('keep', 0.028), ('successes', 0.028), ('designs', 0.027), ('intentionally', 0.027), ('aggregate', 0.027), ('mixtures', 0.027), ('object', 0.026), ('learning', 0.026), ('orthogonal', 0.026), ('intel', 0.026), ('domains', 0.026), ('larger', 0.025), ('girshick', 0.025), ('size', 0.025), ('rectified', 0.025), ('margin', 0.025), ('ren', 0.025), ('transform', 0.024), ('grammar', 0.024), ('patterns', 0.024), ('crucial', 0.024), ('ramanan', 0.023), ('pi', 0.023), ('dimensional', 0.023), ('cell', 0.023), ('oriented', 0.023), ('elaborate', 0.023), ('board', 0.023), ('pyramids', 0.023), ('codewords', 0.023), ('designing', 0.022), ('cells', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
2 0.20852858 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization
Author: Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang
Abstract: For the task of visual categorization, the learning model is expected to be endowed with discriminative visual feature representation and flexibilities in processing many categories. Many existing approaches are designed based on a flat category structure, or rely on a set of pre-computed visual features, hence may not be appreciated for dealing with large numbers of categories. In this paper, we propose a novel dictionary learning method by taking advantage of hierarchical category correlation. For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity. Moreover, the dictionaries in lower levels also inherit the dictionary of ancestor nodes, so that categories in lower levels are described with multi-scale visual information using our dictionary learning approach. Experiments on ImageNet object data subset and SUN397 scene dataset demonstrate that our approach achieves promising performance on data with large numbers of classes compared with some state-of-the-art methods, and is more efficient in processing large numbers of categories.
3 0.19560589 392 cvpr-2013-Separable Dictionary Learning
Author: Simon Hawe, Matthias Seibert, Martin Kleinsteuber
Abstract: Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries permit to capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden for (i) learning a dictionary and for (ii) employing the dictionary for reconstruction tasks only allows to deal with relatively small image patches that only capture local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch-sizes for the learning phase, on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres which updates the dictionary as a whole, thus enforces basic dictionary properties, such as mutual coherence, explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.
4 0.1893951 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
Author: Sumit Shekhar, Vishal M. Patel, Hien V. Nguyen, Rama Chellappa
Abstract: Data-driven dictionaries have produced state-of-the-art results in various classification tasks. However, when the target data has a different distribution than the source data, the learned sparse representation may not be optimal. In this paper, we investigate if it is possible to optimally represent both source and target by a common dictionary. Specifically, we describe a technique which jointly learns projections of data in the two domains, and a latent dictionary which can succinctly represent both the domains in the projected low-dimensional space. An efficient optimization technique is presented, which can be easily kernelized and extended to multiple domains. The algorithm is modified to learn a common discriminative dictionary, which can be further used for classification. The proposed approach does not require any explicit correspondence between the source and target domains, and shows good results even when there are only a few labels available in the target domain. Various recognition experiments show that the method performs on par or better than competitive state-of-the-art methods.
Author: Li He, Hairong Qi, Russell Zaretzki
Abstract: This paper addresses the problem of learning overcomplete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous coupled feature spaces dictionary learning algorithms, our algorithm not only provides dictionaries that are customized to each feature space, but also adds more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed to values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.
6 0.18197119 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification
7 0.16246039 315 cvpr-2013-Online Robust Dictionary Learning
8 0.15973334 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
9 0.1586145 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging
10 0.13700514 383 cvpr-2013-Seeking the Strongest Rigid Detector
11 0.1353281 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
12 0.13030916 125 cvpr-2013-Dictionary Learning from Ambiguously Labeled Data
13 0.12955584 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
14 0.12476316 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
15 0.12376072 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
16 0.12219539 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
17 0.12178358 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
18 0.11411607 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
19 0.10657949 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
20 0.10516246 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
topicId topicWeight
[(0, 0.221), (1, -0.148), (2, -0.143), (3, 0.137), (4, -0.02), (5, -0.035), (6, 0.104), (7, 0.101), (8, -0.039), (9, -0.015), (10, -0.112), (11, -0.008), (12, 0.071), (13, -0.076), (14, 0.08), (15, 0.024), (16, 0.005), (17, 0.013), (18, -0.001), (19, 0.022), (20, 0.007), (21, 0.036), (22, 0.031), (23, -0.038), (24, -0.011), (25, 0.045), (26, -0.058), (27, -0.009), (28, -0.008), (29, 0.013), (30, -0.008), (31, -0.034), (32, 0.028), (33, 0.006), (34, 0.033), (35, 0.044), (36, 0.047), (37, -0.0), (38, -0.034), (39, 0.024), (40, 0.022), (41, -0.024), (42, -0.019), (43, -0.045), (44, 0.018), (45, -0.052), (46, -0.016), (47, -0.025), (48, 0.017), (49, -0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.94217074 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
2 0.79128909 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization
Author: Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang
Abstract: For the task of visual categorization, the learning model is expected to be endowed with discriminative visual feature representation and flexibilities in processing many categories. Many existing approaches are designed based on a flat category structure, or rely on a set of pre-computed visual features, hence may not be appreciated for dealing with large numbers of categories. In this paper, we propose a novel dictionary learning method by taking advantage of hierarchical category correlation. For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity. Moreover, the dictionaries in lower levels also inherit the dictionary of ancestor nodes, so that categories in lower levels are described with multi-scale visual information using our dictionary learning approach. Experiments on ImageNet object data subset and SUN397 scene dataset demonstrate that our approach achieves promising performance on data with large numbers of classes compared with some state-of-the-art methods, and is more efficient in processing large numbers of categories.
Author: Li He, Hairong Qi, Russell Zaretzki
Abstract: This paper addresses the problem of learning over-complete dictionaries for coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous coupled feature space dictionary learning algorithms, our algorithm not only provides dictionaries that are customized to each feature space, but also adds a more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed into values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.
4 0.76206094 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
Author: Yu-Tseh Chi, Mohsen Ali, Ajit Rajwade, Jeffrey Ho
Abstract: This paper proposes a dictionary learning framework that combines the proposed block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding schemes with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm. An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group, while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature, which can be considered a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-gradient descent, and the details of the optimization algorithms are presented. We evaluate the proposed methods using well-known datasets, and favorable comparisons with state-of-the-art dictionary learning methods demonstrate the viability and validity of the proposed framework.
5 0.74796933 392 cvpr-2013-Separable Dictionary Learning
Author: Simon Hawe, Matthias Seibert, Martin Kleinsteuber
Abstract: Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden of (i) learning a dictionary and (ii) employing the dictionary for reconstruction tasks only allows dealing with relatively small image patches that capture only local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch sizes for the learning phase; on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres, which updates the dictionary as a whole and thus enforces basic dictionary properties such as mutual coherence explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.
6 0.74745005 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification
7 0.72553664 83 cvpr-2013-Classification of Tumor Histology via Morphometric Context
8 0.70469725 315 cvpr-2013-Online Robust Dictionary Learning
9 0.69008195 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
10 0.67080802 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
11 0.66661555 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
12 0.66320431 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
13 0.65494913 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
14 0.64794761 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
15 0.64136469 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
16 0.63916218 346 cvpr-2013-Real-Time No-Reference Image Quality Assessment Based on Filter Learning
17 0.6322884 125 cvpr-2013-Dictionary Learning from Ambiguously Labeled Data
18 0.62493831 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
19 0.62323904 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
20 0.6207543 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
topicId topicWeight
[(10, 0.126), (16, 0.031), (26, 0.054), (28, 0.026), (33, 0.269), (67, 0.126), (69, 0.064), (80, 0.018), (87, 0.062), (94, 0.12)]
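The topic row above is a sparse list of (topicId, topicWeight) pairs. As a rough illustration of how such a row can be read, the sketch below expands it into a dense vector and finds the dominant topic. The total topic count `NUM_TOPICS` and the helper `to_dense` are assumptions made for this example; the page does not state how many topics the model has or how the weights were produced.

```python
# Sparse (topicId, topicWeight) pairs copied from the listing above.
topic_weights = [(10, 0.126), (16, 0.031), (26, 0.054), (28, 0.026), (33, 0.269),
                 (67, 0.126), (69, 0.064), (80, 0.018), (87, 0.062), (94, 0.12)]

NUM_TOPICS = 100  # assumption: the page does not give the real topic count


def to_dense(pairs, num_topics):
    """Expand sparse (id, weight) pairs into a dense list; unlisted topics get 0."""
    dense = [0.0] * num_topics
    for topic_id, weight in pairs:
        dense[topic_id] = weight
    return dense


dense = to_dense(topic_weights, NUM_TOPICS)

# Topic with the largest weight among the listed pairs.
dominant_topic, dominant_weight = max(topic_weights, key=lambda pair: pair[1])

# Total weight covered by the listed topics.
listed_mass = sum(weight for _, weight in topic_weights)
```

Here topic 33 carries the largest weight (0.269), and the listed weights sum to about 0.896. The remainder presumably sits on topics that fall below whatever display threshold the page applies, though the page does not say so.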
simIndex simValue paperId paperTitle
1 0.93174225 184 cvpr-2013-Gauging Association Patterns of Chromosome Territories via Chromatic Median
Author: Hu Ding, Branislav Stojkovic, Ronald Berezney, Jinhui Xu
Abstract: Computing accurate and robust organizational patterns of chromosome territories inside the cell nucleus is critical for understanding several fundamental genomic processes, such as co-regulation of gene activation, gene silencing, X chromosome inactivation, and abnormal chromosome rearrangement in cancer cells. The usage of advanced fluorescence labeling and image processing techniques has enabled researchers to investigate interactions of chromosome territories at large spatial resolution. The resulting high volume of generated data demands high-throughput and automated image analysis methods. In this paper, we introduce a novel algorithmic tool for investigating association patterns of chromosome territories in a population of cells. Our method takes as input a set of graphs, one for each cell, containing information about spatial interaction of chromosome territories, and yields a single graph that contains essential information for the whole population and stands as its structural representative. We formulate this combinatorial problem as a semi-definite programming problem and present novel techniques to efficiently solve it. We validate our approach on both artificial and real biological data; the experimental results suggest that our approach yields a near-optimal solution and can handle large-size datasets, which are significant improvements over existing techniques.
Author: Amy Tabb
Abstract: This paper considers the problem of reconstructing the shape of thin, texture-less objects such as leafless trees when there is noise or deterministic error in the silhouette extraction step or there are small errors in camera calibration. Traditional intersection-based techniques such as the visual hull are not robust to error because they penalize false negative and false positive error unequally. We provide a voxel-based formalism that penalizes false negative and positive error equally, by casting the reconstruction problem as a pseudo-Boolean minimization problem, where voxels are the variables of a pseudo-Boolean function and are labeled occupied or empty. Since the pseudo-Boolean minimization problem is NP-Hard for nonsubmodular functions, we developed an algorithm for an approximate solution using local minimum search. Our algorithm treats input binary probability maps (in other words, silhouettes) or continuously-valued probability maps identically, and places no constraints on camera placement. The algorithm was tested on three different leafless trees and one metal object where the number of voxels is 54.4 million (voxel sides measure 3.6 mm). Results show that our approach reconstructs the complicated branching structure of thin, texture-less objects in the presence of error where intersection-based approaches currently fail.
same-paper 3 0.92159671 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
Author: Xiaofeng Ren, Deva Ramanan
Abstract: Object detection has seen huge progress in recent years, much thanks to the heavily-engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detection, Histograms of Sparse Codes (HSC). We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g. from a HOG-based detector. By learning and using local representations that are much more expressive than gradients, we demonstrate large improvements over the state of the art on the PASCAL benchmark for both root-only and part-based models.
4 0.92063642 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
5 0.92024064 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.
6 0.91953874 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
7 0.91864938 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
8 0.91835475 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
9 0.91794235 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
10 0.91631949 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
11 0.91369647 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
12 0.91330242 414 cvpr-2013-Structure Preserving Object Tracking
13 0.91326708 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
14 0.91322678 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
15 0.91273415 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
16 0.9126215 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
17 0.91191274 325 cvpr-2013-Part Discovery from Partial Correspondence
18 0.91157144 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
19 0.91062874 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
20 0.90991312 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
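
The Histograms of Sparse Codes abstract listed above describes two steps: computing a sparse code at each pixel over a learned dictionary, then aggregating those codes into local histograms. The following is a minimal sketch of that idea, not the authors' implementation. Several things are assumptions for illustration: the dictionary is a random unit-norm stand-in rather than one learned with K-SVD, the sparse coder is a plain orthogonal matching pursuit, codes are pooled by absolute value over non-overlapping cells, and the names `omp_encode` and `hsc_features` and the `patch`, `cell`, and `sparsity` defaults are hypothetical.

```python
import numpy as np


def omp_encode(patch, dictionary, sparsity):
    """Greedy orthogonal matching pursuit for one flattened patch."""
    patch = np.asarray(patch, dtype=float)
    residual = patch.copy()
    support = []
    code = np.zeros(dictionary.shape[1])
    for _ in range(sparsity):
        if np.linalg.norm(residual) < 1e-10:
            break  # patch already explained (or all zeros)
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit the coefficients of all selected atoms by least squares.
        coeffs, *_ = np.linalg.lstsq(dictionary[:, support], patch, rcond=None)
        residual = patch - dictionary[:, support] @ coeffs
        code[:] = 0.0
        code[support] = coeffs
    return code


def hsc_features(image, dictionary, patch=5, cell=8, sparsity=2):
    """Per-pixel sparse codes pooled into L2-normalized cell histograms."""
    height, width = image.shape
    radius = patch // 2
    num_atoms = dictionary.shape[1]
    codes = np.zeros((height, width, num_atoms))
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            p = image[y - radius:y + radius + 1, x - radius:x + radius + 1]
            p = p.ravel().astype(float)
            p -= p.mean()  # remove the local mean before coding
            codes[y, x] = np.abs(omp_encode(p, dictionary, sparsity))
    # Sum absolute codes inside each non-overlapping cell x cell block.
    cells_y, cells_x = height // cell, width // cell
    blocks = codes[:cells_y * cell, :cells_x * cell]
    hist = blocks.reshape(cells_y, cell, cells_x, cell, num_atoms).sum(axis=(1, 3))
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / np.maximum(norms, 1e-12)


# Usage with a random stand-in dictionary (a learned one would replace it).
rng = np.random.default_rng(0)
demo_dictionary = rng.standard_normal((25, 30))
demo_dictionary /= np.linalg.norm(demo_dictionary, axis=0)
demo_features = hsc_features(rng.random((16, 16)), demo_dictionary)
```

The result has one histogram per cell with one bin per dictionary atom, so it can in principle be dropped into a sliding-window detector in place of a gradient-orientation histogram, which is the substitution the abstract describes. Details such as soft binning across cells or any further normalization are left out here.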