iccv iccv2013 iccv2013-426 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ross Girshick, Jitendra Malik
Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.
Reference: text
sentIndex sentText sentNum sentScore
1 At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. [sent-4, score-0.156]
2 Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. [sent-5, score-0.511]
3 Training a DPM involves optimizing a latent SVM (LSVM). [sent-21, score-0.156]
4 Some of the aforementioned techniques can accelerate training via fast detection. [sent-24, score-0.151]
5 In this paper we take a more direct approach: we develop techniques that accelerate training by avoiding most of the typically requisite data mining. [sent-26, score-0.151]
6 Using this technique, they learned the parameters of a HOG filter by linear discriminant analysis (LDA) instead of the usual, comparatively slow, SVM training route. [sent-34, score-0.215]
7 The main advantage of LDA-HOG over SVM-HOG is training speed; the former approach does not require searching for hard negative instances. [sent-37, score-0.305]
8 Optimizing our proposed objective involves alternating between imputing latent labels and updating model parameters, like an LSVM. [sent-42, score-0.258]
9 However, when the latent labels are fixed, the optimal parameter vector can be solved for in closed form. [sent-43, score-0.193]
10 We call this natural counterpart to latent SVM “latent LDA” (LLDA). [sent-45, score-0.156]
11 As foreshadowed by the INRIA pedestrian experiments described above, we find that LLDA DPMs achieve surprisingly good AP performance on the much more challenging PASCAL VOC datasets [10], even though no hard negative examples are used. [sent-46, score-0.294]
12 For comparison, training the recently proposed exemplar SVM [17] with 600 examples takes about 4 hours on 100 cores, and yields a similar mean AP (18. [sent-48, score-0.194]
13 Independent training (i.e., learning each filter independently and then stitching the filters into a model) can dramatically underperform joint training. [sent-59, score-0.181]
14 This finding can be interpreted more broadly as a cautionary tale showing that joint training of a Markov (or conditional) random field can turn an underperforming technique into one that is state-of-the-art. [sent-60, score-0.146]
15 Our analysis also uncovers a surprising result: even though object detection performance with LLDA DPMs trails behind LSVM, they impute equally good latent labels. [sent-61, score-0.228]
16 These imputed labels (i.e., a choice of DPM mixture component and filter positions for each positive example) are “good” in the sense that if they are used as ground truth in large-margin training, they yield performance equal to a DPM trained end-to-end with LSVM. [sent-64, score-0.255]
17 We “warm start” LSVM training by replacing the usual sequence of convex subproblems with latent LDA. [sent-66, score-0.366]
18 A simple alternative approach to speed up training is to subsample the negative training images. [sent-69, score-0.373]
19 Surprisingly, no previous work has studied how DPM detection accuracy varies as a function of the number of negative training examples (cf. [sent-70, score-0.345]
20 Interestingly, very few negative images are needed to get good performance. [sent-73, score-0.161]
21 For INRIA, data mining from just 64 negative images—instead of the customary 1218—yields 76. [sent-74, score-0.22]
22 However, in LSVM training, data mining is performed many times, once for each subproblem solved, making training slow even with subsampling. [sent-77, score-0.345]
23 The LLDA warm start can be viewed as subsampling to the limit where no negative examples are used during all but the last iteration of training. [sent-78, score-0.354]
24 3% mAP) with a median training time under 20 minutes (4x faster than [14]). [sent-81, score-0.155]
25 Training without negative examples. In this section, we develop latent LDA, an alternative to latent SVM that can quickly train DPMs without any hard negative examples. [sent-83, score-0.72]
26 Latent SVM primer. Consider a set of labeled training examples D = {(xn, yn)}, n = 1, . . . , N, where each xn comes from an input space X and yn is a binary label in {−1, 1}. [sent-88, score-0.248]
27 Correspondingly, a latent label is a pair z = (k, h), where k ∈ {1, . . . , K}. [sent-98, score-0.156]
28 For a DPM, k identifies a pose or viewpoint component while h specifies the image position and scale placement of each filter used by component k. [sent-105, score-0.197]
29 The ℓ2 regularization penalty is replaced by the “max-component” regularizer [13], which penalizes the mixture component wk with the largest norm. [sent-109, score-0.312]
30 Experimentally, this equalization stabilizes LSVM training by preventing mixture components from losing all of their examples and “evaporating” (cf. [sent-117, score-0.193]
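For reference, the latent SVM objective with this max-component regularizer is commonly written as follows (a reconstruction in the notation above rather than a verbatim quote; C denotes the usual regularization/loss trade-off):

\[
\min_{w}\;\tfrac{1}{2}\max_{k}\lVert w_{k}\rVert^{2} \;+\; C\sum_{n=1}^{N}\max\bigl(0,\,1-y_{n}f_{w}(x_{n})\bigr),
\qquad
f_{w}(x)=\max_{(k,h)\in Z(x)} w_{k}^{\top}\varphi_{k}(x,h).
\]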
31 For example, when training a DPM, each subproblem requires densely scanning a large set of images in order to extract a small number of hard negative examples. [sent-124, score-0.312]
32 Our objective differs from the LSVM objective (Eq. 1): in place of max-component regularization, we constrain all components to have unit norm; we remove the per-example loss for the negative training instances; and, for each positive example, the objective drives w towards a solution that gives at least one latent labeling a high score under fw. [sent-138, score-0.487]
33 By adding a Lagrange multiplier λk for each constraint and equating the gradient with respect to each wk to zero, we arrive at wk ∝ Σn:kn=k ϕk(xn, hn), i.e., each wk is proportional to the mean of the positive features currently assigned to component k. [sent-155, score-0.442]
34 In each iteration, we use fixed cluster “centers” wk to infer latent cluster assignments (and, in our case, additional latent labels), and then with those fixed, we update the centers wk to be the (normalized) means of the new clusters. [sent-167, score-0.724]
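A minimal sketch of this alternation in Python (the flat per-example candidate matrix and the function name are illustrative assumptions; a real DPM implementation enumerates placements per component over a feature pyramid):

    import numpy as np

    def llda_alternation(pos_feats, n_components, n_iters=10, seed=0):
        """Sketch of the latent LDA alternation: impute latent labels with the
        current wk, then reset each wk to the normalized mean of its assigned
        (whitened) positive features -- a spherical-k-means-like update."""
        rng = np.random.default_rng(seed)
        dim = pos_feats[0].shape[1]
        w = rng.standard_normal((n_components, dim))
        w /= np.linalg.norm(w, axis=1, keepdims=True)
        labels = [None] * len(pos_feats)
        for _ in range(n_iters):
            assigned = [[] for _ in range(n_components)]
            # Step 1: with w fixed, impute the best (component, placement) label.
            for n, feats in enumerate(pos_feats):        # feats: candidates x dim
                scores = feats @ w.T                     # candidates x components
                h, k = np.unravel_index(np.argmax(scores), scores.shape)
                labels[n] = (k, h)
                assigned[k].append(feats[h])
            # Step 2: with labels fixed, the optimal wk is available in closed
            # form: the unit-normalized mean of the features assigned to k.
            for k in range(n_components):
                if assigned[k]:
                    m = np.mean(assigned[k], axis=0)
                    w[k] = m / max(np.linalg.norm(m), 1e-12)
        return w, labels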
35 However, we show that by applying a simple whitening transformation to the training features, the algorithm’s output has an appealing connection to linear discriminant analysis. [sent-169, score-0.147]
36 To achieve this goal we need to cheat slightly and compute some coarse statistics of all training examples (including those from the negative class). [sent-170, score-0.342]
37 Specifically, we need the sample mean μ of the negative examples and the sample covariance matrix S of the entire training set. [sent-171, score-0.376]
38 Since, by assumption, our feature vectors are non-zero within only one component’s span, computing these statistics for each mixture component independently is sufficient (S is block-diagonal, with blocks Sk). [sent-172, score-0.166]
39 Our estimates will come in the form of basic building blocks that can synthesize the mean and covariance for a mixture component with any number of filters, each with any shape. [sent-174, score-0.195]
40 Moreover, our estimates are class independent: in datasets with large class imbalance, such as any typical detection dataset, the negative examples’ sample mean μ can be computed from all examples, since the minority class lends a negligible contribution. [sent-175, score-0.191]
41 Plugging the transformed features into the update for wk (Eq. [sent-179, score-0.206]
42 Looking into the dot product, some basic algebra shows that w̃kᵀϕ̃k(x, h) = wkᵀϕk(x, h) + bk (Eq. 9), where wk = Sk^{-1/2} w̃k. [sent-188, score-0.206]
43 Eq. 10 says that a dot product with w̃k in the whitened feature space is equivalent to a dot product with a vector wk—which has the form of an LDA classifier—in the unwhitened feature space, plus a bias (Eq. [sent-195, score-0.206]
44 Under this interpretation, the function fw (x) = max(k,h)∈Z(x) wkTϕk (x, h) scores an example by picking the best LDA component classifier and latent label pair. [sent-197, score-0.25]
45 Overall, the classifier can be thought of as the “latent LDA” counterpart to a latent SVM. [sent-198, score-0.156]
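Concretely, the whitened and unwhitened classifiers are related as sketched below (a reconstruction from the definitions above with normalization constants omitted; μ is the negative-class sample mean and Sk the shared covariance block):

\[
\tilde{\varphi}_k(x,h)=S_k^{-1/2}\bigl(\varphi_k(x,h)-\mu\bigr),
\qquad
w_k = S_k^{-1/2}\,\tilde{w}_k \;\propto\; S_k^{-1}\bigl(\bar{\varphi}_k^{+}-\mu\bigr),
\]

where φ̄k+ is the mean of the positive features assigned to component k, so wk has the familiar LDA form Σ⁻¹(μ+ − μ−).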
46 Large-margin LLDA (LM-LLDA). One immediate application of latent LDA is as a fast initialization, or “warm start,” for latent SVM training. [sent-201, score-0.312]
47 We can use LLDA to quickly generate the unobserved labels (kn, hn) for each positive example, and then treat those labels as if they were the observed ground truth in the convex large-margin objective (1/2) maxk ‖wk‖2 + … [sent-202, score-0.181]
48 …, where N = {n : yn = −1} is the index set of negative examples and {m}+ denotes max{0, m}. [sent-210, score-0.233]
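Written out in full, the warm-started problem has the standard large-margin form below (the terms truncated above are filled in from the usual LSVM formulation, so the exact constants are assumptions):

\[
\min_{w}\;\tfrac{1}{2}\max_{k}\lVert w_{k}\rVert^{2}
\;+\; C\sum_{n\notin N}\bigl\{1-w_{k_{n}}^{\top}\varphi_{k_{n}}(x_{n},h_{n})\bigr\}_{+}
\;+\; C\sum_{n\in N}\bigl\{1+\max_{(k,h)\in Z(x_{n})} w_{k}^{\top}\varphi_{k}(x_{n},h)\bigr\}_{+}.
\]

The problem is convex because the imputed labels (kn, hn) are held fixed for the positives, while the maximization over latent labels for the negatives sits inside a convex, non-decreasing hinge.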
49 Optionally, one could run multiple LSVM coordinate descent iterations after the warm start, similar to how expectation maximization is used to initialize a latent structural SVM in [21]. [sent-213, score-0.295]
50 A DPM is composed of K mixture components, each of which has a root filter and P part filters. [sent-221, score-0.148]
51 The full parameter vector is w = (w1, . . . , wK), where each per-component weight vector wk is composed of the parameter blocks: wk = (fk0, . . . [sent-228, score-0.412]
52 ϕk(x, h) contains: HOG features φkp extracted from image x at the filter placements listed in h; and deformation features δkp(h) = −(dx2, dx, dy2, dy), where (dx, dy) are displacements relative to the p-th part’s anchor, yielding ϕk(x, h) = … [sent-293, score-0.216]
53 We sidestep this problem with the following modeling assumptions: (1) HOG features and deformation features are uncorrelated; (2) deformation features are uncorrelated across parts; and (3) HOG features are uncorrelated across parts. [sent-339, score-0.167]
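A minimal sketch of how these assumptions translate into a synthesized per-component covariance (the helper inputs, the diagonal deformation blocks, and the ridge term are simplifications for illustration; in practice the HOG blocks are built from the stationary statistics of [15]):

    import numpy as np
    from scipy.linalg import block_diag

    def synthesize_component_covariance(hog_blocks, deform_vars, ridge=1e-3):
        """Assemble S_k as a block-diagonal matrix under the assumptions above:
        one HOG covariance block per filter and one 4x4 deformation block per
        part (taken diagonal here for simplicity).
        hog_blocks:  list of square HOG covariance blocks, one per filter.
        deform_vars: list of length-4 variance vectors for (dx^2, dx, dy^2, dy),
                     one per part filter (the root filter has none)."""
        blocks = list(hog_blocks) + [np.diag(v) for v in deform_vars]
        S_k = block_diag(*blocks)
        # A small ridge keeps the synthesized covariance invertible.
        return S_k + ridge * np.eye(S_k.shape[0])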
54 , number of filters and their shapes) of any particular mixture component. [sent-351, score-0.183]
55 This padding allows filters to slide outside the image, enabling detection of partially truncated objects. [sent-369, score-0.208]
56 Motivated by the intuition that a filter placed outside the image should have the same expected score as a filter placed in background, we pad with the mean HOG feature vector, instead of zero. [sent-370, score-0.218]
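A small sketch of this padding scheme (the array layout and function name are illustrative, not taken from the released code):

    import numpy as np

    def pad_with_mean_hog(feat, pad_y, pad_x, mu_hog):
        """Pad an H x W x D HOG feature map with the mean HOG cell mu_hog
        instead of zeros, so that filter cells falling outside the image
        score like average background rather than like an artificial edge."""
        h, w, d = feat.shape
        padded = np.tile(mu_hog.reshape(1, 1, d),
                         (h + 2 * pad_y, w + 2 * pad_x, 1))
        padded[pad_y:pad_y + h, pad_x:pad_x + w, :] = feat
        return padded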
57 With these preliminaries in place we evaluate our first method, LLDA-0—pure latent LDA without any hard negative examples. [sent-380, score-0.355]
58 0% mAP (Table 1), considering that training takes less than 8 minutes for a class with 600 examples on a single six-core machine. [sent-384, score-0.203]
59 Both systems in [15] rescore LDA-HOG filter detections using a second-layer SVM trained with negative examples, and thus are not “pure” LDA methods. [sent-390, score-0.297]
60 Originally designed as a technique to speed up training of graphical models [18], two-stage training is also an ideal tool for understanding what makes a model perform well. [sent-401, score-0.212]
61 In the following experiments, we use the latent labels generated by LLDA-0. [sent-403, score-0.193]
62 We also fix a set of negative training images, with an average of 2300 per class. [sent-405, score-0.267]
63 This feature is designed to allow filters to learn a score bias for placements outside the image. [sent-407, score-0.181]
64 LLDA-1: LLDA-0 followed by large-margin training of deformation costs, filter calibration weights, and biases. [sent-431, score-0.285]
65 LLDA-2: same as LLDA-1, but with independently trained SVM filters replacing the LDA filters. [sent-432, score-0.203]
66 Figure 1 ((a) Improvement from SVM filters; (b) Improvement from joint training): [sent-437, score-0.29]
67 Independent filter learning dramatically underperforms joint training, suggesting other part-based models (e. [sent-442, score-0.149]
68 This two-stage approach allows us to adjust the relative magnitudes of the filters, without changing their directions, as well as to tune deformation costs and component biases. [sent-451, score-0.158]
69 Discriminatively calibrating these fixed filters boosts mAP substantially from 18. [sent-453, score-0.202]
70 LLDA-1 is similar to the multi-component and exemplar LDA detectors since all systems use a second stage to discriminatively calibrate LDA-HOG filter responses, but achieves a much higher mAP due to the DPM’s parts. [sent-460, score-0.208]
71 For a scalar α that multiplies the output of a fixed filter f, we set the regularization penalty for α to … To verify this approach, we performed two-stage training using the jointly trained filters from the LSVM-μ models as unaries. [sent-462, score-0.409]
72 We use the same two-stage training methodology as before, but this time we replace each LDA filter with a linear SVM. [sent-472, score-0.215]
73 For training these SVMs, the positive examples come from the labels generated by LLDA-0 (thus the LDA and SVM filters use exactly the same positive feature vectors). [sent-473, score-0.407]
74 The negative examples are all subwindows of the negative images. [sent-474, score-0.37]
75 The second-stage model parameterization is the same as in LLDA-1, making discriminative SVM filters the only difference. [sent-475, score-0.199]
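As an illustration of this shared second stage, a minimal sketch (the stacked feature layout and the use of scikit-learn's LinearSVC are assumptions; the filters themselves are held fixed and only scaling weights, deformation costs, and biases are learned):

    import numpy as np
    from sklearn.svm import LinearSVC

    def calibrate_stage_two(stage2_feats, labels, C=0.01):
        """Each row of stage2_feats stacks, for one (example, imputed label) pair:
        the fixed filters' response scores, the negated deformation features, and
        a per-component bias indicator (layout assumed for illustration).  A
        linear SVM over these rows learns the per-filter scaling weights alpha,
        the deformation costs, and the biases."""
        clf = LinearSVC(C=C, loss="hinge", fit_intercept=False)
        clf.fit(stage2_feats, np.asarray(labels))
        return clf.coef_.ravel()   # [alphas | deformation costs | biases]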
76 Discriminative calibration, together with replacing LDA filters by SVM filters, substantially closes the gap between LLDA and LSVM. [sent-483, score-0.368]
77 These results suggest the remaining difference stems from either a less effective latent labeling of the positives or from … Figure 2 (mAP versus negative image subsampling; number of negative images on a log scale): [sent-484, score-0.537]
78 Mean average precision decays gracefully as the number of negative images decreases (note the log scale). [sent-485, score-0.161]
79 LM-LLDA performs better than LSVM-μ when very few negative images are available. [sent-486, score-0.161]
80 Joint training completely closes the AP gap (Table 1, LLDA-3, and Figure 2). [sent-494, score-0.186]
81 This implies that the latent labels imputed by LLDA-0 are equivalent, from a final mAP perspective, to the ones obtained by coordinate descent with LSVM. [sent-495, score-0.351]
82 Subsampling negative images. LDA-HOG [15] and LLDA DPM can be seen as methods for training detectors in the limit of subsampling where no negative images are used. [sent-503, score-0.487]
83 Surprisingly, the question of how detector accuracy varies with the number of negative training examples has not been carefully studied. [sent-504, score-0.315]
84 Data mining from two negative images yields an AP of 29. [sent-510, score-0.22]
85 AP slowly climbs to 80% while increasing the number of negative images 20-fold to the full set of 1218. [sent-519, score-0.161]
86 Figure 3 (data mining and convex optimization time for 2, 20, and 200 negative images): [sent-522, score-0.354]
87 LM-LLDA spends one-third as much time on data mining and convex optimization compared to LSVM-μ across various levels of negative image subsampling. [sent-523, score-0.263]
88 Using only 200 negative images—an order of magnitude fewer than typically used—results in 32. [sent-527, score-0.161]
89 Even with few negative images, mAP is relatively stable over random draws due to averaging (in contrast with the single-class INRIA experiments). [sent-532, score-0.184]
90 At two negative images, LM-LLDA had a standard deviation of only 0. [sent-533, score-0.161]
91 Another notable trend is that LM-LLDA gains further advantage over LSVM as the number of negative images decreases. [sent-535, score-0.161]
92 With very few (e.g., 2) negative images, LSVM training begins to impute noisy labels for the positive examples. [sent-538, score-0.382]
93 A final pattern, which might not even appear worth mentioning at first, is that mAP increases with the number of negative examples. [sent-540, score-0.161]
94 find that AP decreases as the number of negative examples increases when binary hinge loss is used instead of their proposed ranking loss. [sent-543, score-0.209]
95 The rest of training is spent imputing labels for the positives, which takes the same amount of time in both cases. [sent-547, score-0.213]
96 Relative to the publicly available DPM code [14], which is currently the fastest option for training DPMs, we can learn models nearly 4x faster (with median training wall times of 19. [sent-552, score-0.212]
97 The four-fold speedup for DPM training achieved in this paper will allow researchers who are improving DPMs, using them as building blocks in a larger system, or applying them to new datasets to iterate more quickly. [sent-562, score-0.157]
98 Moreover, our latent LDA approach is general and applies to latent variable models beyond DPM. [sent-563, score-0.312]
99 Secondly, we often waste significant time mining hard negative examples from excessively large image sets. [sent-567, score-0.306]
100 Model selection, for instance, should be performed using a small set of negative images (while achieving nearly full AP performance), and only the final model should be trained on the full set, if at all. [sent-568, score-0.188]
wordName wordTfidf (topN-words)
[('llda', 0.46), ('lda', 0.318), ('dpm', 0.314), ('lsvm', 0.293), ('wk', 0.206), ('dpms', 0.201), ('yes', 0.163), ('negative', 0.161), ('latent', 0.156), ('filters', 0.144), ('filter', 0.109), ('training', 0.106), ('imputed', 0.105), ('ap', 0.093), ('warm', 0.086), ('neg', 0.084), ('kp', 0.08), ('hog', 0.078), ('sk', 0.075), ('svm', 0.073), ('deformation', 0.07), ('xn', 0.07), ('biases', 0.066), ('inria', 0.065), ('wkt', 0.063), ('znw', 0.063), ('covariance', 0.061), ('subsampling', 0.059), ('mining', 0.059), ('pascal', 0.058), ('blocks', 0.051), ('fw', 0.05), ('deformable', 0.05), ('minutes', 0.049), ('examples', 0.048), ('subproblem', 0.045), ('accelerate', 0.045), ('costs', 0.044), ('component', 0.044), ('map', 0.044), ('convex', 0.043), ('cago', 0.042), ('closes', 0.042), ('impute', 0.042), ('whitening', 0.041), ('cccp', 0.041), ('whitened', 0.041), ('joint', 0.04), ('exemplar', 0.04), ('mixture', 0.039), ('hard', 0.038), ('gap', 0.038), ('girshick', 0.037), ('placements', 0.037), ('imputing', 0.037), ('labels', 0.037), ('voc', 0.037), ('wt', 0.036), ('positive', 0.036), ('hariharan', 0.035), ('padding', 0.034), ('acceleration', 0.034), ('poselets', 0.034), ('boosts', 0.034), ('spent', 0.033), ('elda', 0.032), ('subproblems', 0.032), ('independently', 0.032), ('stage', 0.031), ('accelerating', 0.031), ('detection', 0.03), ('multiplier', 0.03), ('vedaldi', 0.03), ('slow', 0.029), ('usual', 0.029), ('unaries', 0.029), ('objective', 0.028), ('descent', 0.028), ('anchor', 0.028), ('discriminatively', 0.028), ('isolate', 0.028), ('ent', 0.028), ('felzenszwalb', 0.027), ('coarse', 0.027), ('trained', 0.027), ('stationary', 0.027), ('speeding', 0.027), ('uncorrelated', 0.027), ('span', 0.026), ('maintaining', 0.025), ('coordinate', 0.025), ('insight', 0.025), ('calibrating', 0.024), ('nowozin', 0.024), ('pedestrian', 0.024), ('yn', 0.024), ('parameterization', 0.024), ('regularization', 0.023), ('surprisingly', 0.023), ('draws', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
Author: Ross Girshick, Jitendra Malik
Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.
2 0.20838307 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
3 0.19022368 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variables models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ?1-?2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.
4 0.14436822 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
5 0.14169578 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes
Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.
6 0.14107373 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
7 0.1329881 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
8 0.12258387 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
9 0.12247046 189 iccv-2013-HOGgles: Visualizing Object Detection Features
10 0.12097768 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition
11 0.11639551 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
12 0.11031014 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
13 0.10742714 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
14 0.10690469 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
15 0.1045586 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
16 0.10301634 277 iccv-2013-Multi-channel Correlation Filters
17 0.098527759 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
18 0.097511321 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
19 0.096785717 104 iccv-2013-Decomposing Bag of Words Histograms
20 0.092901506 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
topicId topicWeight
[(0, 0.198), (1, 0.059), (2, -0.024), (3, -0.056), (4, 0.071), (5, -0.028), (6, -0.017), (7, 0.042), (8, -0.069), (9, -0.111), (10, 0.033), (11, -0.038), (12, -0.068), (13, -0.131), (14, -0.021), (15, -0.042), (16, 0.013), (17, 0.13), (18, 0.109), (19, 0.036), (20, -0.036), (21, 0.05), (22, -0.033), (23, -0.022), (24, 0.035), (25, 0.041), (26, -0.009), (27, -0.05), (28, 0.083), (29, 0.025), (30, 0.028), (31, -0.009), (32, -0.057), (33, 0.035), (34, 0.05), (35, -0.069), (36, 0.013), (37, 0.085), (38, 0.102), (39, 0.002), (40, 0.042), (41, -0.055), (42, 0.067), (43, -0.121), (44, -0.029), (45, -0.051), (46, -0.001), (47, -0.024), (48, -0.068), (49, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.94599789 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
Author: Ross Girshick, Jitendra Malik
Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.
2 0.82842839 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variables models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ?1-?2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.
3 0.74886924 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
Author: Iasonas Kokkinos
Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value ’ as is common in current works. We integrate shufflets in DualTree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.
4 0.71450299 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
Author: Jian Sun, Jean Ponce
Abstract: In this paper, we address the problem of learning discriminative part detectors from image sets with category labels. We propose a novel latent SVM model regularized by group sparsity to learn these part detectors. Starting from a large set of initial parts, the group sparsity regularizer forces the model to jointly select and optimize a set of discriminative part detectors in a max-margin framework. We propose a stochastic version of a proximal algorithm to solve the corresponding optimization problem. We apply the proposed method to image classification and cosegmentation, and quantitative experiments with standard benchmarks show that it matches or improves upon the state of the art.
5 0.69687629 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
Author: Abhinav Shrivastava, Abhinav Gupta
Abstract: This paper proposes a novel part-based representation for modeling object categories. Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. We propose to learn this geometrydriven deformable part-based model (gDPM) from a set of labeled RGBD images. We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images.
6 0.66640526 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
7 0.66281861 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
8 0.66119182 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition
9 0.63674039 189 iccv-2013-HOGgles: Visualizing Object Detection Features
10 0.62749684 277 iccv-2013-Multi-channel Correlation Filters
11 0.6178267 349 iccv-2013-Regionlets for Generic Object Detection
12 0.59395844 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
13 0.58869541 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
14 0.58369648 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
15 0.58263105 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
16 0.57925105 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
17 0.57260996 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
18 0.56955379 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
19 0.5522548 104 iccv-2013-Decomposing Bag of Words Histograms
20 0.54985464 248 iccv-2013-Learning to Rank Using Privileged Information
topicId topicWeight
[(2, 0.078), (4, 0.043), (7, 0.025), (12, 0.02), (13, 0.015), (26, 0.08), (31, 0.069), (35, 0.023), (40, 0.014), (42, 0.079), (48, 0.014), (55, 0.019), (64, 0.051), (73, 0.031), (74, 0.119), (89, 0.186), (98, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.90852815 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
Author: Ross Girshick, Jitendra Malik
Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.
Author: Masakazu Iwamura, Tomokazu Sato, Koichi Kise
Abstract: Approximate nearest neighbor search (ANNS) is a basic and important technique used in many tasks such as object recognition. It involves two processes: selecting nearest neighbor candidates and performing a brute-force search of these candidates. Only the former though has scope for improvement. In most existing methods, it approximates the space by quantization. It then calculates all the distances between the query and all the quantized values (e.g., clusters or bit sequences), and selects a fixed number of candidates close to the query. The performance of the method is evaluated based on accuracy as a function of the number of candidates. This evaluation seems rational but poses a serious problem; it ignores the computational cost of the process of selection. In this paper, we propose a new ANNS method that takes into account costs in the selection process. Whereas existing methods employ computationally expensive techniques such as comparative sort and heap, the proposed method does not. This realizes a significantly more efficient search. We have succeeded in reducing computation times by one-third compared with the state-of-the- art on an experiment using 100 million SIFT features.
3 0.89109695 82 iccv-2013-Compensating for Motion during Direct-Global Separation
Author: Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to beperformed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
4 0.88845015 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs locally varying data term that adaptively combines different multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length constraint (MDL). From these chosen data models, a new optical flow estimation energy model is designed with the weighted sum of the multiple data models, and a convex optimization-based highly effective and practical solution thatfinds the opticalflow, as well as the weights isproposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-ofthe art methods.
5 0.88749057 239 iccv-2013-Learning Hash Codes with Listwise Supervision
Author: Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
Abstract: Hashing techniques have been intensively investigated in the design of highly efficient search engines for largescale computer vision applications. Compared with prior approximate nearest neighbor search approaches like treebased indexing, hashing-based search schemes have prominent advantages in terms of both storage and computational efficiencies. Moreover, the procedure of devising hash functions can be easily incorporated into sophisticated machine learning tools, leading to data-dependent and task-specific compact hash codes. Therefore, a number of learning paradigms, ranging from unsupervised to supervised, have been applied to compose appropriate hash functions. How- ever, most of the existing hash function learning methods either treat hash function design as a classification problem or generate binary codes to satisfy pairwise supervision, and have not yet directly optimized the search accuracy. In this paper, we propose to leverage listwise supervision into a principled hash function learning framework. In particular, the ranking information is represented by a set of rank triplets that can be used to assess the quality of ranking. Simple linear projection-based hash functions are solved efficiently through maximizing the ranking quality over the training data. We carry out experiments on large image datasets with size up to one million and compare with the state-of-the-art hashing techniques. The extensive results corroborate that our learned hash codes via listwise supervision can provide superior search accuracy without incurring heavy computational overhead.
6 0.87921536 122 iccv-2013-Distributed Low-Rank Subspace Segmentation
7 0.86847699 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
8 0.86668819 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
9 0.86657917 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
10 0.86655724 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
11 0.8633253 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
12 0.86316538 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
13 0.86256552 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
14 0.86162031 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
15 0.8606782 71 iccv-2013-Category-Independent Object-Level Saliency Detection
16 0.85993248 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
17 0.85903907 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
18 0.85870838 349 iccv-2013-Regionlets for Generic Object Detection
19 0.85863876 404 iccv-2013-Structured Forests for Fast Edge Detection
20 0.8586033 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time