cvpr cvpr2013 cvpr2013-383 knowledge-graph by maker-knowledge-mining

383 cvpr-2013-Seeking the Strongest Rigid Detector


Source: pdf

Author: Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van_Gool

Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called “components ”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients+linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detections, using a single rigid component. We provide experiments for a large design space, that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETHand Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Seeking the strongest rigid detector Rodrigo Benenson∗†‡ Markus Mathias∗† Tinne Tuytelaars† Luc Van Gool† † ESAT-PSI-VISICS/IBBT, Katholieke Universiteit Leuven, Belgium firstname . [sent-1, score-0.247]

2 In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detections, using a single rigid component. [sent-6, score-0.292]

3 These rigid classifiers are built using the now classic HOG+SVM (histogram of oriented gradients, plus linear support vector machine) detector, introduced by Dalal and Triggs [3]. [sent-13, score-0.232]
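The HOG+SVM decision rule mentioned above can be sketched in a few lines: each candidate window yields a fixed-length descriptor, and the detector fires when the linear SVM score exceeds a threshold. A hedged illustration only; the descriptor values, weights, and bias below are made up, not the trained model of [3].

```python
import numpy as np

def score_windows(descriptors, w, b):
    """Linear SVM scores for a stack of per-window HOG descriptors."""
    return descriptors @ w + b

# three toy windows with 4-dimensional descriptors (illustrative values)
X = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5],
              [1.0, 1.0, 1.0, 1.0]])
w = np.array([0.5, -0.5, 1.0, -1.0])  # learned weight vector (made up)
b = -0.25                              # bias term (made up)
scores = score_windows(X, w, b)
detections = scores > 0.0  # fire at the SVM decision boundary
```

In the full detector this scoring runs over every window of a sliding-window scan, at every scale.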

4 Figure 1: Progress obtained from each section of the paper. [sent-19, score-0.24]

5 From an already strong baseline up to our final detector, on the INRIA dataset. [sent-20, score-0.17]

6 In this paper we propose to revisit this low-level rigid detector. [sent-23, score-0.141]

7 By reconsidering some of its assumptions and design choices, we show that it is possible to have significant quality improvements, reaching state of the art quality on par with flexible part models, while still using only HOG + colour information, in a single rigid model. [sent-24, score-0.482]

8 We provide results per image (instead of per window [3, 5]), and perform evaluations on large datasets. [sent-26, score-0.165]

9 Our final single component rigid classifier reaches record results on INRIA, ETH and Caltech-USA datasets, providing an average miss-rate reduction of more than 30% over HOG+ SVM . [sent-27, score-0.199]

10 Our work is based on the integral channel features detector family, introduced by Dollár et al. [sent-28, score-0.312]

11 Before detailing how our detector works, we give an appetizer by highlighting how our approach differs from HOG+SVM [3] and how it relates/contrasts to more recent work. [sent-31, score-0.152]

12 Irregular cells The HOG classifier builds its descriptor by computing histograms over regular square cells of fixed size. [sent-32, score-0.295]
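The regular-cell histogram that HOG computes can be sketched as follows. This is a simplified illustration: the binning and vote weighting of the full descriptor of [3] (bilinear interpolation, block normalization) are omitted.

```python
import numpy as np

def cell_histogram(orientations, magnitudes, n_bins=9):
    """Orientation histogram over one square cell.

    orientations: per-pixel gradient angles in [0, pi);
    magnitudes: per-pixel gradient magnitudes used as votes.
    """
    bins = np.floor(orientations / (np.pi / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), np.asarray(magnitudes, float).ravel())
    return hist

# two toy pixels: a horizontal gradient (angle 0) and a vertical one (pi/2)
hist = cell_histogram(np.array([0.0, np.pi / 2]), np.array([1.0, 2.0]))
```

With 9 bins each bin spans 20 degrees, so the two toy gradients land in bins 0 and 4.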

13 Some use no normalization [2], some use local normalization [1, 3, 8]. [sent-38, score-0.334]

14 In HOG+SVM [3] the importance of using local normalization at multiple scales is emphasized. [sent-39, score-0.211]

15 We discover that global normalization can be surprisingly effective. [sent-41, score-0.167]

16 Feature channels The integral channel features framework [5] (see section 2) enables the use of multiple kinds of “channels” (low level pixel-wise features). [sent-42, score-0.223]
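The "integral" part of the framework is what makes rectangular channel sums cheap: once a channel's cumulative sum is built, the sum over any rectangle costs four lookups. A minimal sketch:

```python
import numpy as np

def integral_image(channel):
    """Zero-padded 2D cumulative sum of one channel."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of channel[y0:y1, x0:x1] in O(1), via four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

channel = np.arange(12, dtype=float).reshape(3, 4)  # toy channel
ii = integral_image(channel)
```

In the detector one such integral image is kept per channel (6 HOG orientations, gradient magnitude, and LUV), and every feature is a `rect_sum` on one of them.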

17 We constrain ourselves to using only HOG and LUV colour channels, because these are still close to the original HOG+SVM work, and at the same time were validated as best performing [5]. [sent-43, score-0.139]

18 In our case the classifier is non-linear due to the use of decision trees as weak learners over HOG features. [sent-46, score-0.24]

19 Multiple kinds of non-linearities on top of HOG have been explored in the past, including neural network sigmoid functions (and variants) [14], as well as different kinds of stumps and decision trees [5, 10]. [sent-47, score-0.145]

20 Multi-scale model Using multiple scales has been shown to improve quality [12], and also speed [2]. [sent-49, score-0.143]

21 Speed Although HOG+SVM is not particularly slow, the specific normalization mechanism used and the use of high-dimensional vectors hinder speed. [sent-58, score-0.167]

22 Single rigid template In this paper we focus on the lower level of more sophisticated part and component-based classifiers. [sent-60, score-0.141]

23 Just as in the original HOG+SVM we use a single rigid template per candidate detection window. [sent-61, score-0.2]

24 For detection we only use a single static colour image. [sent-62, score-0.139]

25 In our experiments we evaluate false positives per image (FPPI), while Dollár’s are done per window (FPPW), which is flawed, as argued in [6]. [sent-72, score-0.344]
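The per-image protocol is usually summarized by the log-average miss rate: the geometric mean of the miss rates sampled at several FPPI operating points. A sketch of that summary statistic; the exact FPPI sampling grid is the benchmark's choice and is assumed here, not taken from this paper.

```python
import math

def log_average_miss_rate(miss_rates):
    """Geometric mean of miss rates sampled at several FPPI points."""
    logs = [math.log(max(m, 1e-12)) for m in miss_rates]  # guard log(0)
    return math.exp(sum(logs) / len(logs))

# toy curve sampled at two FPPI points
lamr = log_average_miss_rate([0.1, 0.4])
```

The geometric mean reflects how these curves are read on log-log axes: halving the miss rate helps equally anywhere on the curve.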

26 Integral channel features classifier Our starting point is the Integral Channel Features detector [5]. [sent-80, score-0.299]

27 Given an input image, a set of gradient and colour “channels” are computed (pixel-wise transforms). [sent-82, score-0.139]

28 Like VJ these rectangular regions are then selected and assembled in a set of weak classifiers using boosting. [sent-85, score-0.23]

29 The final strong classifier is a linear combination of the weak classifiers. [sent-86, score-0.256]
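That weighted combination of weak learners can be sketched directly. This is a hedged illustration of the boosting structure only: each weak learner here is a level-1 decision stump over one rectangular channel sum, and all thresholds and weights below are made up, not learned by AdaBoost.

```python
def stump(value, threshold, polarity):
    """Level-1 decision tree over one rectangular channel-sum feature."""
    return 1.0 if polarity * (value - threshold) > 0 else -1.0

def strong_score(feature_values, weak_learners):
    """Linear combination of weak-learner outputs.

    weak_learners: list of (feature_index, threshold, polarity, alpha).
    """
    return sum(alpha * stump(feature_values[i], t, p)
               for (i, t, p, alpha) in weak_learners)

# two toy features and two toy stumps with illustrative weights
score = strong_score([2.0, 5.0], [(0, 1.0, 1, 0.7), (1, 6.0, 1, 0.3)])
```

Deeper trees (as used later in the paper) simply replace `stump` with a short cascade of such comparisons before emitting the leaf value.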

30 Without further specifications, the Integral Channel Features detector describes a family of detectors. [sent-87, score-0.152]

31 ChnFtrs detector Typically VJ provides low quality (see figure 7) because it is applied directly over the image intensity channels. [sent-88, score-0.251]

32 showed that by applying a similar approach over oriented gradients, the quality improves drastically. [sent-90, score-0.147]

33 Amongst the different designs explored, they propose to select the so-called ChnFtrs detector [5]. [sent-91, score-0.353]

34 HOG and LUV channels are computed (10 channels in total), and 30 000 random rectangles are used as feature pool. [sent-92, score-0.215]

35 To make things faster (with no significant impact on quality), the coordinates of the feature pool and of the candidate detection windows can be quantized by a factor of 4 (the so-called "shrinking factor"). [sent-97, score-0.151]

36 VeryFast detector One peculiarity of the VJ approach is that the detector runs directly on the input image, without the need to resize the image or recompute features (and integral images) multiple times at different scales. [sent-100, score-0.388]

37 [2] achieves this too, while at the same time improving on the quality of ChnFtrs. [sent-102, score-0.3]

38 By learning models at multiple canonical scales, this detector better exploits the information available in the image, leading to better detection. [sent-103, score-0.152]

39 The starting point for our work is the open source release of the VeryFast detector [2]. [sent-105, score-0.152]

40 Our strong baseline The training of ChnFtrs includes a randomness factor in the selection of candidate features. [sent-107, score-0.427]

41 To avoid this source of variability we build a deterministic baseline named SquaresChnFtrs, where the feature pool is composed of all the squares that fit inside the model window. [sent-108, score-0.17]
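Enumerating "all the squares that fit inside the model window" is straightforward on the shrunk grid. A sketch, with the assumption that squares live on the 16×32 grid obtained from a 64×128 window after shrinking by 4; the exact grid convention used by SquaresChnFtrs may differ.

```python
def all_squares(width, height, shrink=4):
    """Every axis-aligned square (x, y, side) fitting in the shrunk window."""
    w, h = width // shrink, height // shrink
    squares = []
    for side in range(1, min(w, h) + 1):
        for y in range(h - side + 1):
            for x in range(w - side + 1):
                squares.append((x, y, side))
    return squares

pool = all_squares(64, 128)  # 16x32 cells after shrinking by 4
```

Even on the shrunk grid the deterministic pool holds a few thousand squares, which stays tractable, unlike the full space of arbitrary rectangles discussed in section 4.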

42 Our baseline already beats a dozen other approaches (see section 9). [sent-110, score-0.19]

43 Despite their good results, the ChnFtrs and VeryFast detectors still leave a large number of free design parameters, such as: how to select the rectangular features? [sent-112, score-0.292]

44 Experimental setup For evaluation we use the Caltech pedestrian detection benchmark, version 3. [sent-120, score-0.141]

45 (a) Here all models use a model window of size 64×128 pixels. [sent-140, score-0.283]

46 Figure 3: Detector quality on INRIA using different feature pool settings. (b) Here all models (but HOG) use a window of size 128×256 pixels. [sent-141, score-0.489]

47 The feature pool (set of rectangles) used to construct the weak learners is one of the key design choices during the training of ChnFtrs. [sent-147, score-0.414]

48 It is known that having a proper feature pool impacts quality [9]. [sent-148, score-0.206]

49 Unfortunately, the space of rectangles inside a 64×128 pixel model is very large, even when using a shrinking factor of 4. [sent-153, score-0.181]

50 Even worse, when training a multi-scale detector (such as VeryFast), models of twice and four times the size need to be trained, making the set of all possible rectangles explode in size. [sent-154, score-0.315]

51 RandomSymmetric 30k: We hypothesise that the detector might benefit from comparing the same feature across different channels, or in the same channel using reflection symmetry across the vertical axis. [sent-159, score-0.268]

52 We generate 150 random rectangles on a single channel, mirror them and copy these 300 features in all 10 channels. [sent-160, score-0.152]

53 SquaresChnFtrs 8×8: squares are positioned on an 8×8 grid; everything else is just as in ChnFtrs. [sent-163, score-0.201]

54 SquaresChnFtrs All: This is the strong baseline described in section 2. [sent-167, score-0.298]

55 AllFeatures: Using 90 Gbyte and 16 cores on a GPU-enabled server, we were able to run an experiment using all rectangles for a 64×128 pixel model (with shrinking factor 4). [sent-171, score-0.227]

56 SquaresChnFtrs 30k: Like SquaresChnFtrs … Figure 4: Detector quality when using different feature normalization schemes. [sent-180, score-0.908]

57 As expected, using AllFeatures is the best choice when possible, with SquaresChnFtrs All a close second best. [sent-183, score-0.201]

58 Note that all the curves are significantly better than HOG+SVM, and that SquaresChnFtrs-8x8 is not particularly effective. [sent-187, score-0.237]

59 Conclusion Having a good coverage of the feature space seems directly related to the quality of the final detector. [sent-189, score-0.23]

60 Depending on your available computing resources, we recommend AllFeatures > SquaresChnFtrs All > Random++. [sent-190, score-0.201]

61 In the re-implementation of the ChnFtrs detector done by Benenson et al. [sent-193, score-0.353]

62 Since our baseline detector is based solely on thresholds, injecting some invariance to illumination changes seems reasonable. [sent-197, score-0.313]

63 Experiments All experiments are done on top of the SquaresChnFtrs classifier (see section 2). [sent-198, score-0.272]

64 Figure 5: Which weak classifier to use? [sent-199, score-0.429]

65 Therefore, we choose to report experiments on ETH, where the effect of normalization is more pronounced. [sent-202, score-0.167]

66 We use the “automatic colour equalization” algorithm (ACE), which provides better results than a simple GreyWorld equalization [13]. [sent-206, score-0.185]
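GreyWorld itself is a one-liner worth seeing: rescale each colour channel so that all channels share the same mean (the image-wide grey average). A minimal sketch, assuming float images and non-degenerate (non-zero-mean) channels.

```python
import numpy as np

def grey_world(image):
    """GreyWorld colour equalization: force every channel's mean to the
    image-wide grey average (assumes non-zero channel means)."""
    img = np.asarray(image, dtype=float)
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)
    grey = channel_means.mean()
    return img * (grey / channel_means)

# toy image with a strong colour cast: channel means 2, 4, 6
img = np.ones((2, 2, 3)) * np.array([2.0, 4.0, 6.0])
balanced = grey_world(img)
```

After equalization every channel has the same mean (here 4.0), removing the global colour cast; this is the cheap global normalization the experiments below find surprisingly effective.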

67 LocalNormalization: For local normalization we look for a more principled approach than the proposal of [5, addendum]. [sent-207, score-0.167]

68 We follow the normalization employed by [1, equation 19], where the gradient orientation features are normalized by the gradient magnitude in the same area. [sent-208, score-0.167]
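That normalization reduces to a ratio of two rectangle sums over the same area. A sketch of the idea (a simplification of [1, equation 19]; the epsilon guard is an assumption to avoid division by zero):

```python
import numpy as np

def local_normalize(orient_sums, magnitude_sums, eps=1e-6):
    """Oriented-gradient rectangle sums divided by the gradient
    magnitude summed over the same rectangles."""
    return np.asarray(orient_sums, float) / (np.asarray(magnitude_sums, float) + eps)

# two toy features: both carry half of their area's total gradient energy
feats = local_normalize([3.0, 8.0], [6.0, 16.0])
```

In the integral-channel setting both the numerator and the denominator are O(1) lookups (the magnitude channel gets its own integral image), so the normalization adds almost no cost.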

69 ’s normalization produced results worse than LocalNormalization. [sent-210, score-0.167]

70 Analysis As expected, normalization improves over using no normalization. [sent-211, score-0.167]

71 Surprisingly, even a simple global normalization such as GreyWorld already provides an important gain. [sent-212, score-0.207]

72 In our setup, standard normalization schemes are puzzlingly ineffective. [sent-213, score-0.167]

73 In other experiments we have validated that global normalization improves detections even when using only monochromatic images. [sent-214, score-0.167]

74 Conclusion Simple global normalization schemes such as GreyWorld are very effective and should not be disregarded. [sent-215, score-0.167]

75 Choosing the kind of weak classifier and its weak learner is an important design decision. [sent-220, score-0.357]

76 Experiments Single Stump: we train SquaresChnFtrs, using … Figure 6: Which training method to use? [sent-221, score-0.292]

77 6 000 stumps (same number of stumps as in the baseline). [sent-222, score-0.216]

78 Using slightly more discriminative weak classifiers seems not to improve quality. [sent-231, score-0.252]

79 The variants 2k/8k indicate the number of weak learners used in each case. [sent-242, score-0.211]

80 Making the classifier longer seems not to improve the quality either. [sent-244, score-0.268]

81 Conclusion It seems that changing features and weak classifiers (e. [sent-245, score-0.252]

82 The INRIA dataset has been regularly used for training pedestrian detectors [6, table 2], despite being quite small by today’s standards. [sent-252, score-0.202]

83 During the bootstrapping stages of learning, we observe that the baseline fails to find the desired amount of false positives (5 000 per round), pointing out the need for a richer set of negative images. [sent-253, score-0.303]

84 This is the level of jitter we test, using 9 random samples per training pedestrian. [sent-257, score-0.216]
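Jitter augmentation of this kind can be sketched as randomly shifted copies of each training bounding box. The maximum shift below is an assumption for illustration; the paper tests several jitter levels but the exact amplitude is not given in this summary.

```python
import random

def jittered_crops(box, n_samples=9, max_shift=2, seed=0):
    """n_samples randomly translated copies of a (x, y, w, h) box."""
    rng = random.Random(seed)
    x, y, w, h = box
    crops = []
    for _ in range(n_samples):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        crops.append((x + dx, y + dy, w, h))
    return crops

crops = jittered_crops((10, 20, 32, 64))  # 9 jittered samples of one box
```

Each crop keeps the original window size; only its position moves, mimicking small annotation misalignments.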

85 PedestriansNegatives: we have observed that false positives occur frequently on the legs of large pedestrians. [sent-258, score-0.179]

86 It seems that the original annotations on INRIA already contain enough natural jitter, and adding more only hurts performance. [sent-263, score-0.2]

87 Conclusion The vanilla INRIA training data seems to remain a good choice for top performance. [sent-264, score-0.15]

88 The Roerei detector Given the lessons learned from the previous sections, we proceed to build our final strong classifier. [sent-266, score-0.269]

89 Note that our baseline already beats 18 methods on INRIA and 13 on ETH, including the original ChnFtrs detector. [sent-274, score-0.341]

90 For scale 1 we use AllFeatures, for scale 2 SquaresChnFtrs All, and for the last two scales RandomSymmetric++. [sent-276, score-0.245]

91 We use 55 scales in all cases, and only change the search range per dataset to reflect the different sizes of pedestrians in each scenario. [sent-278, score-0.183]

92 Our final detector shows a significant improvement over the previous best methods on these datasets, having a homogeneous (single-stage) architecture, and evaluating a single rigid model for each candidate window. [sent-279, score-0.324]

93 On Caltech-USA we improve over MultiResC [12], which also uses multi-scale models (but evaluates multiple models per window), uses deformable parts and a weak geometric prior on the ground plane, and was trained over the Caltech training data using a sophisticated latent-SVM. [sent-281, score-0.317]

94 Training time As discussed in section 4, training with the best feature pool puts pressure on the memory usage. [sent-282, score-0.192]

95 The ACE global normalization is rather slow (5 seconds per frame), but the GreyWorld normalization is much faster (can be computed in less than 10 milliseconds) and provides similar benefits. [sent-287, score-0.395]

96 Using the soft-cascade employed in [2] or the cross-talk cascade of [4], the Roerei detector should run comfortably in the range 5−20 Hz. [sent-289, score-0.152]

97 Already our strong baseline SquaresChnFtrs obtains better results. [sent-295, score-0.298]

98 Despite the usual claims about “hand-designed features”, we believe that the design space of convolutional networks is not smaller than that of the integral channel features classifiers. [sent-296, score-0.249]

99 We have provided extensive experiments and identified the aspects that improve quality over our strong baseline (feature pool, normalization, use of a multi-scale model). [sent-304, score-0.196]

100 On the INRIA, ETH and Caltech-USA datasets, our new Roerei detector improves over methods using non-linear SVMs, more sophisticated features, geometric priors, motion information, deformable models, or deeper architectures. [sent-305, score-0.238]


similar papers computed by tfidf model


similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 383 cvpr-2013-Seeking the Strongest Rigid Detector


2 0.2241556 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning

Author: Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann Lecun

Abstract: Pedestrian detection is a problem of considerable practical interest. Adding to the list of successful applications of deep learning methods to vision, we report state-of-theart and competitive results on all major pedestrian datasets with a convolutional network model. The model uses a few new twists, such as multi-stage features, connections that skip layers to integrate global shape information with local distinctive motif information, and an unsupervised method based on convolutional sparse coding to pre-train the filters at each stage.

3 0.21531658 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

Author: Wanli Ouyang, Xiaogang Wang

Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels andETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.

4 0.16996393 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction

Author: Dennis Park, C. Lawrence Zitnick, Deva Ramanan, Piotr Dollár

Abstract: We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow and fine-scale temporal difference features. Our approach performs weak motion stabilization by factoring out camera motion and coarse object motion while preserving nonrigid motions that serve as useful cues for recognition. We show results for pedestrian detection and human pose estimation in video sequences, achieving state-of-the-art results in both. In particular, given a fixed detection rate our method achieves a five-fold reduction in false positives over prior art on the Caltech Pedestrian benchmark. Finally, we perform extensive diagnostic experiments to reveal what aspects of our system are crucial for good performance. Proper stabilization, long time-scale features, and proper normalization are all critical.

5 0.16368875 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li

Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).

6 0.13700514 204 cvpr-2013-Histograms of Sparse Codes for Object Detection

7 0.13653155 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

8 0.13579203 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

9 0.12678884 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

10 0.12404704 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns

11 0.11581483 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

12 0.11410064 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People

13 0.10852076 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

14 0.10213653 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

15 0.097387828 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

16 0.092456639 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

17 0.089291617 433 cvpr-2013-Top-Down Segmentation of Non-rigid Visual Objects Using Derivative-Based Search on Sparse Manifolds

18 0.088775776 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

19 0.086138874 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

20 0.085489735 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors


similar papers computed by lsi model


similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94391304 383 cvpr-2013-Seeking the Strongest Rigid Detector


2 0.92041105 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes


3 0.90107095 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection


4 0.88038468 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns

Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel

Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing realtime detection. Part-based models are currently state-ofthe-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis ” (FS) method [1], which uses multiple object parts for detection and is among state-of-theart methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KDFerns ”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine ” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS main- × tains almost fully the accuracy performance of the original FS, while running more than 4 faster than existing partbased methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per-second on 640 480 images on a regular CPU.

5 0.83380169 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

Author: Guang Chen, Yuanyuan Ding, Jing Xiao, Tony X. Han

Abstract: Context has been playing an increasingly important role to improve the object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model the high level context using solely detection responses from a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are further calculated to construct a high order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector to a stronger context aware detector. With the updated detector, we can continue the evolution till the contextual improvements saturate. Using the successful deformable-part-model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and Caltech pedestrian dataset [7]: The proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver.5) [13] by 3.3% in mean average precision on the PASCAL 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
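The 1st-order context feature described above can be sketched directly: a fixed set of randomized location pairs on the detector response map, each contributing one bit depending on which location has the higher response. The pair count and sampling scheme below are illustrative assumptions, not the paper's exact configuration.

```python
import random

# Hedged sketch of a 1st-order context feature: randomized binary
# comparisons between pairs of locations on a 2D detector response map.

def make_pairs(height, width, n_pairs=8, seed=0):
    """Sample fixed (p, q) location pairs on a height×width response map."""
    rng = random.Random(seed)
    loc = lambda: (rng.randrange(height), rng.randrange(width))
    return [(loc(), loc()) for _ in range(n_pairs)]

def context_feature(response_map, pairs):
    """One bit per pair (p, q): 1 iff response at p exceeds response at q."""
    return [1 if response_map[p[0]][p[1]] > response_map[q[0]][q[1]] else 0
            for p, q in pairs]
```

Statistics over such bit vectors (co-occurrence counts) would then form the higher-order descriptor the abstract mentions.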

6 0.78356606 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning

7 0.7674827 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

8 0.7666043 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People

9 0.68983942 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

10 0.68360257 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

11 0.67009693 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

12 0.66121024 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection

13 0.64424217 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

14 0.634 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

15 0.61251342 204 cvpr-2013-Histograms of Sparse Codes for Object Detection

16 0.58725125 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

17 0.5810554 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery

18 0.57086086 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

19 0.55790055 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation

20 0.55456054 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.227), (10, 0.095), (16, 0.035), (26, 0.047), (28, 0.018), (33, 0.239), (67, 0.129), (69, 0.041), (80, 0.011), (87, 0.086)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82375556 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach

Author: Long Mai, Yuzhen Niu, Feng Liu

Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from individual saliency map but also the interaction between neighboring pixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.
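A drastically simplified, CRF-free version of image-dependent aggregation: weight each saliency method by an accuracy score measured on training images similar to the input, then take a weighted per-pixel average of the methods' maps. The function names and the assumption that per-method accuracies are given are illustrative, standing in for the paper's subset selection and CRF training.

```python
# Hedged sketch: weighted per-pixel aggregation of saliency maps, with
# weights assumed to come from each method's accuracy on a subset of
# training images similar to the input (simplification of the CRF model).

def aggregate_saliency(method_maps, method_accuracy):
    """method_maps: {name: 2D list of per-pixel saliency in [0, 1]};
    method_accuracy: {name: nonnegative weight}. Returns the weighted
    average map."""
    names = list(method_maps)
    total = sum(method_accuracy[n] for n in names)
    h = len(method_maps[names[0]])
    w = len(method_maps[names[0]][0])
    return [[sum(method_accuracy[n] * method_maps[n][i][j]
                 for n in names) / total
             for j in range(w)] for i in range(h)]
```

The CRF in the paper additionally couples neighbouring pixels; this sketch treats every pixel independently.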

same-paper 2 0.81826973 383 cvpr-2013-Seeking the Strongest Rigid Detector

Author: Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van_Gool

Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called “components”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients+linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detections, using a single rigid component. We provide experiments for a large design space, that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
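The HOG+linear-SVM baseline the paper revisits boils down to scoring every sliding window with a learned weight vector. A minimal sketch, with `extract_features` standing in for the actual HOG (+ colour) feature computation and the window size / stride being the conventional pedestrian-detection defaults rather than the paper's exact settings:

```python
# Minimal sketch of linear-SVM sliding-window scoring: score each
# window's feature vector with w·f + b; score > 0 is a candidate
# detection. `extract_features` is a placeholder for HOG extraction.

def sliding_window_scores(image, extract_features, w, b,
                          win_h=128, win_w=64, stride=8):
    """Yield ((y, x), score) for every window position in a 2D image
    (list of rows)."""
    H, W = len(image), len(image[0])
    for y in range(0, H - win_h + 1, stride):
        for x in range(0, W - win_w + 1, stride):
            window = [row[x:x + win_w] for row in image[y:y + win_h]]
            feat = extract_features(window)
            score = sum(wi * fi for wi, fi in zip(w, feat)) + b
            yield (y, x), score
```

In practice this runs over an image pyramid with non-maximum suppression on the scored windows; the paper's design-space experiments vary the pooling, selection and preprocessing feeding this linear decision.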

3 0.79526854 84 cvpr-2013-Cloud Motion as a Calibration Cue

Author: Nathan Jacobs, Mohammad T. Islam, Scott Workman

Abstract: We propose cloud motion as a natural scene cue that enables geometric calibration of static outdoor cameras. This work introduces several new methods that use observations of an outdoor scene over days and weeks to estimate radial distortion, focal length and geo-orientation. Cloud-based cues provide strong constraints and are an important alternative to methods that require specific forms of static scene geometry or clear sky conditions. Our method makes simple assumptions about cloud motion and builds upon previous work on motion-based and line-based calibration. We show results on real scenes that highlight the effectiveness of our proposed methods.

4 0.78913558 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson

Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.

5 0.78540272 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.

6 0.78501415 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

7 0.78358746 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

8 0.78235114 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

9 0.78048086 238 cvpr-2013-Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices

10 0.78012019 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

11 0.77951241 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

12 0.77728212 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

13 0.77613521 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search

14 0.77592421 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

15 0.77570808 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

16 0.77387291 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

17 0.77283436 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

18 0.76996219 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

19 0.76984876 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

20 0.76977521 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation