cvpr cvpr2013 cvpr2013-398 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. [sent-8, score-0.55]
2 A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. [sent-9, score-0.293]
3 A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. [sent-10, score-1.006]
4 A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. [sent-11, score-0.472]
5 The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. [sent-15, score-0.286]
6 The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset. [sent-16, score-0.318]
7 Introduction Pedestrian detection is one of the most important topics in object detection and has attracted a lot of attention [2, 4, 10, 31, 34]. [sent-18, score-0.352]
8 Pedestrian detection is challenging when multiple pedestrians are close in space. [sent-20, score-0.698]
9 Firstly, a single-pedestrian detector tends to combine the visual cues from different pedestrians as the evidence of seeing a pedestrian and thus the detection result will drift. [sent-21, score-1.499]
10 As a result, nearby windows that do contain pedestrians but have lower detection scores will be eliminated by non-maximum suppression (NMS). [sent-22, score-0.381]
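The suppression effect described in sentences 9 and 10 can be illustrated with a minimal sketch of greedy NMS. This is not the authors' code: the box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the overlap threshold of 0.5 are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring windows and drop windows that overlap them.

    detections: list of (box, score) pairs.
    iou_threshold: hypothetical value; a window is dropped when its IoU
    with an already kept window exceeds this threshold.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

With a drifted window `(10, 0, 50, 100)` scored 0.9 and a correct window `(0, 0, 40, 100)` scored 0.6, the two overlap strongly, so only the drifted window survives and the correctly placed pedestrian is lost.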
11 Aided by a multi-pedestrian detector, the missed pedestrians are detected. [sent-29, score-0.551]
12 Secondly, when a pedestrian is occluded by another nearby pedestrian, its detection score may be too low to be detected. [sent-33, score-0.791]
13 On the other hand, the existence of multiple nearby pedestrians forms some unique patterns (as shown in Figure 2) which do not appear on isolated pedestrians. [sent-36, score-0.809]
14 They can be used as extra visual cues to refine the detection result of single pedestrians. [sent-37, score-0.333]
15 The motivations of this paper are twofold: 1) It is recognized by sociologists that nearby pedestrians walk in groups and show particular spatial patterns [17, 22]. [sent-39, score-0.92]
16 2) From the viewpoint of computer vision, these 3D spatial patterns of nearby pedestrians can be translated into unique 2D visual patterns resulting from the perspective projection of 3D pedestrians onto the 2D image. [sent-40, score-1.408]
17 In the second row, pedestrians on the left are occluded by pedestrians on the right. [sent-45, score-1.095]
18 Our 2-pedestrian detector captures visual cues which cannot be learned with a 1-pedestrian detector. [sent-46, score-0.306]
19 They inspire us to design a multi-pedestrian detector to capture these unique visual patterns. [sent-47, score-0.266]
20 And a multi-pedestrian window found by a multi-pedestrian detector can guide the detection of each pedestrian in this window. [sent-48, score-0.811]
21 2 as an example, when pedestrians walk side by side, they form the shoulder-to-shoulder visual pattern. [sent-50, score-0.596]
22 Taking pedestrians in the second row as another example, the right torso of the pedestrians on the left is occluded by the pedestrians on the right. [sent-51, score-1.617]
23 Then the 2-pedestrian detection results are used to reinforce the evidence of detecting each of the two pedestrians. [sent-54, score-0.27]
24 1) A multi-pedestrian detector is learned with a mixture of deformable part-based models to effectively capture the unique visual patterns appearing in multiple nearby pedestrians. [sent-56, score-0.68]
25 The spatial configuration patterns of multiple nearby pedestrians are learned and clustered into mixture components. [sent-60, score-1.003]
26 2) In the multi-pedestrian detector, each single pedestrian is specifically designed as a part, called pedestrian-part. [sent-61, score-0.36]
27 4(b), the filter of a pedestrian-part is different from and complementary to a 1-pedestrian detector, since it is learned under a specific multi-pedestrian configuration and under the guidance of the multi-pedestrian detector as contextual constraints. [sent-63, score-0.491]
28 3) A new probabilistic framework is proposed to model the configuration relationship between results of multi-pedestrian detection and 1-pedestrian detection. [sent-64, score-0.437]
29 With this framework, multi-pedestrian detection results are used to refine 1-pedestrian detection results. [sent-65, score-0.411]
30 With a fast computation approach, it adds only a small computing load on top of 1-pedestrian detectors. [sent-67, score-0.36]
31 The lowest miss rate is improved from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset. [sent-70, score-0.318]
32 Related Work The progress on object detection has been achieved by the investigation on classification approaches, features and articulation handling approaches. [sent-72, score-0.278]
33 3) Articulation handling approaches under investigation include deformable part-based models (DPM) [13, 43, 27], pictorial structures [14], poselet [3] and mixture of parts [41]. [sent-75, score-0.269]
34 They usually employ context cues in two steps: 1) single-object detection results are obtained separately; and 2) the relationship between an object and its context is modeled to refine the detection result. [sent-78, score-0.685]
35 Therefore, the visual cues of seeing multiple objects are from single-object detectors instead of a multi-object detector. [sent-79, score-0.26]
36 The unique visual patterns of multiple nearby pedestrians caused by inter-occlusion and spatial constraint were not explored. [sent-80, score-0.809]
37 So it cannot model multiple pairs of pedestrians in an image. [sent-84, score-0.522]
38 In [19] the context cues are used to improve the centered object, but in our work the detections of two pedestrians are jointly estimated under a probabilistic model. [sent-85, score-0.664]
39 Our paper is different from [30] in two aspects: 1) the segmentation results of pedestrians are required from the training data in [30], while our paper only requires the bounding box information of pedestrians. [sent-87, score-0.441]
40 2) the approach in [30] uses NMS to reject the strong overlap between the 2-pedestrian detection results and the 1-pedestrian detection results (incompatible relationship), while this paper uses a probabilistic framework that favors the strong overlap (compatible relationship). [sent-88, score-0.441]
41 w1 = (x1, y1, s1) is the detection window at location (x1, y1) with size s1. [sent-94, score-0.272]
42 l1 represents the locations and sizes of parts if the single-object detector is DPM. [sent-95, score-0.277]
43 In (1), p(c) is the prior of the case in which there are c nearby objects. [sent-103, score-0.364]
44 , C) nearby objects with configuration zc and capture the visual cues of zc as the context to assist the estimation of z1. [sent-107, score-0.781]
45 Implementation for pedestrian detection The framework in (1) is implemented as follows: p(z1, I) =p(I, z1|c = 1)p(c = 1)+ ? [sent-110, score-0.569]
46 p(I|z1, zc, c) is the likelihood of seeing I given configurations z1, zc and c, and is calculated by a c-pedestrian detector introduced in Section 4. [sent-116, score-0.524]
47 Design of the multi-pedestrian detector The location and size variation of nearby pedestrians results in the appearance variation of these pedestrians. [sent-121, score-0.923]
48 On the other hand, sociologists have found that pedestrians walking together show a few particular spatial patterns [22]. [sent-122, score-0.699]
49 We empirically show that such approximation can improve pedestrian detection performance (Section 6). [sent-124, score-0.536]
50 Considering at most two pedestrians This paper focuses on the case when c = 1 and c = 2 because of several considerations. [sent-127, score-0.522]
51 1) According to sociological studies [22], the frequency of seeing two pedestrians walking together (28%-42%) is much higher than that of seeing more than two pedestrians (< 10%). [sent-128, score-1.289]
52 om a 2-pedestrian detection in (3) is used as the extra information to refine the 1-pedestrian detection result in (2). [sent-137, score-0.45]
53 The priors p(c = 1) and p(c = 2) are used as the weights to balance the 1-pedestrian detection result and the evidence from 2-pedestrian detection. [sent-138, score-0.242]
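The weighting role of the priors in sentence 53 can be sketched as a convex combination of two scores. This is only an illustration under strong assumptions: the function name `refined_score`, the treatment of both inputs as comparable scalar scores, and the default prior values are hypothetical, and the source does not give the exact form of (2) and (3).

```python
def refined_score(score_1ped, evidence_2ped, prior_c1=0.7, prior_c2=0.3):
    """Blend a 1-pedestrian score with evidence from a 2-pedestrian detector.

    prior_c1 and prior_c2 stand in for p(c = 1) and p(c = 2); the default
    values are placeholders, not values reported in the source.
    """
    return prior_c1 * score_1ped + prior_c2 * evidence_2ped
```

Under this sketch, a weakly scored occluded pedestrian gains score when the 2-pedestrian detector fires strongly on the window that contains it, while an isolated pedestrian keeps a score driven mostly by the 1-pedestrian detector.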
54 Since the configurations of two pedestrians are complex, we assume that they are sampled from a mixture model and m2 is the configuration mixture type. [sent-141, score-0.924]
55 w2 = (x, y, s) represents the 2-pedestrian detection window at location (x, y) with size s, and l2 represents the locations and sizes of parts in w2. [sent-142, score-0.337]
56 p(I, z1, l2 |w2 , m2) in (4) is the joint distribution of image I, configurations z1 and l2 given mixture m2 and window w2. [sent-159, score-0.252]
57 3 is obtained using (4) and is then added to 1-pedestrian detection results using (2) to obtain the refined detection result in Fig. [sent-166, score-0.352]
58 Mixture of DPM for 2-pedestrian detection w2 In order to learn the mixture type m2 = 1, . [sent-170, score-0.294]
59 1) The two pedestrians form a 2-pedestrian bounding box. [sent-174, score-0.573]
60 The relative location and size between the two pedestrians are used as features for clustering. [sent-177, score-0.555]
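The clustering features mentioned in sentence 60 could be computed along the following lines. This is a sketch under stated assumptions: boxes are `(x, y, w, h)` with a top-left origin, the offsets are normalised by the height of the left pedestrian, and the name `pair_features` is hypothetical. The source does not specify the exact feature definition.

```python
def pair_features(box_a, box_b):
    """Relative location and size of two pedestrian boxes.

    Each box is (x, y, w, h) with (x, y) the top-left corner.
    Returns (dx, dy, ds): the centre offset of the right pedestrian from
    the left one, divided by the left pedestrian's height, and the ratio
    of the two heights. The normalisation is an assumption.
    """
    # Order the pair from left to right so the result does not depend
    # on the order in which the two boxes are passed in.
    left, right = sorted((box_a, box_b), key=lambda b: b[0] + b[2] / 2.0)
    lcx, lcy = left[0] + left[2] / 2.0, left[1] + left[3] / 2.0
    rcx, rcy = right[0] + right[2] / 2.0, right[1] + right[3] / 2.0
    dx = (rcx - lcx) / left[3]
    dy = (rcy - lcy) / left[3]
    ds = right[3] / left[3]
    return (dx, dy, ds)
```

The resulting three-dimensional vectors can then be handed to any standard clustering routine to obtain the configuration mixture types.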
61 It can be seen that each detector captures a specific configuration relationship between the two pedestrians. [sent-186, score-0.412]
62 The 2-pedestrian model for a mixture type m2 consists of one root filter and five deformable part filters with deformation under the star model learned with the Latent SVM in [13]. [sent-189, score-0.309]
63 Besides, we add two extra parts that correspond to the two pedestrians in a 2-pedestrian training sample. [sent-192, score-0.62]
64 In order to transfer the knowledge of the 1-pedestrian detector to the 2-pedestrian detector, the initial filters for the two pedestrian-parts are obtained from the root filter of the 1-pedestrian detector. [sent-195, score-0.368]
65 Since the pedestrian-parts are explicitly modeled as parts in the 2-pedestrian model, the size and location of each pedestrian in the 2-pedestrian window are also inferred with DPM at the detection stage. [sent-200, score-0.661]
66 This is the key to building the relationship between the 2-pedestrian detection result and the 1-pedestrian detection result. [sent-201, score-0.457]
67 Given the mixture model m2, p(w2 |m2) in (4) can be densely sampled from the image in a sliding-window manner with varying window sizes. [sent-205, score-0.244]
68 To represent the relationship between the pedestrian-part and the single-pedestrian detection result, we introduce a hidden variable h. [sent-206, score-0.281]
69 h = 0 when the left pedestrian-part in l2 is considered to match the single pedestrian with configuration z1, and h = 1 when the right pedestrian-part matches the single pedestrian. [sent-207, score-0.455]
70 (a) Examples of 2-pedestrian detectors learned for different clusters, (b) pedestrian-part filter and the single-pedestrian root filter in [13]. [sent-215, score-0.292]
71 (1a): root filter; (2a): three part filters found from root filter; (3a): pedestrian-part filters; (4a): examples detected by the detectors in the same rows. [sent-216, score-0.269]
72 The φb(I; l2 , w2, m2 , h) = λp in (6) is obtained from the pedestrian-part score, which is used as extra information to refine 1-pedestrian detection result. [sent-225, score-0.274]
73 Modeling the relationship between 2- and 1-pedestrian detection results With the pedestrian-parts designed in the 2-pedestrian detector, this relationship becomes matching the pedestrian-part in the 2-pedestrian detector with the 1-pedestrian detection result. [sent-230, score-0.638]
74 Since there should not be any 2-pedestrian window in which no pedestrian is found, the 2-pedestrian detector can be evaluated only around Cand1 1-pedestrian candidate windows to save computation (i. [sent-253, score-0.764]
75 we assume that if two nearby pedestrians exist, at least one pedestrian will be detected by the single-pedestrian detector around this region). [sent-255, score-1.25]
76 Since the detection scores of the multi-pedestrian detector and the 1-pedestrian detector are taken as input, the framework remains unchanged if other detection models or features are used for the 1-pedestrian detector or the 2-pedestrian detector. [sent-279, score-1.021]
77 Existing pedestrian detection results can be directly used as the input of our framework. [sent-280, score-0.536]
78 Our framework using LatSVM-V2 as the 1-pedestrian detector is denoted as LatSVM-V2+Our in the experimental results. [sent-283, score-0.245]
79 Other single-pedestrian detectors trained with different models, features and datasets are also integrated with our 2-pedestrian detector and compared in Section 6. [sent-284, score-0.338]
80 As in [10], the log-average miss rate is used to summarize the detector performance, and is computed by averaging the miss rate at nine FPPI rates evenly spaced in the log-space in the range from 10^-2 to 10^0. [sent-288, score-0.71]
81 It consists of pedestrians of ≥ 50 pixels in height who are fully visible or less than 35% occluded. [sent-290, score-0.522]
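The summary metric in sentence 80 can be sketched as follows. The sketch follows the wording of the text literally, taking the arithmetic mean of the nine sampled miss rates; some evaluation code averages the logarithm of the miss rate instead, and that variant is not shown here. The function names, the step-wise lookup of the curve, and the fallback miss rate of 1.0 when no operating point is available are assumptions.

```python
def reference_fppi(n=9, lo=-2.0, hi=0.0):
    """n FPPI values evenly spaced in log-space between 10**lo and 10**hi."""
    return [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]


def log_average_miss_rate(fppi, miss_rate):
    """Average the miss rate sampled at the nine reference FPPI values.

    fppi: false positives per image, sorted in ascending order.
    miss_rate: miss rate at each of those operating points.
    """
    samples = []
    for ref in reference_fppi():
        # Use the operating point with the largest FPPI not exceeding ref.
        candidates = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # Assumption: if the curve never reaches this FPPI, count a miss rate of 1.
        samples.append(candidates[-1] if candidates else 1.0)
    return sum(samples) / len(samples)
```

A detector whose curve stays at a constant miss rate over the whole range gets exactly that value as its summary, which makes the metric easy to sanity-check.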
82 Preparation of 2-Pedestrian Training Data Since there is no 2-pedestrian detection training dataset, we construct it based on the INRIA training dataset [4] as follows: 1) All the negative images are used for negative samples. [sent-293, score-0.271]
83 2) Because most pedestrians labeled in INRIA are isolated pedestrians, this results in a very small number of 2-pedestrian positive samples (656). [sent-294, score-0.553]
84 3) If the bounding boxes of two pedestrians overlap, the bounding box that exactly covers the two pedestrians is considered as the label of the 2-pedestrian positive sample. [sent-298, score-1.177]
85 Compared with LatSVM-V2, our approach has 10%, 7% and 5% log-average miss rate improvement on the datasets ETH, TUD-Brussels and Caltech-Test respectively. [sent-304, score-0.277]
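The labelling rule in sentence 84 amounts to taking the tightest box that covers both pedestrians whenever their boxes overlap. A minimal sketch, assuming boxes in `(x1, y1, x2, y2)` form and hypothetical function names:

```python
def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def two_pedestrian_label(a, b):
    """Return the box that exactly covers both pedestrians, or None.

    None is returned when the two boxes do not overlap, in which case
    the pair is not used as a 2-pedestrian positive sample.
    """
    if not overlaps(a, b):
        return None
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))
```

For example, the boxes `(0, 0, 40, 100)` and `(30, 10, 70, 110)` overlap, so the pair is labelled with the covering box `(0, 0, 70, 110)`.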
86 In order to exclude the factor of using a larger training set, we also train the 1-pedestrian detector with DPM 6. [sent-305, score-0.242]
87 By combining with LatSVM-V2-E, our approach (LatSVM-V2-E+Our) has 9%, 7% and 5% log-average miss rate improvement over LatSVM-V2-E on the datasets ETH, TUD-Brussels and Caltech-Test respectively. [sent-309, score-0.277]
88 We also investigate other 1-pedestrian detectors and integrate them with our 2-pedestrian detector in this experiment. [sent-310, score-0.309]
89 For 1-pedestrian detection results, the range of detection score s has large variation for different approaches. [sent-313, score-0.4]
90 Our framework significantly improves all the state-of-the-art pedestrian detectors by integrating with them. [sent-324, score-0.548]
91 the best performing one (LatSVM-V2+Our) reaches the average miss rate of 41%. [sent-336, score-0.249]
92 The current best performing approaches on the Caltech-Test dataset are MultiResC and the contextual boost in [6], both of which use context information and have an average miss rate of 48%. [sent-337, score-0.439]
93 This experiment shows that the multi-pedestrian detector provides rich complementary information to current state-of-the-art 1-pedestrian detection approaches even when context [25] or motion [33] is used by these approaches. [sent-342, score-0.5]
94 Conclusion In this paper, we propose a new probabilistic framework for single pedestrian detection aided by multi-pedestrian detection. [sent-344, score-0.714]
95 DPM is used to learn the multi-pedestrian detector which effectively captures the unique visual patterns appearing in multiple nearby pedestrians. [sent-345, score-0.527]
96 Detection performance is improved by modeling the relationship between the configurations of single-pedestrian detection results and those of multi-pedestrian detection results. [sent-346, score-0.528]
97 Existing pedestrian detection results can be directly used as the input of our framework. [sent-354, score-0.536]
98 The lowest miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset. [sent-357, score-0.318]
99 The walking behaviour of pedestrian social groups and its impact on crowd dynamics. [sent-524, score-0.425]
100 A discriminative deep model for pedestrian detection with occlusion handling. [sent-529, score-0.536]
wordName wordTfidf (topN-words)
[('pedestrians', 0.522), ('pedestrian', 0.36), ('detector', 0.212), ('zc', 0.208), ('miss', 0.188), ('detection', 0.176), ('dpm', 0.16), ('nearby', 0.156), ('eth', 0.123), ('mixture', 0.118), ('aided', 0.117), ('relationship', 0.105), ('seeing', 0.104), ('multiftr', 0.104), ('multiresc', 0.104), ('cuhk', 0.099), ('detectors', 0.097), ('snorm', 0.095), ('configuration', 0.095), ('tudbrussels', 0.084), ('patterns', 0.077), ('caltech', 0.076), ('walk', 0.074), ('contextual', 0.072), ('configurations', 0.071), ('evidence', 0.066), ('investigation', 0.066), ('root', 0.064), ('sociologists', 0.063), ('doll', 0.063), ('window', 0.063), ('rate', 0.061), ('refine', 0.059), ('cues', 0.059), ('lc', 0.059), ('integrating', 0.058), ('context', 0.055), ('unique', 0.054), ('wojek', 0.053), ('shapelet', 0.052), ('occluded', 0.051), ('bounding', 0.051), ('shenzhen', 0.049), ('edgelet', 0.049), ('windows', 0.049), ('score', 0.048), ('filter', 0.048), ('ouyang', 0.047), ('hog', 0.045), ('nms', 0.045), ('filters', 0.044), ('candidate', 0.042), ('fppi', 0.041), ('tection', 0.04), ('extra', 0.039), ('evaluated', 0.038), ('inria', 0.038), ('walking', 0.037), ('sizes', 0.036), ('articulation', 0.036), ('int', 0.036), ('grammar', 0.035), ('learned', 0.035), ('dataset', 0.035), ('lowest', 0.034), ('location', 0.033), ('framework', 0.033), ('operations', 0.032), ('integral', 0.031), ('positive', 0.031), ('aspect', 0.031), ('training', 0.03), ('hong', 0.03), ('parts', 0.029), ('missed', 0.029), ('kong', 0.029), ('public', 0.029), ('complementary', 0.029), ('integrated', 0.029), ('improvement', 0.028), ('compatible', 0.028), ('approaches', 0.028), ('probabilistic', 0.028), ('unaffordable', 0.028), ('ewd', 0.028), ('reinforce', 0.028), ('ehog', 0.028), ('edgelets', 0.028), ('mog', 0.028), ('moussaid', 0.028), ('sabzmeydani', 0.028), ('hiksvm', 0.028), ('poseinv', 0.028), ('aztion', 0.028), ('shapelets', 0.028), ('groups', 0.028), ('pictorial', 0.028), ('rectangles', 0.028), ('ee', 0.028), 
('appearing', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
2 0.51454765 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
3 0.37064388 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide useful mutual relationship for visibility estimation - the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%-8% improvements on multiple benchmark datasets.
4 0.24475673 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy performance of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640 × 480 images on a regular CPU.
5 0.24179107 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
Author: Guang Shu, Afshin Dehghan, Mubarak Shah
Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained object detectors is usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video-specific features like consistent color patterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using Conditional Random Field (CRF) along with our superpixel-based BoW model, we develop an algorithm to segment the object from the background. Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and show that it significantly improves the state-of-the-art detector by 5-16% in average precision.
6 0.23889828 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
7 0.23525368 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
8 0.22294456 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
9 0.21531658 383 cvpr-2013-Seeking the Strongest Rigid Detector
10 0.20923308 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
11 0.16993427 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
12 0.16818497 311 cvpr-2013-Occlusion Patterns for Object Class Detection
13 0.14872512 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
14 0.14058141 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
15 0.13940483 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
16 0.13382728 364 cvpr-2013-Robust Object Co-detection
17 0.12830038 440 cvpr-2013-Tracking People and Their Objects
18 0.12558809 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
19 0.12068973 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
20 0.11918847 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
topicId topicWeight
[(0, 0.236), (1, -0.076), (2, 0.033), (3, -0.112), (4, 0.081), (5, 0.034), (6, 0.191), (7, 0.087), (8, 0.039), (9, -0.031), (10, -0.205), (11, -0.147), (12, 0.197), (13, -0.33), (14, 0.141), (15, 0.023), (16, -0.173), (17, 0.081), (18, -0.035), (19, 0.052), (20, -0.055), (21, -0.048), (22, -0.191), (23, 0.215), (24, -0.025), (25, -0.09), (26, -0.006), (27, 0.044), (28, 0.026), (29, 0.082), (30, 0.065), (31, -0.001), (32, 0.046), (33, -0.033), (34, -0.064), (35, -0.03), (36, -0.052), (37, 0.018), (38, -0.059), (39, -0.01), (40, 0.044), (41, 0.02), (42, -0.024), (43, -0.018), (44, 0.07), (45, 0.002), (46, -0.079), (47, 0.039), (48, -0.012), (49, -0.066)]
simIndex simValue paperId paperTitle
1 0.97012544 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
same-paper 2 0.9542886 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
3 0.89195073 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide useful mutual relationship for visibility estimation - the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%–8% improvements on multiple benchmark datasets.
4 0.8002494 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy performance of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640 × 480 images on a regular CPU.
5 0.78119808 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
Author: Sitapa Rujikietgumjorn, Robert T. Collins
Abstract: We present a quadratic unconstrained binary optimization (QUBO) framework for reasoning about multiple object detections with spatial overlaps. The method maximizes an objective function composed of unary detection confidence scores and pairwise overlap constraints to determine which overlapping detections should be suppressed, and which should be kept. The framework is flexible enough to handle the problem of detecting objects as a shape covering of a foreground mask, and to handle the problem of filtering confidence weighted detections produced by a traditional sliding window object detector. In our experiments, we show that our method outperforms two existing state-of-the-art pedestrian detectors.
6 0.77250481 383 cvpr-2013-Seeking the Strongest Rigid Detector
7 0.75780815 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
8 0.66065675 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
9 0.63985425 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
10 0.58656412 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
11 0.53073496 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
12 0.52951962 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
13 0.51018894 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
14 0.49280363 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
15 0.48813325 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
16 0.47451866 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
17 0.47015882 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
18 0.45887983 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
19 0.45489752 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
20 0.4478218 311 cvpr-2013-Occlusion Patterns for Object Class Detection
topicId topicWeight
[(10, 0.092), (16, 0.023), (26, 0.025), (28, 0.015), (33, 0.219), (67, 0.469), (69, 0.042), (80, 0.01), (87, 0.041)]
simIndex simValue paperId paperTitle
1 0.92418444 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
Author: Pramod Sharma, Ram Nevatia
Abstract: In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random fern adaptive classifier. The adaptive classifier improves the precision of the baseline classifier by validating the detection responses obtained from the baseline classifier as correct detections or false alarms. Experiments demonstrate the generalizability, computational efficiency and effectiveness of our method, as we compare our method with state-of-the-art approaches for the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.
2 0.91729552 103 cvpr-2013-Decoding Children's Social Behavior
Author: James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye
Abstract: We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1–2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3–5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.
same-paper 3 0.88686603 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
5 0.82645142 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
6 0.81719035 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
7 0.80845344 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
8 0.80355656 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
9 0.78926063 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
10 0.77202284 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
11 0.742302 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
12 0.74177361 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
13 0.72777236 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
14 0.71668333 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
15 0.71034229 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
16 0.70215297 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
17 0.67618436 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
18 0.67572641 438 cvpr-2013-Towards Pose Robust Face Recognition
19 0.66967714 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
20 0.66864806 63 cvpr-2013-Binary Code Ranking with Weighted Hamming Distance