cvpr cvpr2013 cvpr2013-363 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. [sent-5, score-0.795]
2 In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. [sent-6, score-0.896]
3 The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. [sent-7, score-1.275]
4 For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. [sent-8, score-0.599]
5 In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. [sent-9, score-0.275]
6 The context model can be learned automatically even when the vehicle annotations are not available. [sent-10, score-0.267]
7 Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%). [sent-11, score-0.67]
8 In recent years, especially due to the popularity of gradient features, the pedestrian detection field has achieved impressive progress in both effectiveness [6, 31, 43, 41, 19, 33] and efficiency [25, 11, 18, 4, 10]. [sent-14, score-0.576]
9 The leading detectors can achieve satisfactory performance on high resolution benchmarks (e.g. INRIA [6]). [sent-15, score-0.252]
10 However, they encounter difficulties for low resolution pedestrians. [sent-17, score-0.554]
11 Unfortunately, the low resolution pedestrians are often very important in real applications. [sent-21, score-0.554]
12 For example, a driver assistance system must detect the low resolution (i.e. distant) pedestrians to provide enough time for reaction. [sent-26, score-0.554]
13 Traditional pedestrian detectors usually follow the scale invariant assumption: a scale invariant feature based detector trained at a fixed resolution can be generalized to all resolutions by resizing the detector [40, 4], the image [6, 19], or both [11]. [sent-27, score-1.026]
14 However, the finite sampling frequency of the sensor results in much information loss for low resolution pedestrians. [sent-28, score-0.273]
15 For example, the best detector achieves a 21% mean miss rate for pedestrians taller than 80 pixels on the Caltech Pedestrian Benchmark [14], while the miss rate increases to 73% for pedestrians 30-80 pixels high. [sent-30, score-1.088]
16 Our philosophy is that the relationship among different resolutions should be explored for robust multi-resolution pedestrian detection. [sent-31, score-0.749]
17 For example, the low resolution samples contain a lot of noise that may mislead the detector in the training phase, and the information contained in high resolution samples can help to regularize it. [sent-32, score-0.742]
18 We argue that for pedestrians in different resolutions, the differences exist in the features of local patches (e.g. HOG cells), while the spatial structure is shared. [sent-33, score-0.315]
19 Particularly, we extend the popular deformable part model (DPM) [19] to multi-task DPM (MT-DPM), which aims to find an optimal combination of DPM detector and resolution aware transformations. [sent-39, score-0.488]
20 We prove that when the resolution aware transformations are fixed, the multi-task problem can be transformed into a Latent-SVM optimization problem, and when the DPM detector in the mapped space is fixed, the problem reduces to a standard SVM problem. [sent-40, score-0.505]
21 In addition, we propose a new context model to improve the detection performance in traffic scenes. [sent-42, score-0.28]
22 Vehicles are much easier to localize than pedestrians, which motivates us to employ the pedestrian-vehicle relationship as an additional cue to judge whether a detection is a true or false positive. [sent-45, score-0.366]
23 Since vehicle annotations are often not available in pedestrian benchmarks, we further present a method to learn the context model from ground truth pedestrian annotations and noisy vehicle detections. [sent-47, score-1.401]
24 For pedestrians taller than 30 pixels, our MT-DPM reduces the mean miss rate by 8%, and our context model further reduces it by 3%, compared with previous state-of-the-art performance. [sent-49, score-0.742]
25 The multi-task DPM detector and pedestrian-vehicle context model are discussed in Section 3 and Section 4, respectively. [sent-51, score-0.209]
26 Related work There is a long history of research on pedestrian detection. [sent-54, score-0.484]
27 Some papers focused on special problems in pedestrian detection, including occlusion handling [46, 43, 38, 2], speed [25, 11, 18, 4, 10], and detector transfer in new scenes [42, 27]. [sent-58, score-0.624]
28 We refer readers to [21, 14] for detailed surveys on pedestrian detection. [sent-59, score-0.576]
29 [16] found that the pedestrian detection performance depends on the resolution of training samples. [sent-61, score-0.795]
30 [14] pointed out that the pedestrian detection performance drops with decreasing resolution. [sent-62, score-0.576]
31 The most related work is [33], which utilized root and part filters for high resolution pedestrians, while using only the rigid root filter for low resolution pedestrians. [sent-65, score-0.599]
32 Our pedestrian detector is built on the popular DPM (deformable part model) [19], which combines a rigid root filter and deformable part filters for detection. [sent-67, score-0.73]
33 The DPM only performs well for high resolution objects, while our MT-DPM generalizes it to the low resolution case. [sent-68, score-0.492]
34 The coordinate descent procedure in learning is motivated by the steerable part model [35, 34], which trained the shared part bases to accelerate the detection. [sent-69, score-0.221]
35 To the best of our knowledge, this is the first work to capture the pedestrian-vehicle relationship to improve pedestrian detection in traffic scenes. [sent-77, score-0.721]
36 One is to combine samples from different resolutions to train a single detector (Fig. 2(a)), [sent-80, score-0.346]
37 and another is to train independent detectors for different resolutions (Fig. 2(b)). [sent-81, score-0.24]
38 In contrast, the multi-resolution model treats pedestrian detection in different resolutions as independent problems, so the relationship among them is missed. [sent-86, score-0.841]
39 The unreliable features of low resolution pedestrians can mislead the learned detector and make it difficult to generalize to novel test samples. [sent-87, score-0.7]
40 In this part, we present a multi-resolution detection method by considering the relationship of samples from different resolutions, including the commonness and the differences, which are captured by a multi-task strategy simultaneously. [sent-88, score-0.294]
41 Considering the differences among resolutions, we use the resolution aware transformations to map features from different resolutions to a common subspace, in which they have similar distributions. [sent-89, score-0.603]
42 A shared detector is trained in the resolution-invariant subspace by samples from all resolutions, to capture the structural commonness. [sent-90, score-0.261]
43 Here we consider the partition of two resolutions (low resolution: 30-80 pixels tall, and high resolution: taller than 80 pixels, as advised in [14]). [sent-93, score-0.492]
44 Note that extending the strategy to other local feature based linear detectors and to more resolution partitions is straightforward. [sent-94, score-0.252]
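As a concrete illustration of the two-way partition above (low resolution: 30-80 pixels tall; high resolution: taller than 80 pixels, following [14]), a minimal helper might look like this; the function name and the "ignore" bin for sub-30-pixel samples are our own conventions, not the paper's:

```python
def resolution_bin(height_px):
    """Assign a pedestrian sample to a resolution partition.

    Follows the split used in the text: high resolution is taller
    than 80 pixels, low resolution is 30-80 pixels tall. Samples
    under 30 pixels fall outside both partitions ("ignore" is our
    own label).
    """
    if height_px > 80:
        return "high"
    if height_px >= 30:
        return "low"
    return "ignore"

bins = [resolution_bin(h) for h in (25, 30, 80, 81, 120)]
```

Adding more partitions would only mean adding more branches (and, in the model, more transformation matrices).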
45 The appearance filters in the detector are concatenated into an nf × nc matrix Wa in the same way. [sent-104, score-0.21]
46 Given the root location l0, all the part locations are latent variables, and the score is maxL∗ score(I, L∗), where L∗ is the part configuration when the root location is fixed. [sent-107, score-0.238]
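The latent maximization over part configurations decomposes per part once the root is fixed: each part independently picks the placement that maximizes its appearance score minus deformation cost. A toy numeric sketch (all sizes and the random responses are placeholders, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_parts, n_placements = 3, 5

# Hypothetical per-part responses: appearance score and deformation
# cost for each candidate placement relative to the fixed root l0.
app = rng.standard_normal((n_parts, n_placements))
deform = np.abs(rng.standard_normal((n_parts, n_placements)))

root_score = 0.7
per_part = app - deform
part_best = per_part.max(axis=1)     # max over latent placements
latent_L = per_part.argmax(axis=1)   # the maximizing configuration L*
total_score = root_score + float(part_best.sum())
```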
47 In DPM, a pedestrian consists of parts, and every part consists of HOG cells. [sent-111, score-0.517]
48 When the pedestrian resolution changes, the structure of parts and the spatial relationship of HOG cells stay the same. [sent-112, score-0.761]
49 The only difference among resolutions lies in the feature vector of every cell, so the resolution aware transformations PL and PH are defined on it. [sent-113, score-0.581]
50 PL and PH are of dimension nd × nf, and they map the low and high resolution samples from the original nf dimensional feature space to the nd dimensional subspace. [sent-114, score-0.412]
51 The features from different resolutions are mapped into the common subspace, so that they can share the same detector. [sent-115, score-0.242]
52 We still denote the learned appearance parameters in the mapped resolution invariant subspace as Wa, which is an nd × nc matrix of the same size as PHΦa(I, L). [sent-116, score-0.368]
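To make the dimensions concrete, here is a sketch of how the per-cell mapping composes with the shared filter: each nf-dimensional cell feature is projected by PH or PL into nd dimensions, and the shared nd × nc filter Wa scores the mapped template. All sizes and random matrices below are placeholders, not learned values:

```python
import numpy as np

nf, nd, nc = 32, 10, 6          # raw cell dim, shared dim, number of cells
rng = np.random.default_rng(1)

P_H = rng.standard_normal((nd, nf))   # high-resolution transform
P_L = rng.standard_normal((nd, nf))   # low-resolution transform
W_a = rng.standard_normal((nd, nc))   # shared appearance filter

def appearance_score(cells, P):
    """cells: nf x nc features (one column per HOG cell). The score is
    sum_c <W_a[:, c], P @ cells[:, c]>, i.e. the Frobenius inner
    product <W_a, P @ cells>."""
    return float(np.sum(W_a * (P @ cells)))

high_cells = rng.standard_normal((nf, nc))
low_cells = rng.standard_normal((nf, nc))
s_high = appearance_score(high_cells, P_H)
s_low = appearance_score(low_cells, P_L)
```

Both resolutions hit the same `W_a`; only the projection differs.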
53 (2) The model defined above provides the flexibility to describe pedestrians of different resolutions, but also brings challenges, since the Wa, ws, PH, PL are all unknown. [sent-120, score-0.281]
54 In analogy to the original DPM, MT-DPM is formulated as: argmin over Wa, ws, PH, PL of (1/2)wsTws + fIH(Wa, ws, PH) + fIL(Wa, ws, PL), (4) where IH and IL denote the high and low resolution training sets, including both pedestrian and background. [sent-131, score-0.757]
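A cleaner typeset of this objective (Eq. 4), with symbols as defined in the surrounding text:

```latex
\min_{W_a,\; w_s,\; P_H,\; P_L}\quad
\frac{1}{2}\, w_s^{\top} w_s
\;+\; f_{I_H}\!\left(W_a, w_s, P_H\right)
\;+\; f_{I_L}\!\left(W_a, w_s, P_L\right)
\tag{4}
```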
55 The second term is the detection loss for resolution aware detection, corresponding to the detection model in Eq. [sent-137, score-0.496]
56 Note that more partitions of resolutions can be handled naturally in Eq. [sent-140, score-0.207]
57 1 Optimize Wa and ws When PH and PL are fixed, we can map the features to the common space, in which the DPM detector can be learned. [sent-150, score-0.227]
58 We denote PHPHT + PLPLT as A. For high resolution samples we denote A−1/2PHΦa(In, Ln∗) as Φˆa(In, Ln∗), and for low resolution samples we denote A−1/2PLΦa(In, Ln∗) as Φˆa(In, Ln∗). [sent-151, score-0.554]
59 2 Optimize PH and PL When Wa and ws are fixed, PH and PL are independent, thus the optimization problem can be divided into two subproblems: arg minPH fIH(Wa, ws, PH) and arg minPL fIL(Wa, ws, PL). [sent-160, score-0.357]
60 We perform PCA on features from randomly generated high and low resolution patches, and use the first nd eigenvectors as the initial values of PH and PL, respectively. [sent-176, score-0.303]
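The overall coordinate descent can be sketched with a stand-in squared loss in place of the Latent-SVM objective, so each step has a closed form: fix P and fit the detector w, then fix w and refit P. Everything below (loss, data, dimensions) is a toy stand-in for illustration, not the paper's actual training procedure; it only shows the monotone alternation:

```python
import numpy as np

rng = np.random.default_rng(2)
nf, nd, n = 20, 5, 200
X = rng.standard_normal((nf, n))   # cell features (columns = samples)
y = rng.standard_normal(n)         # stand-in regression targets

# PCA-style initialization: top-nd left singular vectors of the data
# (cf. the eigenvector initialization of P_H / P_L in the text).
P = np.linalg.svd(X, full_matrices=False)[0][:, :nd].T   # nd x nf
w = np.zeros(nd)                                         # shared "detector"

def loss(w, P):
    return float(np.mean((w @ (P @ X) - y) ** 2))

history = [loss(w, P)]
for _ in range(5):
    # Step 1: P fixed -> least-squares fit of w in the mapped space.
    Z = (P @ X).T                                # n x nd
    w = np.linalg.lstsq(Z, y, rcond=None)[0]
    # Step 2: w fixed -> least-squares fit of vec(P), using the
    # identity w @ P @ x = kron(w, x) . vec(P) (row-major vec).
    D = np.stack([np.kron(w, X[:, j]) for j in range(n)])
    P = np.linalg.lstsq(D, y, rcond=None)[0].reshape(nd, nf)
    history.append(loss(w, P))
```

Because each step exactly minimizes its block of variables, the objective never increases, which is the same monotonicity argument the coordinate descent in the paper relies on.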
61 The bin size in HOG is set to 8 for the high resolution model, and 4 for the low resolution model. [sent-184, score-0.273]
62 Pedestrian-Vehicle Context in Traffic Scenes A lot of detections are located around vehicles in traffic scenes (33. [sent-187, score-0.283]
63 It is possible to use the pedestrian-vehicle relationship to infer whether a detection is a true or false positive. [sent-190, score-0.2]
64 As shown in Fig. 4, the detections above a vehicle, and detections at the wheel position of a vehicle, can be safely removed. [sent-192, score-0.313]
65 We split the spatial relationship between pedestrians and vehicles into five types, including: “Above”, “Next-to”, “Below”, “Overlap” and “Far”. [sent-197, score-0.448]
66 If a pedestrian detection p and a vehicle detection v have one of the first four relationships, the context features at the corresponding dimensions are defined as (σ(s), ∆cx, ∆cy, ∆h, 1), and the other dimensions remain 0. [sent-199, score-0.843]
67 If the pedestrian detection and vehicle detection are too far apart or there is no vehicle, all the dimensions of the pedestrian-vehicle feature are 0. [sent-200, score-0.834]
68 Here ∆cx = |cvx − cpx|, ∆cy = cvy − cpy, and ∆h = hv/hp, where (cvx, cvy) and (cpx, cpy) are the center coordinates of vehicle detection v and pedestrian detection p, respectively. [sent-201, score-1.049]
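A sketch of how such a pedestrian-vehicle feature vector could be assembled. The relationship thresholds below are illustrative assumptions (the text does not give them), and only the layout follows the description: one 5-dimensional block (σ(s), ∆cx, ∆cy, ∆h, 1) per active relationship type, zeros elsewhere, and the "Far"/no-vehicle case all zeros:

```python
import math

TYPES = ["Above", "Next-to", "Below", "Overlap"]  # "Far" -> all zeros

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def classify(dcx, dcy, hv):
    """Crude, hypothetical rules for the five spatial relationships."""
    if dcx > 2.0 * hv:
        return "Far"
    if dcy > 0.5 * hv:          # pedestrian center well above vehicle
        return "Above"
    if dcy < -0.5 * hv:
        return "Below"
    if dcx > 0.5 * hv:
        return "Next-to"
    return "Overlap"

def pv_context_feature(ped, veh):
    """ped = (cpx, cpy, hp, score); veh = (cvx, cvy, hv) or None.
    Returns a 20-dim feature: 4 relationship types x 5 entries."""
    feat = [0.0] * (5 * len(TYPES))
    if veh is None:
        return feat
    (cpx, cpy, hp, s), (cvx, cvy, hv) = ped, veh
    dcx, dcy, dh = abs(cvx - cpx), cvy - cpy, hv / hp
    rel = classify(dcx, dcy, hv)
    if rel != "Far":
        i = 5 * TYPES.index(rel)
        feat[i:i + 5] = [sigmoid(s), dcx, dcy, dh, 1.0]
    return feat
```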
69 Moreover, as pointed out in [33], there is also a relationship between the coordinate and the scale of pedestrians, under the assumption that the camera is aligned with the ground plane. [sent-204, score-0.368]
70 We further define the geometry context feature for pedestrian detection p as g(p) = (σ(s), cy, h, cy^2, h^2), where s, cy, h are the detection score, y-center and height of the detection, respectively, and cy and h are normalized by the height of the image. [sent-205, score-0.918]
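A direct transcription of g(p), with the normalization made explicit; the function name and argument order are our own:

```python
import math

def geometry_feature(score, cy, h, image_height):
    """g(p) = (sigma(s), cy, h, cy^2, h^2) with cy and h normalized
    by the image height, as described in the text."""
    sigma = 1.0 / (1.0 + math.exp(-score))
    cy_n, h_n = cy / image_height, h / image_height
    return (sigma, cy_n, h_n, cy_n ** 2, h_n ** 2)
```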
71 The context score is the summation of the context scores of all pedestrian detections, and the context score of a pedestrian is further divided into its geometry and pedestrian-vehicle scores. [sent-207, score-1.355]
72 where wp and wv are the parameters of the geometry context and the pedestrian-vehicle context, which ensure the ground truth detection (P, V) has a larger context score than any other detection hypothesis. [sent-211, score-0.365]
73 Eq. 9 is an integer programming problem, but it becomes trivial when the label of V is fixed, since it reduces to maximizing over every pedestrian independently. [sent-214, score-0.281]
74 In typical traffic scenes, the number of vehicles is limited. [sent-215, score-0.196]
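Because the number of vehicle hypotheses is small, inference can proceed by enumerating vehicle subsets and, for each fixed subset, keeping every pedestrian whose total score is positive (the independent per-pedestrian maximization noted above). The scoring model below is a toy stand-in for the learned context scores, not the paper's wp/wv model:

```python
from itertools import chain, combinations

def subsets(items):
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

def infer(ped_scores, vehicles, pv_bonus):
    """ped_scores: {ped: base score}; pv_bonus: {(ped, veh): score
    added when veh is accepted}. With vehicles fixed, each pedestrian
    is kept iff its total score is positive, so pedestrians decouple."""
    best_total, best_peds, best_vehs = float("-inf"), (), ()
    for V in subsets(vehicles):
        total, kept = 0.0, []
        for p, base in ped_scores.items():
            s = base + sum(pv_bonus.get((p, v), 0.0) for v in V)
            if s > 0:
                total += s
                kept.append(p)
        if total > best_total:
            best_total, best_peds, best_vehs = total, tuple(kept), V
    return best_total, best_peds, best_vehs
```

With k vehicle hypotheses this is 2^k subproblems, each linear in the number of pedestrians, which is cheap for the small k typical of traffic scenes.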
75 ∀Pk0, ∀Vk0: S(Pk, Vk) − S(Pk0, Vk0) ≥ L(Pk, Pk0) − ξk, where Pk0 and Vk0 are arbitrary pedestrian and vehicle hypotheses in the kth image, and Pk and Vk are the ground truth. [sent-223, score-0.65]
76 L(Pk, Pk0) is the Hamming loss between the pedestrian detection hypothesis Pk0 and the ground truth Pk. [sent-224, score-0.576]
77 The difficulty in pedestrian based applications is that only the pedestrian ground truth Pk is available in public pedestrian databases, and the vehicle annotation Vk is unknown. [sent-225, score-1.618]
78 To address the problem, we use the noisy vehicle detection results as the initial estimation of Vk, and jointly learn the context model and infer whether each vehicle detection is a true or false positive, by optimizing the following problem: min over wc, ξk of (1/2)‖wc‖² + λ Σk ξk, s.t. [sent-226, score-0.667]
79 ∀Pk0, ∀Vk0: max over Vˆk ⊆ Vk of S(Pk, Vˆk) − S(Pk0, Vk0) ≥ L(Pk, Pk0), where Vˆk is a subset of Vk, which reflects the current inference of the vehicle detections obtained by maximizing the overall context score. [sent-228, score-0.322]
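Reconstructed in LaTeX from these lines (the slack ξk in the constraint is carried over from the fully supervised version above, an assumption on our part):

```latex
\min_{w_c,\;\xi_k}\quad \frac{1}{2}\lVert w_c\rVert_2^2
  + \lambda \sum_{k=1}^{K} \xi_k
\qquad \text{s.t.}\;\; \forall P_k', \forall V_k':\;
\max_{\widehat{V}_k \subseteq V_k} S\!\left(P_k, \widehat{V}_k\right)
  - S\!\left(P_k', V_k'\right) \ge L\!\left(P_k, P_k'\right) - \xi_k
```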
80 We use the ROC or the mean miss rate to compare methods, as advised in [14]. [sent-235, score-0.191]
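For reference, a sketch of the log-average miss rate metric in the style of [14]: geometrically average the miss rate at FPPI reference points spaced evenly in log space. The reference range [10^-2, 10^0] with 9 points follows common practice for this benchmark, and the step interpolation below is our own simplification:

```python
import numpy as np

def log_average_miss_rate(fppi, miss, ref=None):
    """fppi, miss: a detector's curve, sorted by increasing FPPI.
    At each reference FPPI, take the miss rate at the largest measured
    FPPI not exceeding it, then return the geometric mean."""
    fppi = np.asarray(fppi, dtype=float)
    miss = np.asarray(miss, dtype=float)
    if ref is None:
        ref = np.logspace(-2, 0, 9)
    samples = []
    for r in ref:
        idx = np.flatnonzero(fppi <= r)
        samples.append(miss[idx[-1]] if idx.size else miss[0])
    return float(np.exp(np.mean(np.log(samples))))
```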
81 In the following experiments, we examine the influence of the subspace dimension in MT-DPM, then compare it with other strategies for low resolution detection. [sent-240, score-0.413]
82 The Subspace Dimension in MT-DPM The dimension of the mapped common subspace in MTDPM reflects the tradeoff between commonness and differences among different resolutions. [sent-245, score-0.286]
83 We examine the parameter between 8 and 18 with an interval of 2, and measure the performance on pedestrians taller than 30 pixels. [sent-247, score-0.473]
84 Contributions of the context cues in multi-resolution pedestrian detection. [sent-287, score-0.585]
85 Comparisons with Other Detection Strategies We compare the proposed MT-DPM with other strategies for multi-resolution pedestrian detection. [sent-292, score-0.52]
86 The compared methods include: (1) DPM trained on the high resolution pedestrians; (2) DPM trained on the high resolution pedestrians and tested by resizing images 1.5 times; [sent-294, score-0.031]
87 (3) DPM trained on low resolution pedestrians; (4) DPM trained on both high and low resolution pedestrian data (Fig. 2(a)); [sent-297, score-0.03]
88 (5) Multi-resolution DPMs trained on high resolution and low resolution data independently, with their detection results fused (Fig. 2(b)). [sent-298, score-0.092]
89 ROCs of pedestrians taller than 30 pixels are reported in Fig. 8. [Figure 8: ROC curves, x-axis: false positives per image (10−3 to 101); panels: (a) Multi-resolution (taller than 30 pixels), (b) Low resolution (30-80 pixels high), (c) Reasonable (taller than 50 pixels).] [sent-300, score-0.837]
90 The high resolution model cannot detect the low resolution pedestrians directly, but some of the low resolution pedestrians can be detected by resizing images. [sent-304, score-1.371]
91 The low resolution DPM outperforms the high resolution DPM, since there are more low resolution pedestrians than high resolution pedestrians. [sent-310, score-1.265]
92 Combining low and high resolution would always help, but the improvement depends on the strategy. [sent-311, score-0.273]
93 Fusing low and high resolution data to train a single detector is better than training two independent detectors. [sent-312, score-0.381]
94 Results at 0.1 and 1 FPPI for pedestrians taller than 30 pixels are shown in Fig. [sent-319, score-0.502]
95 The improvement of context is more remarkable when more false positives are allowed, for example, there is a 3. [sent-325, score-0.188]
96 Due to space limitations, we only show results for multi-resolution pedestrians (Fig. [sent-331, score-0.281]
97 The time for processing one frame is less than 1s on a standard PC, including high resolution and low resolution pedestrian detection, vehicle detection and context model. [sent-345, score-1.335]
98 Conclusion In this paper, we propose a Multi-Task DPM detector to jointly encode the commonness and differences between pedestrians from different resolutions, and achieve robust performance for multi-resolution pedestrian detection. [sent-348, score-1.02]
99 The pedestrian-vehicle relationship is modeled to infer the true or false positives in traffic scenes, and we show how to learn it automatically from the data. [sent-349, score-0.232]
100 Survey of pedestrian detection for advanced driver assistance systems. [sent-499, score-0.614]
simIndex simValue paperId paperTitle
2 0.51454765 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
3 0.28329059 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide useful mutual relationship for visibility estimation - the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%−8% improvements on multiple benchmark datasets.
4 0.23635291 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
Author: Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann Lecun
Abstract: Pedestrian detection is a problem of considerable practical interest. Adding to the list of successful applications of deep learning methods to vision, we report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model. The model uses a few new twists, such as multi-stage features, connections that skip layers to integrate global shape information with local distinctive motif information, and an unsupervised method based on convolutional sparse coding to pre-train the filters at each stage.
5 0.22416383 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy performance of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
6 0.21770689 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
7 0.21745972 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
8 0.18956487 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
9 0.16913092 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
10 0.16368875 383 cvpr-2013-Seeking the Strongest Rigid Detector
11 0.14800332 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
12 0.14074215 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
13 0.13516602 311 cvpr-2013-Occlusion Patterns for Object Class Detection
14 0.11799312 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
15 0.10668455 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
16 0.10360514 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
17 0.1033021 440 cvpr-2013-Tracking People and Their Objects
18 0.095706306 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
19 0.094585232 364 cvpr-2013-Robust Object Co-detection
20 0.088362269 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
simIndex simValue paperId paperTitle
same-paper 1 0.96526319 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
2 0.91733438 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels andETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
3 0.84284139 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation - the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%-8% improvements on multiple benchmark datasets.
4 0.76815259 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the "Feature Synthesis" (FS) method [1], which uses multiple object parts for detection and is among state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed "KD-Ferns", to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level "coarse-to-fine" strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy performance of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
5 0.76264244 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
Author: Sitapa Rujikietgumjorn, Robert T. Collins
Abstract: We present a quadratic unconstrained binary optimization (QUBO) framework for reasoning about multiple object detections with spatial overlaps. The method maximizes an objective function composed of unary detection confidence scores and pairwise overlap constraints to determine which overlapping detections should be suppressed, and which should be kept. The framework is flexible enough to handle the problem of detecting objects as a shape covering of a foreground mask, and to handle the problem of filtering confidence-weighted detections produced by a traditional sliding window object detector. In our experiments, we show that our method outperforms two existing state-of-the-art pedestrian detectors.
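The QUBO objective can be illustrated on a toy instance: maximize the sum of kept detections' confidence scores minus pairwise overlap penalties. The scores and penalty matrix below are made up for illustration; real instances derive confidences from a detector and penalties from box overlap, and need a proper QUBO solver rather than brute force.

```python
import itertools
import numpy as np

# Hypothetical unary confidences for 4 overlapping detections and a
# symmetric pairwise overlap-penalty matrix (zero diagonal).
c = np.array([0.9, 0.8, 0.3, 0.6])
L = np.zeros((4, 4))
L[0, 1] = L[1, 0] = 0.7   # detections 0 and 1 overlap heavily
L[1, 3] = L[3, 1] = 0.5   # detections 1 and 3 overlap

def objective(x):
    """QUBO value: kept confidences minus penalties between kept pairs."""
    x = np.asarray(x, dtype=float)
    return c @ x - 0.5 * x @ L @ x

# Brute force over all 2^4 keep/suppress assignments (fine for tiny n).
best = max(itertools.product([0, 1], repeat=4), key=objective)
# best == (1, 0, 1, 1): detection 1 is suppressed despite its high score,
# because its overlap penalties with detections 0 and 3 outweigh it.
```

Unlike greedy non-maximum suppression, the joint objective trades one high-scoring detection against the two it conflicts with.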
6 0.7512151 383 cvpr-2013-Seeking the Strongest Rigid Detector
7 0.71868867 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
8 0.63661784 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
9 0.61302161 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
10 0.54002208 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
11 0.50040007 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
12 0.48739621 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
13 0.45077536 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
14 0.44905388 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
15 0.43199793 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
16 0.43139261 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
17 0.42437878 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
18 0.42155582 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
19 0.4177973 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
20 0.4034439 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
topicId topicWeight
[(10, 0.088), (16, 0.308), (26, 0.037), (28, 0.015), (33, 0.192), (67, 0.171), (69, 0.036), (80, 0.01), (87, 0.072)]
simIndex simValue paperId paperTitle
1 0.89029986 410 cvpr-2013-Specular Reflection Separation Using Dark Channel Prior
Author: Hyeongwoo Kim, Hailin Jin, Sunil Hadap, Inso Kweon
Abstract: We present a novel method to separate specular reflection from a single image. Separating an image into diffuse and specular components is an ill-posed problem due to lack of observations. Existing methods rely on a specular-free image to detect and estimate specularity, which however may confuse diffuse pixels with the same hue but a different saturation value as specular pixels. Our method is based on a novel observation that for most natural images the dark channel can provide an approximate specular-free image. We also propose a maximum a posteriori formulation which robustly recovers the specular reflection and chromaticity despite the hue-saturation ambiguity. We demonstrate the effectiveness of the proposed algorithm on real and synthetic examples. Experimental results show that our method significantly outperforms the state-of-the-art methods in separating specular reflection.
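The dark-channel computation underlying this observation is simple: a per-pixel minimum over color channels followed by a local minimum filter. A minimal sketch (the patch size, the naive double loop, and the synthetic test image are illustrative choices, not the paper's implementation):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel: per-pixel min over color channels, then a local
    minimum filter over a patch x patch neighborhood."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# On a mostly diffuse scene the dark channel stays near zero; a bright
# white specular highlight lifts it, which is the cue being exploited.
img = np.zeros((32, 32, 3))
img[..., 0] = 0.8                # diffuse red region: dark channel is 0
img[10:14, 10:14] = 1.0          # white specular blob: dark channel is 1
dc = dark_channel(img, patch=3)
```

A diffuse pixel of a saturated color has at least one near-zero channel, so its dark channel is small; achromatic specular highlights raise all three channels and stand out.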
2 0.84554863 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
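The pipeline — PCA on tracked point trajectories, component selection by frequency content, then rate estimation from the chosen component — can be sketched on synthetic data. The 1.2 Hz signal, noise level, frame rate, and the 0.75-2 Hz search band below are assumptions for illustration, not values from the paper.

```python
import numpy as np

fs = 30.0                         # hypothetical frame rate (Hz)
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(1)

# Synthetic vertical trajectories of 5 tracked head points: a shared
# 1.2 Hz (72 bpm) cardiac oscillation plus independent per-point noise.
pulse = 0.05 * np.sin(2 * np.pi * 1.2 * t)
traj = np.stack([pulse + 0.02 * rng.normal(size=t.size)
                 for _ in range(5)], axis=1)

# PCA via SVD of the centered trajectory matrix.
X = traj - traj.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
comps = X @ Vt.T                  # component time series, one per column

# Pick the component whose power is most concentrated in 0.75-2 Hz.
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = (freqs >= 0.75) & (freqs <= 2.0)
power = np.abs(np.fft.rfft(comps, axis=0)) ** 2
scores = power[band].sum(axis=0) / power.sum(axis=0)
best = comps[:, scores.argmax()]

# Heart rate from the dominant in-band frequency of the chosen component.
spec = np.abs(np.fft.rfft(best))
bpm = 60.0 * freqs[band][spec[band].argmax()]   # recovers roughly 72 bpm
```

The paper additionally identifies individual beat peaks in the selected component; this sketch stops at the mean rate.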
same-paper 3 0.83674157 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we treat pedestrian detection at different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution-aware transformations that map pedestrians at different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure that iteratively learns the resolution-aware transformations and a deformable part model (DPM) based detector. In traffic scenes, many false positives are located around vehicles; therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, noticeably outperforming the previous state-of-the-art (71%).
4 0.79295343 27 cvpr-2013-A Theory of Refractive Photo-Light-Path Triangulation
Author: Visesh Chari, Peter Sturm
Abstract: 3D reconstruction of transparent refractive objects like a plastic bottle is challenging: they lack appearance related visual cues and merely reflect and refract light from the surrounding environment. Amongst several approaches to reconstruct such objects, the seminal work of Light-Path triangulation [17] is highly popular because of its general applicability and analysis of minimal scenarios. A light-path is defined as the piece-wise linear path taken by a ray of light as it passes from source, through the object and into the camera. Transparent refractive objects not only affect the geometric configuration of light-paths but also their radiometric properties. In this paper, we describe a method that combines both geometric and radiometric information to do reconstruction. We show two major consequences of the addition of radiometric cues to the light-path setup. Firstly, we extend the range of scenarios in which reconstruction is plausible while reducing the minimal requirements for a unique reconstruction. This happens as a consequence of the fact that radiometric cues add an additional known variable to the already existing system of equations. Secondly, we present a simple algorithm for reconstruction, owing to the nature of the radiometric cue. We present several synthetic experiments to validate our theories, and show high quality reconstructions in challenging scenarios.
5 0.77730262 271 cvpr-2013-Locally Aligned Feature Transforms across Views
Author: Wei Li, Xiaogang Wang
Abstract: In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification. It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing a sparsity-inducing norm and information-theoretic regularization. This approach can be generalized to settings where test images come from new camera views, not the same as those in the training set. Extensive experiments are conducted on public datasets and our own dataset. Comparisons with state-of-the-art metric learning and person re-identification methods show the superior performance of our approach.
6 0.77421123 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
7 0.7688157 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
8 0.73272026 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
10 0.68952012 103 cvpr-2013-Decoding Children's Social Behavior
11 0.68076664 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction
12 0.67658913 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
13 0.67609084 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation
14 0.67580223 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
15 0.67433381 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
16 0.66932607 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
17 0.66636163 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
18 0.66577196 54 cvpr-2013-BRDF Slices: Accurate Adaptive Anisotropic Appearance Acquisition
19 0.66522378 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
20 0.66494858 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues