cvpr cvpr2013 cvpr2013-122 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guang Chen, Yuanyuan Ding, Jing Xiao, Tony X. Han
Abstract: Context has been playing an increasingly important role in improving object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model high-level context using solely the detection responses of a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are further calculated to construct a high-order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector into a stronger, context-aware detector. With the updated detector, we can continue the evolution until the contextual improvements saturate. Using the successful deformable-part-model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and the Caltech pedestrian dataset [7]. The proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver. 5) [13] by 3.3% in mean average precision on the PASCAL VOC 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model high-level context using solely the detection responses of a baseline object detector. [sent-11, score-0.582]
2 The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. [sent-12, score-0.683]
3 The statistics of the 1st-order binary context features are further calculated to construct a high-order co-occurrence descriptor. [sent-13, score-0.402]
4 Combining the MOCO feature with the original image feature, we can evolve the baseline object detector into a stronger, context-aware detector. [sent-14, score-0.597]
5 With the updated detector, we can continue the evolution until the contextual improvements saturate. [sent-15, score-0.532]
6 For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6]. [sent-19, score-0.33]
7 The framework evolves the detector using high-order context until convergence. [sent-24, score-0.516]
8 At each iteration, the response map and the 0th-order context are computed using the initial baseline detector (for the 1st iteration) or the evolved detector from the prior iteration (for later iterations). [sent-25, score-0.957]
9 Then the 0th-order context is used for computing the 1st-order context, upon which high-order co-occurrence descriptors are computed. [sent-26, score-0.335]
10 Finally, context features of all orders are combined to train an evolving detector. [sent-27, score-0.408]
11 The evolution eliminates many false positives using implicit contextual information, and fortifies the true detections. [sent-29, score-0.495]
12 Such methods extract image features in each scan window and classify the features to determine the confidence of the presence of the target object [25, 32, 16]. [sent-31, score-0.274]
13 Naturally, to improve detection accuracy, context in the neighborhood of each scan window can provide rich information and should be explored. [sent-34, score-0.513]
14 For example, a scanning window in a pathway region is more likely to be a true detection of a human than one inside a water region. [sent-35, score-0.243]
15 In fact, there have been some efforts on utilizing contextual information for object detection and a variety of valuable approaches have been proposed [14, 27, 28]. [sent-36, score-0.424]
16 High level image contexts such as semantic context [4], image statistics [27], and 3D geometric context [15], are used as well as low level image contexts, including local pixel context [5] and shape context [23]. [sent-37, score-1.168]
17 Besides utilizing context information from the original image directly, another line of work, including Spatial Boost [1], Auto-Context [29], and their extensions, elegantly integrates the classifier responses from nearby background pixels to help determine the target pixels of interest. [sent-38, score-0.443]
18 Inspired by these prior works, Contextual Boost [6] was proposed to extract multi-scale contextual cues from the detector response map to boost the detection performance. [sent-40, score-0.872]
19 In this paper we aim at developing an effective and generic approach to utilize contextual information without resorting to multiple object detectors. [sent-44, score-0.331]
20 The rationale is that, even though there is only one classifier/detector, higher order contextual information such as the co-occurrence of objects of different categories can still be implicitly and effectively used by carefully organizing the responses from a single object detector. [sent-45, score-0.541]
21 However, the difference among the responses of the single classifier on different object regions implicitly conveys such contextual information. [sent-47, score-0.228]
22 The responses of a pedestrian detector to various object regions such as the sky, streets, and trees, may vary greatly, but a homogeneous region of the response map corresponds to a region with semantic similarity. [sent-50, score-0.635]
23 This reasoning hints at the possibility of encoding higher-order contextual information with the response of a single object detector. [sent-53, score-0.447]
24 Therefore, if we treat the single classifier response map as an “image”, we can extract descriptors to represent high order contextual information. [sent-54, score-0.618]
25 Our multi-order context representation is inspired by the recent success of randomized binary image descriptors [22, 3, 24]. [sent-55, score-0.409]
26 First we propose a series of binary features where each bit encodes the relationship of classification response values for a pair of pixels. [sent-56, score-0.241]
27 The difference of detector responses at different pixels implicitly captures the contextual co-occurrence patterns pertinent to detection improvements. [sent-57, score-0.666]
28 Accordingly we further propose a novel high order contextual descriptor based on the binary pattern of comparisons. [sent-59, score-0.447]
29 Our high order contextual descriptor captures the co-occurrence of binary contextual features based on their statistics in the local neighborhood. [sent-60, score-0.792]
30 The context features at all different orders are complementary to each other and are therefore combined together to form a multi-order context representation. [sent-61, score-0.671]
31 Finally the proposed multi-order context representations are integrated into an iterative classification framework, where the classifier response map from the previous iteration is further explored to supply more contextual constraints for the current iteration. [sent-62, score-0.88]
32 This process is a straightforward extension of our contextual boost algorithm in [6]. [sent-63, score-0.395]
33 Similar to [6], since the multi-order contextual feature encodes the contextual relationships between neighborhood image regions, through iterations it naturally evolves to cover greater neighborhoods and incorporates more global contextual information into the classification process. [sent-64, score-1.024]
34 As a result, our framework effectively enables the detector to evolve into a stronger one across iterations. [sent-65, score-0.263]
35 On the Caltech dataset [7], compared with the best prior art achieved by Contextual Boost [6], our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%. [sent-72, score-0.627]
36 (2) summarizes the flow chart for constructing the multi-order context representation from an image. [sent-75, score-0.271]
37 The detection response maps for each scale are smoothed as in Sec. [sent-78, score-0.291]
38 (0th-order) and its position (red solid area) in the smoothed detection response map. [sent-81, score-0.237]
39 Finally we compute the binary comparison based context features, upon which we further extract a high-order co-occurrence descriptor, detailed in Sec. [sent-85, score-0.455]
40 We define the context region in terms of both spatial position and scale for each candidate location. [sent-89, score-0.294]
41 We then compute a series of binary features using randomized comparison of detector responses within the context region, as detailed in Sec. [sent-90, score-0.648]
42 Context Basis (0th-order) Intuitively, the appearance of the original image patch containing the neighborhood of target objects provides important contextual cues. [sent-99, score-0.385]
43 However, it is difficult to model this kind of context in the original image, because the neighborhood around target objects may vary dramatically in differ- . [sent-100, score-0.362]
44 A logical approach to this problem is to first convolve the original image with a particular filter that reduces the diversity of the neighborhood of a true target object (the foreground) across various backgrounds, and then to extract context features from the filtered image. [sent-102, score-0.468]
45 For object detection tasks, we prefer such a filter to be detector driven. [sent-103, score-0.318]
46 (1) that the positive responses cluster densely around humans but occur sparsely in the background, we simply take the object detector as this specific filter and directly extract context information from the classification response map, denoted as M. [sent-105, score-0.783]
47 Such a response image thus conveys context information, which we denote as 0th-order context. [sent-121, score-0.462]
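The 0th-order context is the detector's response map M itself, smoothed at each pyramid scale. The extract does not say which smoothing the authors use, so the sketch below assumes a simple box filter; `smooth_response_map`, `smooth_pyramid`, and `radius` are hypothetical names, and plain Python lists keep it dependency-free.

```python
from typing import List

Grid = List[List[float]]  # one 2-D response map, indexed [y][x]


def smooth_response_map(m: Grid, radius: int = 1) -> Grid:
    """Box-filter a detector response map (the smoothing choice is an assumption).

    Each output cell is the mean of the responses inside a
    (2*radius+1) x (2*radius+1) window, clipped at the map borders.
    """
    h, w = len(m), len(m[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += m[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def smooth_pyramid(pyramid: List[Grid], radius: int = 1) -> List[Grid]:
    """Smooth the response map of every pyramid scale independently."""
    return [smooth_response_map(level, radius) for level in pyramid]


# An isolated strong response is spread over its neighborhood.
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = smooth_response_map(spike)
print(smoothed[1][1], smoothed[0][0])  # 1.0 2.25
```

Smoothing suppresses isolated spikes, so that a homogeneous region of the response map better reflects a semantically similar image region.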
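48 The context structure Ω(Ṗ) around a candidate location $\dot P = (\dot x, \dot y, \dot l)$ in the spatial and scale space is defined as $\Omega(\dot P; W, H, L) = \{(x, y, l) : |x - \dot x| \le W,\ |y - \dot y| \le H,\ |l - \dot l| \le L\}$. [sent-128, score-0.271]
49 For example, (W, H, L) = (1, 1, 1) means the context structure is a 3 × 3 × 3 cubic region. [sent-137, score-0.271]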
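To make the definition of Ω(Ṗ; W, H, L) concrete, here is a minimal sketch that gathers the responses inside the structure into a (2L+1) × (2H+1) × (2W+1) cube. It assumes that all pyramid levels have been resampled onto a common grid, so that (x, y) denotes the same image position at every scale; the extract does not specify this. `context_structure` and `pad` are hypothetical names.

```python
from typing import List, Tuple

Pyramid = List[List[List[float]]]  # response maps indexed [l][y][x]


def context_structure(
    pyramid: Pyramid,
    p: Tuple[int, int, int],
    W: int,
    H: int,
    L: int,
    pad: float = float("-inf"),
) -> Pyramid:
    """Collect the responses in Omega(P; W, H, L) around p = (x, y, l).

    Assumes every scale shares one grid (hypothetical alignment). Locations
    outside the pyramid are filled with `pad`.
    """
    x0, y0, l0 = p
    cube = []
    for l in range(l0 - L, l0 + L + 1):
        plane = []
        for y in range(y0 - H, y0 + H + 1):
            row = []
            for x in range(x0 - W, x0 + W + 1):
                inside = (
                    0 <= l < len(pyramid)
                    and 0 <= y < len(pyramid[l])
                    and 0 <= x < len(pyramid[l][y])
                )
                row.append(pyramid[l][y][x] if inside else pad)
            plane.append(row)
        cube.append(plane)
    return cube


# Toy pyramid: 3 scales of 5 x 5 maps whose value encodes (l, y, x).
pyr = [[[float(100 * l + 10 * y + x) for x in range(5)] for y in range(5)] for l in range(3)]
cube = context_structure(pyr, (2, 2, 1), W=1, H=1, L=1)
print(len(cube), len(cube[0]), len(cube[0][0]))  # 3 3 3
print(cube[1][1][1])  # 122.0
```

With (W, H, L) = (1, 1, 1) the sketch returns the 3 × 3 × 3 cube described above, with the candidate location at its center.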
50 Binary Pattern of Comparisons (1st-order). Given the 0th-order context structure, we propose to use comparison-based binary features to incorporate the co-occurrence of different objects. [sent-140, score-0.386]
51 Although we only have a single object detector, the response values at different locations indicate the confidence that the target object is present at each location. [sent-141, score-0.302]
52 Therefore, each binary comparison encodes the contextual information of whether one location is more likely to contain the target object than the other. [sent-142, score-0.433]
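53 Comparison of Response Values. Specifically, we define the binary comparison τ in the 0th-order context structure Ω(Ṗ; W, H, L) as $\tau(s; a, b) := 1$ if $s(a) < s(b)$, and $0$ otherwise, where $s(a)$ and $s(b)$ denote the detector responses at the two locations $a, b \in \Omega(\dot P)$. [sent-145, score-0.351]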
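The 1st-order feature can be sketched as n fixed comparison pairs, each contributing one bit τ. The pair-sampling strategy below (i.i.d. Gaussian offsets around the candidate, in the spirit of BRIEF) is an assumption and is not claimed to be the paper's "type iii" Gaussian sampling; `sample_pairs`, `binary_context_feature`, and the choice of sigma are hypothetical.

```python
import random
from typing import List, Tuple

Offset = Tuple[int, int, int]  # (dx, dy, dl) relative to the candidate location


def sample_pairs(n: int, W: int, H: int, L: int, seed: int = 0) -> List[Tuple[Offset, Offset]]:
    """Draw n fixed comparison pairs (a, b) inside Omega(P; W, H, L).

    Assumption: Gaussian offsets with sigma = half-extent / 2, rounded and
    clipped to the structure. The pairs are drawn once and reused for every
    candidate location.
    """
    rng = random.Random(seed)

    def coord(half: int) -> int:
        if half == 0:
            return 0
        v = round(rng.gauss(0.0, half / 2.0))
        return max(-half, min(half, v))

    def offset() -> Offset:
        return (coord(W), coord(H), coord(L))

    return [(offset(), offset()) for _ in range(n)]


def binary_context_feature(
    cube: List[List[List[float]]], pairs: List[Tuple[Offset, Offset]], W: int, H: int, L: int
) -> List[int]:
    """1st-order feature f_n: bit k is tau(s; a_k, b_k) = 1 if s(a_k) < s(b_k), else 0.

    `cube` holds the responses of Omega(P), indexed [l][y][x], with the
    candidate location at the center (L, H, W).
    """

    def s(o: Offset) -> float:
        dx, dy, dl = o
        return cube[L + dl][H + dy][W + dx]

    return [1 if s(a) < s(b) else 0 for a, b in pairs]


# Toy 3 x 3 x 3 structure whose response grows with x.
cube = [[[float(x) for x in range(3)] for _ in range(3)] for _ in range(3)]
pairs = [((-1, 0, 0), (1, 0, 0)), ((1, 0, 0), (-1, 0, 0)), ((0, 0, 0), (0, 0, 0))]
bits = binary_context_feature(cube, pairs, W=1, H=1, L=1)
print(bits)  # [1, 0, 0]
```

Fixing the pairs once (a seeded generator here) matters: the same bit position must compare the same relative locations for every candidate, otherwise the feature vectors are not comparable.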
54 Being complementary, they provide rich context cues and are combined into the Multi-Order Contextual co-Occurrence (MOCO) descriptor, $f_c = [f_n, f_p]$. [sent-168, score-0.321]
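The extract says only that the high-order descriptor $f_p$ is built from statistics of the 1st-order bits in a local neighborhood and is a histogram of dimension m. One hypothetical way to realize this, used below purely for illustration, is to read each group of k = log2(m) consecutive bits as a joint comparison pattern and to histogram those patterns over the neighborhood. The concatenation $f_c = [f_n, f_p]$ then follows the definition above. The grouping scheme and all function names are assumptions.

```python
from typing import List, Sequence


def bits_to_codes(bits: Sequence[int], k: int) -> List[int]:
    """Split a binary vector into consecutive k-bit words (integers in [0, 2**k))."""
    codes = []
    for i in range(0, len(bits) - k + 1, k):
        code = 0
        for b in bits[i:i + k]:
            code = (code << 1) | b
        codes.append(code)
    return codes


def cooccurrence_histogram(neighborhood_bits: Sequence[Sequence[int]], m: int) -> List[float]:
    """Hypothetical high-order descriptor f_p with m bins.

    Each k-bit word (k = log2 m) of every 1st-order vector in the local
    neighborhood is treated as a joint pattern of k comparisons; the
    normalized counts of these patterns form the histogram.
    """
    k = m.bit_length() - 1
    if m < 2 or (1 << k) != m:
        raise ValueError("this sketch assumes m is a power of two >= 2")
    hist = [0.0] * m
    for bits in neighborhood_bits:
        for code in bits_to_codes(bits, k):
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def moco_descriptor(f_n: Sequence[int], neighborhood_bits: Sequence[Sequence[int]], m: int) -> List[float]:
    """f_c = [f_n, f_p]: concatenate the 1st-order bits with the high-order histogram."""
    return [float(b) for b in f_n] + cooccurrence_histogram(neighborhood_bits, m)


f_n = [1, 0, 1, 1]
print(moco_descriptor(f_n, [f_n, [0, 0, 1, 1]], m=4))
# [1.0, 0.0, 1.0, 1.0, 0.25, 0.0, 0.25, 0.5]
```

Unlike a single bit, each histogram bin counts how often a particular combination of comparisons occurs together, which is one reading of "co-occurrence" of binary contextual features.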
55 Detection Evolution To effectively use the MOCO descriptor for object detection, we propose an iterative framework that allows the detector to evolve and achieve better accuracy. [sent-170, score-0.348]
56 Such a concept of detection “evolution” has been successfully used for pedestrian detection in Contextual Boost [6]. [sent-171, score-0.284]
57 In this paper, we straightforwardly extend the MOCO based evolution framework to integrate with deformable-part models [10, 13] for general object detection tasks. [sent-172, score-0.331]
58 Feature Selection Our detector uses the MOCO descriptor together with the non-context image features extracted in each scan window in the final classification process. [sent-175, score-0.355]
59 General Evolution Algorithm. The iterative process of the detector evolution framework is similar to Contextual Boost [6]. [sent-187, score-0.351]
60 Given an initial baseline detector, the iteration procedure for training a new evolving detector is as follows. [sent-188, score-0.312]
61 First, the baseline detector is used to calculate the response maps. [sent-189, score-0.357]
62 Finally, the selected features are fed into a general classification algorithm to construct a new detector, which will serve as the new baseline detector for the next iteration. [sent-193, score-0.231]
63 As our MOCO is defined in a context region, the iteration will automatically propagate context cues to larger and larger regions. [sent-194, score-0.592]
64 As a result, more and more context will be incorporated through the iterations, and the evolved detectors can yield better performance. [sent-195, score-0.402]
65 In the testing stage, the same evolution procedure is applied using the learned detectors respectively. [sent-197, score-0.251]
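The train-then-reapply loop described above can be sketched generically. Detector training, response-map computation, and feature extraction are passed in as callables (`fit_with_context`, `validate`, and `detect_with_context` are hypothetical hooks, not the authors' code). The stopping rule `min_gain` is an assumed way of detecting that contextual improvements have saturated.

```python
from typing import Any, Callable, List, Tuple

Detector = Any  # opaque trained model; its type depends on the underlying classifier


def train_evolution(
    baseline: Detector,
    fit_with_context: Callable[[Detector], Detector],
    validate: Callable[[Detector], float],
    max_iters: int = 6,
    min_gain: float = 1e-3,
) -> Tuple[List[Detector], List[float]]:
    """Evolve a detector until the validation score stops improving.

    `fit_with_context(prev)` is assumed to compute prev's response maps on the
    training set, build the multi-order context features from them, and train
    a new detector on [image features, context features].
    """
    detectors = [baseline]
    best = validate(baseline)
    history = [best]
    for _ in range(max_iters):
        candidate = fit_with_context(detectors[-1])
        score = validate(candidate)
        history.append(score)
        if score - best < min_gain:  # contextual improvement has saturated
            break
        detectors.append(candidate)
        best = score
    return detectors, history


def run_evolution(detectors, image, detect_with_context):
    """Test time: apply the learned detectors in order, one evaluation each."""
    responses = detectors[0](image)  # baseline pass, no context features
    for det in detectors[1:]:
        # Each evolved detector consumes the previous iteration's response maps.
        responses = detect_with_context(det, image, responses)
    return responses


# Toy run: integers stand in for detectors, a fixed list for validation scores.
scores = [0.50, 0.53, 0.545, 0.5452, 0.5452]
dets, hist = train_evolution(0, lambda d: d + 1, lambda d: scores[min(d, 4)])
print(dets, hist)  # [0, 1, 2] [0.5, 0.53, 0.545, 0.5452]
```

The test-time cost grows linearly with the number of retained detectors, since each one needs the previous iteration's response maps.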
66 where $s_r$ is the detection score of the root filter, $s_{p_i}$ and $d_i$ respectively represent the detection score and deformation cost of the $i$-th part filter, and $N_p$ is the number of part filters. [sent-205, score-0.358]
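Eq. (6) did not survive extraction. For reference, the standard deformable-part-model score of Felzenszwalb et al. for one root placement is the root response plus, for each part, the best part response minus its quadratic deformation cost, plus a bias. The brute-force sketch below follows that standard form under simplifying assumptions (parts scored on the same grid as the root, no distance transform, no 2x part resolution). It is not claimed to reproduce the exact Eq. (6), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Placement = Tuple[int, int]


def part_contribution(
    responses: Dict[Placement, float],
    anchor: Placement,
    deform: Tuple[float, float, float, float],
) -> float:
    """max over placements of (part filter response s_p - deformation cost d).

    The cost uses the quadratic DPM form a*dx + b*dy + c*dx^2 + d*dy^2,
    with (dx, dy) the displacement from the part's anchor.
    """
    ax, ay = anchor
    a, b, c, d = deform
    best = float("-inf")
    for (x, y), s_p in responses.items():
        dx, dy = x - ax, y - ay
        cost = a * dx + b * dy + c * dx * dx + d * dy * dy
        best = max(best, s_p - cost)
    return best


def dpm_score(root_score: float, parts: List[tuple], bias: float = 0.0) -> float:
    """s = s_r + sum_i (s_{p_i} - d_i) + bias, for one root placement."""
    return root_score + sum(part_contribution(r, anc, w) for r, anc, w in parts) + bias


# One part: a strong response one cell right of its anchor still wins after the cost.
part = ({(0, 0): 1.0, (1, 0): 3.0}, (0, 0), (0.0, 0.0, 1.0, 1.0))
print(dpm_score(0.5, [part]))  # 2.5
```

In the framework described here, both the final score map $s_f$ and each part score map $s_{p_i}$ serve as inputs from which MOCO descriptors are computed.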
67 From the viewpoint of context, the deformable-part-model essentially exploits the intra context inside the object region, e. [sent-207, score-0.357]
68 Therefore, it exploits the inter context around the object region. [sent-211, score-0.333]
69 Clearly, these two kinds of context are mutually exclusive and complementary to each other. [sent-212, score-0.294]
70 This encourages us to combine them together to provide more comprehensive contextual constraints. [sent-213, score-0.294]
71 (6) consists of both the final detection response $s_f$ and the detection responses $s_{p_i}$ from the $N_p$ part filters. [sent-215, score-0.577]
72 Since each response s corresponds to a response map, we calculate the MOCO descriptors using each of the response maps. [sent-216, score-0.506]
73 Furthermore, to effectively evolve the baseline deformable-part-model detector using the calculated MOCO, we apply the iterative framework not only to the root filter but also to the part filters and the detectors for every component. [sent-219, score-0.482]
74 Then we use the latent-SVM to fuse the $N_c$ components and retrain an evolved detector for the next iteration. [sent-228, score-0.231]
75 Experiments and Discussion We have conducted extensive experiments to evaluate the proposed MOCO and the detection evolution framework. [sent-232, score-0.294]
76 First, to demonstrate the advantage of the MOCO, we compare the performance achieved by using different orders of context information. [sent-235, score-0.322]
77 Second, we compare the performance at different iterations as the detector evolves to show that the detectors quickly converge in about 2∼3 iterations. [sent-243, score-0.296]
78 Two important parameters that directly affect the computation of context descriptors are the size of Ω(Ṗ) and the number n of binary comparisons. [sent-255, score-0.369]
79 Since Figure 4: Mean AP (mAP) varies for different parameters: the size W × H × L of the context structure Ω(Ṗ) and the number n of binary comparison tests. [sent-256, score-0.328]
80 Only 1st-order context features and the image features are used for evaluation. [sent-259, score-0.329]
81 2, we choose type iii of Gaussian sampling for constructing the 1st-order context descriptor. [sent-273, score-0.271]
82 The most important parameter for computing the high-order context descriptor is the dimension m of the histogram. [sent-283, score-0.367]
83 Since the high-order context descriptor $f_p$ is complementary to the 1st-order context feature $f_n$, the two are combined when evaluating the detection performance. [sent-284, score-0.823]
84 The high-order context descriptor, together with the 1st-order context feature and the image features, is used. [sent-292, score-0.667]
85 Table 2: Mean AP (mAP) varies with the combination of context features of different orders, where 0th, 1st, and H refer to the 0th-order, 1st-order, and high-order descriptors, respectively. [sent-300, score-0.393]
86 We also compared against SURF [2] and LBP [33] features extracted on each level of the context structure Ω(Ṗ). [sent-301, score-0.271]
87 Table 3: Mean AP (mAP) varies with respect to the proposed detection evolution algorithm, where 0-iteration on the left refers to the baseline without detection evolution. [sent-309, score-0.515]
88 To show that different orders of context provide complementary constraints for object detection, we compared the detection accuracy using different combinations of the multi-order context descriptors. [sent-312, score-0.723]
89 (2), clearly the MOCO descriptor that combines all orders of context achieves the best detection performance. [sent-315, score-0.488]
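The PASCAL VOC 2007 results are reported as mean average precision. VOC 2007 computes per-class AP with 11-point interpolation (the maximum precision at recall ≥ t, for t = 0, 0.1, ..., 1), and mAP is the mean over classes. A minimal sketch of that metric, with hypothetical function names, is:

```python
from typing import List, Sequence, Tuple


def voc07_ap(recall: Sequence[float], precision: Sequence[float]) -> float:
    """11-point interpolated AP (PASCAL VOC 2007 style).

    At each recall threshold t in {0, 0.1, ..., 1}, take the maximum
    precision over all points with recall >= t (0 if none), then average.
    """
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        p = max((pr for rc, pr in zip(recall, precision) if rc >= t), default=0.0)
        ap += p / 11.0
    return ap


def mean_ap(per_class: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """mAP: unweighted mean of the per-class APs."""
    return sum(voc07_ap(r, p) for r, p in per_class) / len(per_class)


print(round(voc07_ap([0.5, 1.0], [1.0, 0.5]), 4))  # 0.7727
```

A 3.3% mAP gain therefore means the per-class interpolated precision, averaged over all 20 VOC classes, rose by 3.3 points.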
90 Another way of exploring the 1st-order context is to extract gradient-based features such as SURF [2] or LBP [33] directly on each scale of the context structure Ω(Ṗ). [sent-317, score-0.602]
91 This means that the context across larger spatial neighborhood or different scales can be more effective than the context conveyed by local gradients between adjacent positions. [sent-320, score-0.588]
92 Detector Evolution. Using the best parameters for the MOCO descriptor obtained on the “train” and “val” datasets, we evaluate the detector evolution process across iterations. [sent-323, score-0.424]
93 To better show the trend of the detector evolution process, we keep it running for 6 iterations. [sent-330, score-0.351]
94 (6): the best reported log-average miss rate is 48% [6], while our algorithm further lowers the miss rate to 46%. [sent-353, score-0.232]
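The Caltech figures are log-average miss rates. As commonly defined for the Caltech benchmark, the miss rate is sampled at nine FPPI values evenly spaced in log-space over [10^-2, 10^0] and averaged geometrically. The sketch below follows that description; it is not the official evaluation toolbox, and treating FPPI values below the curve's first point as a miss rate of 1 is an assumption. Function names are hypothetical.

```python
import math
from typing import Sequence


def miss_rate_at(fppi: Sequence[float], miss_rate: Sequence[float], ref: float) -> float:
    """Miss rate at the operating point with the largest FPPI <= ref.

    The curve is assumed sorted by increasing FPPI; if no point reaches
    ref, the miss rate is taken as 1 (nothing detected yet).
    """
    value = 1.0
    for f, mr in zip(fppi, miss_rate):
        if f <= ref:
            value = mr
    return value


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float], num_points: int = 9) -> float:
    """Geometric mean of the miss rate at FPPI values log-spaced over [1e-2, 1e0]."""
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    logs = [math.log(max(1e-10, miss_rate_at(fppi, miss_rate, r))) for r in refs]
    return math.exp(sum(logs) / len(logs))


fppi, mr = [0.05, 0.5], [0.5, 0.25]
print(miss_rate_at(fppi, mr, 1.0))  # 0.25
print(log_average_miss_rate(fppi, mr))
```

The first call corresponds to the "miss rate at 1 FPPI" figure; the second summarizes the whole curve, which is why the two numbers quoted above (46% and 23%) differ.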
95 Processing Speed. Our detection evolution framework needs to evaluate each test image $N_d$ times, where $N_d$ is the number of evolved detectors. [sent-357, score-0.375]
96 Conclusion. In this paper we have proposed a novel multi-order context representation, denoted as MOCO, that effectively exploits co-occurrence contexts of different objects even though we only use a detector for a single object. [sent-387, score-0.435]
97 We preprocess the detector response map, extract the 1st-order context features based on randomized binary comparisons, and further develop a high-order co-occurrence descriptor based on the 1st-order context. [sent-388, score-0.877]
98 Furthermore, we have proposed to combine our multi-order context representation with the recently proposed deformable part models [13] to supply comprehensive coverage of both the inter-context among objects and the intra-context inside the target object region. [sent-390, score-0.486]
99 As future work, we plan to further extend our MOCO to temporal context from videos and to contexts from multiple object detectors or multi-class problems. [sent-392, score-0.42]
100 Proximity distribution kernels for geometric context in category recognition. [sent-528, score-0.271]
wordName wordTfidf (topN-words)
[('moco', 0.677), ('contextual', 0.294), ('context', 0.271), ('evolution', 0.201), ('response', 0.155), ('detector', 0.15), ('boost', 0.101), ('responses', 0.101), ('pedestrian', 0.098), ('detection', 0.093), ('caltech', 0.085), ('evolved', 0.081), ('miss', 0.08), ('spi', 0.078), ('pascal', 0.076), ('descriptor', 0.073), ('strain', 0.067), ('contexts', 0.062), ('evolve', 0.061), ('evolving', 0.06), ('scanning', 0.059), ('scan', 0.059), ('evolves', 0.058), ('binary', 0.057), ('arts', 0.056), ('baseline', 0.052), ('ap', 0.052), ('orders', 0.051), ('iteration', 0.05), ('epson', 0.05), ('fhog', 0.05), ('gkl', 0.05), ('voc', 0.05), ('detectors', 0.05), ('fppi', 0.048), ('map', 0.048), ('deformable', 0.048), ('varies', 0.046), ('neighborhood', 0.046), ('target', 0.045), ('window', 0.044), ('root', 0.044), ('fp', 0.043), ('smoothed', 0.043), ('descriptors', 0.041), ('randomized', 0.04), ('np', 0.039), ('filter', 0.038), ('iterations', 0.038), ('till', 0.037), ('object', 0.037), ('stops', 0.036), ('rate', 0.036), ('surf', 0.036), ('conveys', 0.036), ('supply', 0.036), ('filters', 0.035), ('calonder', 0.033), ('jose', 0.033), ('lbp', 0.033), ('sf', 0.032), ('viola', 0.032), ('fn', 0.032), ('extract', 0.031), ('categories', 0.031), ('refers', 0.03), ('accordingly', 0.029), ('val', 0.029), ('bootstrapping', 0.029), ('features', 0.029), ('cooccurrence', 0.029), ('hm', 0.029), ('implicitly', 0.028), ('hj', 0.028), ('confidences', 0.028), ('lepetit', 0.027), ('performances', 0.027), ('effectively', 0.027), ('classifier', 0.026), ('stronger', 0.026), ('boosting', 0.026), ('combined', 0.026), ('part', 0.025), ('exploits', 0.025), ('converges', 0.025), ('varma', 0.025), ('jones', 0.025), ('ding', 0.025), ('inside', 0.024), ('fc', 0.024), ('confirms', 0.024), ('order', 0.023), ('windows', 0.023), ('region', 0.023), ('san', 0.023), ('comparisons', 0.023), ('complementary', 0.023), ('statistics', 0.022), ('rvoamlue', 0.022), ('thrown', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
2 0.20923308 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
3 0.16913092 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
4 0.11671659 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
Author: Pramod Sharma, Ram Nevatia
Abstract: In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random-fern adaptive classifier. The adaptive classifier improves precision of the baseline classifier by validating the obtained detection responses from the baseline classifier as correct detections or false alarms. Experiments demonstrate generalizability, computational efficiency and effectiveness of our method, as we compare our method with state-of-the-art approaches for the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.
5 0.11332169 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
Author: Thomas Dean, Mark A. Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract: Many object detection systems are constrained by the time required to convolve a target image with a bank of filters that code for different aspects of an object’s appearance, such as the presence of component parts. We exploit locality-sensitive hashing to replace the dot-product kernel operator in the convolution with a fixed number of hash-table probes that effectively sample all of the filter responses in time independent of the size of the filter bank. To show the effectiveness of the technique, we apply it to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM. This represents a speed-up of approximately 20,000 times (four orders of magnitude) when compared with performing the convolutions explicitly on the same hardware. While mean average precision over the full set of 100,000 object classes is around 0.16 due in large part to the challenges in gathering training data and collecting ground truth for so many classes, we achieve a mAP of at least 0.20 on a third of the classes and 0.30 or better on about 20% of the classes.
6 0.11166264 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
7 0.10865331 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
8 0.10409168 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.10275661 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
10 0.10135596 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
11 0.097009644 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
12 0.096726149 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
13 0.095976189 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
14 0.09462814 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
15 0.093862481 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
16 0.090643153 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
17 0.089946195 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
18 0.088822849 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
19 0.088775776 383 cvpr-2013-Seeking the Strongest Rigid Detector
20 0.083858624 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
topicId topicWeight
[(0, 0.195), (1, -0.06), (2, 0.021), (3, -0.049), (4, 0.055), (5, 0.031), (6, 0.07), (7, 0.049), (8, -0.015), (9, -0.026), (10, -0.086), (11, -0.093), (12, 0.1), (13, -0.128), (14, 0.054), (15, -0.034), (16, -0.043), (17, 0.017), (18, 0.001), (19, 0.06), (20, -0.009), (21, -0.036), (22, -0.048), (23, 0.087), (24, -0.018), (25, -0.017), (26, 0.008), (27, 0.045), (28, 0.01), (29, 0.012), (30, -0.045), (31, -0.01), (32, -0.014), (33, -0.068), (34, -0.022), (35, -0.046), (36, -0.032), (37, 0.028), (38, -0.042), (39, 0.056), (40, 0.008), (41, -0.038), (42, -0.019), (43, -0.028), (44, 0.065), (45, -0.009), (46, 0.014), (47, 0.038), (48, 0.033), (49, -0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.94712359 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
2 0.89671981 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
3 0.89374465 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and interact. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues that are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computational load. Fifteen state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
4 0.87759376 383 cvpr-2013-Seeking the Strongest Rigid Detector
Author: Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van_Gool
Abstract: Current state-of-the-art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so-called “components”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients + linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component. We provide experiments for a large design space that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
5 0.82828367 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands, such methods are limited to only several parts and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost all of the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is, to the best of our knowledge, the first part-based object detection method achieving real-time performance: nearly 10 frames per second on 640×480 images on a regular CPU.
6 0.75771564 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
7 0.74146563 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
8 0.72885144 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
9 0.70236242 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
10 0.69920301 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
11 0.67436743 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
12 0.66437215 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
13 0.6626665 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
14 0.65450615 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
15 0.65137929 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
16 0.6510123 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
17 0.60890543 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
18 0.60549605 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
19 0.5970422 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection
20 0.59320641 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
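The rank-1 MOCO entry at the top of the first list builds its 1st-order context feature from randomized binary comparisons on the baseline detector's response map, then computes statistics of those bits to form a higher-order co-occurrence descriptor. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the square offset window, the border clamping, and the 2-bit joint-state encoding in `second_order_cooccurrence` are our own choices, and every function name is hypothetical.

```python
import random


def sample_comparisons(num_tests, radius, seed=0):
    """Draw num_tests random pairs of (dy, dx) offsets in [-radius, radius]^2."""
    rng = random.Random(seed)

    def offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(num_tests)]


def response_at(resp, y, x):
    """Read the response map, clamping coordinates to the border (an assumption)."""
    y = min(max(y, 0), len(resp) - 1)
    x = min(max(x, 0), len(resp[0]) - 1)
    return resp[y][x]


def first_order_context(resp, cy, cx, tests):
    """One bit per test: 1 if the response at the first offset exceeds the second."""
    bits = []
    for (dy1, dx1), (dy2, dx2) in tests:
        a = response_at(resp, cy + dy1, cx + dx1)
        b = response_at(resp, cy + dy2, cx + dx2)
        bits.append(1 if a > b else 0)
    return bits


def second_order_cooccurrence(bits, pairs):
    """One-hot encode the joint state (00, 01, 10, 11) of each selected bit pair."""
    feature = []
    for i, j in pairs:
        one_hot = [0, 0, 0, 0]
        one_hot[2 * bits[i] + bits[j]] = 1
        feature.extend(one_hot)
    return feature


# Usage on a toy 8x8 response map that increases to the right and downward.
resp = [[0.1 * (y + x) for x in range(8)] for y in range(8)]
tests = sample_comparisons(num_tests=16, radius=2)
bits = first_order_context(resp, 4, 4, tests)
pairs = [(i, i + 1) for i in range(0, 16, 2)]
descriptor = second_order_cooccurrence(bits, pairs)
```

In the detection-evolution loop the abstract describes, such context bits would be concatenated with the original image feature and the detector retrained; that training step is not sketched here.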
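Several Caltech entries in the first list summarize results by miss rate; the MOCO abstract names the log-average miss rate explicitly. The sketch below follows the convention commonly used with the Caltech benchmark as we understand it: sample the miss rate at nine FPPI values evenly spaced in log space between 10^-2 and 10^0, then take their geometric mean. Treat it as an approximation; the official evaluation code may differ in interpolation details, and using a miss rate of 1.0 where the curve has no point at or below a reference FPPI is our own assumption.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1.0):
    """Geometric mean of the miss rate sampled at FPPI values log-spaced in [lo, hi].

    fppi and miss_rate describe one curve, sorted by increasing FPPI.
    """
    if len(fppi) != len(miss_rate) or not fppi:
        raise ValueError("need two non-empty sequences of equal length")
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    refs = [10 ** (log_lo + k * step) for k in range(num_points)]
    samples = []
    for ref in refs:
        # Miss rate at the largest FPPI not exceeding ref; 1.0 if none (assumption).
        sample = 1.0
        for f, m in zip(fppi, miss_rate):
            if f > ref:
                break
            sample = m
        samples.append(max(sample, 1e-10))  # guard against log(0)
    # exp(mean(log(x))) is the geometric mean of the samples.
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Usage: a curve that drops from 80% to 20% miss rate at 0.05 FPPI.
print(log_average_miss_rate([0.001, 0.05, 10.0], [0.8, 0.2, 0.2]))  # roughly 0.32
```

The geometric mean weights each decade of FPPI equally, so a detector cannot hide a poor low-FPPI regime behind a good high-FPPI one.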
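The “Seeking the Strongest Rigid Detector” entry argues that a single rigid component, one linear template over HOG-style cell features, can reach top pedestrian-detection quality. The scoring step of such a detector can be sketched as a sliding-window dot product. The function below is a hypothetical minimal version: it assumes per-cell features are already computed and omits the feature pyramid, the pooling choices the paper studies, and non-maximum suppression.

```python
def score_map(features, template, bias=0.0):
    """Slide a rigid linear template over a grid of per-cell feature vectors.

    features: H x W grid, each cell a list of D numbers (for example a HOG histogram).
    template: h x w grid of D-dimensional weight vectors; bias is the SVM offset.
    Returns an (H - h + 1) x (W - w + 1) grid of scores w . x + b.
    """
    H, W = len(features), len(features[0])
    h, w = len(template), len(template[0])
    scores = []
    for y in range(H - h + 1):
        row = []
        for x in range(W - w + 1):
            s = bias
            for dy in range(h):
                for dx in range(w):
                    cell, weights = features[y + dy][x + dx], template[dy][dx]
                    s += sum(f * t for f, t in zip(cell, weights))
            row.append(s)
        scores.append(row)
    return scores


# Usage: 1-D "features" on a 3x3 grid and an all-ones 2x2 template.
features = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
template = [[[1], [1]], [[1], [1]]]
print(score_map(features, template))  # [[12.0, 16.0], [24.0, 28.0]]
```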
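The AFS entry names “KD-Ferns” as its approximate nearest-neighbor search, but the abstract does not define it. Our reading of the name, which is an assumption, is a KD-tree in which every node at a given depth applies the same split (one dimension, one threshold), so a query's leaf is simply a tuple of bits, as in a fern. The hypothetical `FernIndex` class below sketches such an index; picking the highest-variance dimensions with median thresholds, and falling back to exhaustive search when a leaf is empty, are illustrative choices rather than the paper's.

```python
import statistics
from collections import defaultdict


class FernIndex:
    """Approximate nearest-neighbor index in which every depth shares one split."""

    def __init__(self, points, depth):
        dims = len(points[0])
        # Illustrative split choice: highest-variance dimensions, split at the median.
        variances = [statistics.pvariance([p[d] for p in points]) for d in range(dims)]
        self.split_dims = sorted(range(dims), key=lambda d: -variances[d])[:depth]
        self.thresholds = [statistics.median([p[d] for p in points]) for d in self.split_dims]
        self.points = points
        self.buckets = defaultdict(list)
        for idx, p in enumerate(points):
            self.buckets[self._code(p)].append(idx)

    def _code(self, p):
        # One bit per depth; the same test applies to every node at that depth.
        return tuple(int(p[d] > t) for d, t in zip(self.split_dims, self.thresholds))

    def query(self, q):
        """Index of the closest stored point in q's leaf (all points if the leaf is empty)."""
        candidates = self.buckets.get(self._code(q)) or range(len(self.points))
        return min(candidates,
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(self.points[i], q)))


# Usage: four corners of a square, two shared splits.
points = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
index = FernIndex(points, depth=2)
print(index.query((9.0, 1.0)))  # 2
```

Because each level reuses one test, a lookup costs `depth` comparisons plus a scan of one bucket, which is the kind of saving that lets each image location be compared with only a subset of stored parts.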
topicId topicWeight
[(10, 0.135), (16, 0.027), (26, 0.039), (28, 0.014), (33, 0.247), (67, 0.159), (69, 0.053), (76, 0.012), (80, 0.011), (86, 0.145), (87, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.90184057 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
Author: Guang Chen, Yuanyuan Ding, Jing Xiao, Tony X. Han
Abstract: Context has been playing an increasingly important role in improving object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model high-level context using solely the detection responses from a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are further calculated to construct a high-order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector into a stronger context-aware detector. With the updated detector, we can continue the evolution until the contextual improvements saturate. Using the successful deformable-part-model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and the Caltech pedestrian dataset [7]: the proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver. 5) [13] by 3.3% in mean average precision on the PASCAL 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
2 0.89883405 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
3 0.89011544 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in computer vision. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small-sized graph consisting of superpixels as its nodes that encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
4 0.88958859 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
5 0.88929129 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
6 0.88913131 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
7 0.88741601 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
9 0.88208526 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
10 0.88191503 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
11 0.88119364 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
12 0.88077438 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
13 0.87938899 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
14 0.87796015 103 cvpr-2013-Decoding Children's Social Behavior
15 0.8771081 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
16 0.87623119 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
17 0.87375617 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
18 0.87117147 414 cvpr-2013-Structure Preserving Object Tracking
19 0.87099236 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
20 0.86909258 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling
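The graphlet-cut entry (rank 3 in the second list) defines a graphlet as a small graph whose nodes are superpixels and whose structure encodes their spatial layout. One way to enumerate candidate graphlets, assuming a superpixel adjacency graph is already available, is to grow connected node sets one neighbor at a time. The sketch below is hypothetical and only illustrates the graphlet notion; the paper's extraction procedure, graphlet sizes, and manifold embedding are not specified in the abstract.

```python
def connected_graphlets(adjacency, k):
    """Enumerate every connected set of exactly k superpixels.

    adjacency: dict mapping a superpixel id to the set of its neighboring ids.
    Returns a set of frozensets, one per graphlet node set.
    """
    frontier = {frozenset([v]) for v in adjacency}
    for _ in range(k - 1):
        grown = set()
        for nodes in frontier:
            # Neighbors of the current set that are not already in it.
            boundary = set().union(*(adjacency[v] for v in nodes)) - nodes
            for u in boundary:
                grown.add(nodes | {u})
        frontier = grown
    return frontier


def induced_edges(adjacency, nodes):
    """Adjacency edges among a graphlet's nodes, each listed once as (smaller, larger)."""
    return sorted((u, v) for u in nodes for v in adjacency[u] if v in nodes and u < v)


# Usage: four superpixels in a row, 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
for nodes in sorted(connected_graphlets(path, 3), key=sorted):
    print(sorted(nodes), induced_edges(path, nodes))
```

Growing by one neighbor at a time reaches every connected set, because any connected graph can be built up by repeatedly adding a vertex adjacent to those already chosen. The cost grows quickly with k, which is acceptable only for the small graphlets the abstract describes.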
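The armlets entry (rank 4 in the second list) reports PCP (Percentage of Correct Parts) accuracy. Under the commonly used criterion, a predicted limb counts as correct when both of its endpoints lie within half the ground-truth limb length of the corresponding ground-truth endpoints. The function below sketches that criterion. The abstract does not state which PCP variant the paper uses, so the `alpha=0.5` default is an assumption.

```python
import math


def pcp(pred_parts, gt_parts, alpha=0.5):
    """Fraction of parts whose two predicted endpoints both lie within
    alpha * (ground-truth part length) of the matching ground-truth endpoints.

    Each part is a pair of endpoints ((x1, y1), (x2, y2)).
    """
    if len(pred_parts) != len(gt_parts):
        raise ValueError("need exactly one prediction per ground-truth part")
    correct = 0
    for (p1, p2), (g1, g2) in zip(pred_parts, gt_parts):
        limit = alpha * math.dist(g1, g2)
        if math.dist(p1, g1) <= limit and math.dist(p2, g2) <= limit:
            correct += 1
    return correct / len(gt_parts)


# Usage: an upper arm predicted well and a lower arm whose wrist is 10 px off.
gt = [((0, 0), (0, 10)), ((0, 10), (10, 10))]
pred = [((0, 1), (0, 12)), ((0, 10), (10, 20))]
print(pcp(pred, gt))  # 0.5
```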
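The MSSRC entry (rank 5 in the second list) states that a strict temporal constraint reduces video face recognition to a single ℓ1-minimization over the mean of the face track. The sketch below illustrates that reduction under several assumptions that do not come from the abstract. ISTA is swapped in as a simple ℓ1 solver. Identities are chosen by per-class reconstruction residual, as in standard sparse-representation classification (SRC). Unknown identities are rejected with SRC's sparsity concentration index (SCI) and a hypothetical threshold. Dictionary atoms are also left unnormalized for brevity.

```python
import math


def mean_feature(track):
    """Average the per-frame feature vectors of one face track."""
    n = len(track)
    return [sum(frame[i] for frame in track) / n for i in range(len(track[0]))]


def ista(atoms, y, lam=0.01, iters=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1, where atoms are the columns of A."""
    step = 1.0 / sum(v * v for a in atoms for v in a)  # 1 / ||A||_F^2 never exceeds 1 / L
    x = [0.0] * len(atoms)
    for _ in range(iters):
        residual = [sum(x[j] * atoms[j][i] for j in range(len(atoms))) - y[i]
                    for i in range(len(y))]
        for j, a in enumerate(atoms):
            z = x[j] - step * sum(a[i] * residual[i] for i in range(len(y)))
            # Soft-thresholding: the proximal step for the l1 penalty.
            x[j] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return x


def classify_track(track, atoms, labels, lam=0.01, sci_threshold=0.3):
    """Return the predicted identity, or None to reject the track as unknown."""
    y = mean_feature(track)  # one minimization over the track mean, not per frame
    x = ista(atoms, y, lam)
    classes = sorted(set(labels))

    def class_residual(c):
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)) if labels[j] == c)
                 for i in range(len(y))]
        return math.sqrt(sum((r - v) ** 2 for r, v in zip(recon, y)))

    best = min(classes, key=class_residual)
    total = sum(abs(v) for v in x)
    if total == 0.0:
        return None
    if len(classes) < 2:
        return best
    # SCI is 1 when all coefficient mass sits on one class and 0 when it is spread evenly.
    mass = max(sum(abs(x[j]) for j in range(len(x)) if labels[j] == c) for c in classes)
    sci = (len(classes) * mass / total - 1.0) / (len(classes) - 1)
    return best if sci >= sci_threshold else None


# Usage: two identities with two dictionary atoms each, and a two-frame track.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels = ["a", "a", "b", "b"]
print(classify_track([[1.0, 0.05, 0.0], [0.95, 0.0, 0.05]], atoms, labels))  # a
```

Averaging first means the solver runs once per track instead of once per frame, which is the cost saving the abstract highlights.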