cvpr cvpr2013 cvpr2013-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
Reference: text
sentIndex sentText sentNum sentScore
1 Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. [sent-7, score-0.312]
2 The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. [sent-9, score-0.214]
3 In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. [sent-11, score-0.377]
4 Such a capability can support a wide range of real-world applications from aid to the blind to pedestrian detection for advanced driver assistance systems. [sent-15, score-0.282]
5 Although the current performance is improving, as reflected on standard benchmarks like the PASCAL VOC challenge [9] and the Caltech pedestrian benchmark [7], it remains poor compared to that of human vision. [sent-16, score-0.248]
6 Nevertheless, vision-based pedestrian detection technology in vehicles is already commercial. [sent-17, score-0.282]
7 The Cascaded Deformable Part-based Model [10] (c-DPM) uses a cascade of part detectors to accelerate the original DPM and is considered the fastest part-based method available, but is still limited in the number of parts and does not reach real-time performance. [sent-31, score-0.209]
8 We present the Accelerated Feature Synthesis (AFS) algorithm, which is based on the Feature Synthesis (FS) [1], a part-based detection method which uses hundreds of parts in its object model. [sent-32, score-0.158]
9 To speed up the coarse level, the KD-Ferns algorithm is used to compare only a small subset of the parts to each image location. [sent-43, score-0.193]
10 We evaluate the AFS on the pedestrian detection task using the INRIA pedestrians [3] and the Caltech pedestrian benchmark [7]. [sent-46, score-0.602]
11 We compare the run time of the AFS with the methods evaluated on the Caltech pedestrian benchmark. [sent-48, score-0.236]
12 However, since visiting each tree node is associated with complex operations such as updating a priority queue [19] or a full-dimensional distance computation [14], exhaustive search is in practice more efficient for small databases. [sent-56, score-0.309]
13 This is useful in particular for part-based object detection, in which we need to find the nearest “part descriptors” from a relatively small set of O(100) parts in the model. [sent-58, score-0.182]
14 The “KD-Ferns” algorithm for approximate nearest neighbor search. Consider the exact nearest neighbor search problem: given a database of points P ⊂ Rk and a query vector q ∈ Rk, find arg min_{p∈P} ‖q − p‖. [sent-63, score-0.352]
15 A popular search technique uses the KD-Tree data structure in which a balanced binary tree containing the database points as leaves is constructed. [sent-66, score-0.168]
16 Given a query q, (with q(d) denoting its d-th entry), the tree is traversed root to leaf by computing in each node the binary value of q(d) > τ and following the right branch on 1 and left one on 0. [sent-71, score-0.25]
17 In addition, each traversed node, defined by d, τ, is inserted into a priority queue (PQ) with a key equal to its distance from the query: |q(d) − τ|. [sent-73, score-0.165]
18 After a leaf is reached the search continues by descending in the tree from the node with the minimal key in PQ. [sent-74, score-0.234]
19 The search is stopped when the minimal key in PQ is larger than the minimal distance found, ensuring an exact nearest neighbor is returned. [sent-75, score-0.199]
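To make the traversal concrete, here is a minimal Python sketch of the exact KD-tree search just described; the node layout and function names are our own, not from [19] or [14], and the unexplored sibling branch is what enters the priority queue with key |q(d) − τ|. It assumes a tree of Node objects has already been built.

    import heapq, itertools

    class Node:
        # internal nodes carry (d, tau); leaves carry a database point
        def __init__(self, d=None, tau=None, left=None, right=None, point=None):
            self.d, self.tau, self.left, self.right, self.point = d, tau, left, right, point

    def nn_search(root, q):
        """Exact NN: descend root-to-leaf, then backtrack from the priority queue."""
        tie = itertools.count()            # tie-breaker so the heap never compares Nodes
        best, best_dist = None, float('inf')
        pq = [(0.0, next(tie), root)]
        while pq:
            key, _, node = heapq.heappop(pq)
            if key > best_dist:            # minimal PQ key exceeds best distance: stop
                break
            while node.point is None:
                near, far = ((node.left, node.right) if q[node.d] <= node.tau
                             else (node.right, node.left))
                heapq.heappush(pq, (abs(q[node.d] - node.tau), next(tie), far))
                node = near
            dist = sum((a - b) ** 2 for a, b in zip(q, node.point)) ** 0.5
            if dist < best_dist:
                best, best_dist = node.point, dist
        return best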
20 A “KD-Fern” is a KD-Tree with the following property: all nodes in the same level (depth) of the tree have the same splitting dimension d and threshold τ. [sent-76, score-0.242]
21 The downside is that a balanced tree is no longer guaranteed. (Figure: Accelerated Feature Synthesis (AFS) detection algorithm flow.) [sent-93, score-0.176]
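This shared-split property means a query needs no per-node bookkeeping at all: the L comparisons produce a binary string that indexes a precomputed table, and leaves of an unbalanced tree simply map to shorter strings. A minimal sketch of the query path, with names of our own choosing:

    def kdfern_query(q, splits, table):
        """splits: the ordered shared pairs [(d_1, tau_1), ..., (d_L, tau_L)];
        table: dict from binary strings to the candidates stored at that leaf."""
        key = ''
        for d, tau in splits:              # one shared (d, tau) per tree level
            key += '1' if q[d] > tau else '0'
            if key in table:               # unbalanced tree: leaves at varying depths
                return table[key]
        return table.get(key, [])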
22 The first level processes a full scale pyramid of the image, while the second level processes only regions around candidate locations from level 1 and returns the final detections. [sent-94, score-0.289]
23 An example of a selected appearance fragment (blue rectangle) within the training image it was extracted from. [sent-96, score-0.482]
24 The grid represents the spatial bins of size b for computing the local gradient orientation histograms and the SIFT descriptor of the fragment. [sent-97, score-0.225]
25 For a given node, the splitting dimension d with the highest variance is selected, and τ is set to the median value of p(d) over all dataset points p in the node. [sent-101, score-0.175]
26 In each level the splitting dimension is chosen to maximize the conditional variance averaged over all current nodes (line 1) for increasing discrimination. [sent-103, score-0.18]
27 The splitting threshold is then chosen such that the resulting intermediate tree is as balanced as possible by maximizing the entropy measure of the distribution of dataset points after splitting (line 3(b)). [sent-104, score-0.324]
28 Instead of choosing the splitting dimension dl according to maximal average variance (line 1), a fixed number Kd of dimensions with maximal variance is considered, and dl is chosen randomly among them. [sent-108, score-0.423]
29 C is parameterized by F, a set of classifier features, and R, a set of rectangular image fragments extracted from training images. (Algorithm 1, the KD-Fern construction algorithm. Input: a dataset P = {pj}, j = 1..N, P ⊂ Rn.) [sent-115, score-0.207]
30 Output: ((d1, τ1), ..., (dL, τL)): an ordered set of splitting dimensions and thresholds, dl ∈ {1, ..., n}. [sent-119, score-0.179]
31 Choose the splitting dimension with maximal average variance over the current leaves: dl+1 = argmax_d Σb (Nb/N) · Var{p(d) : p ∈ b}, where b ranges over the current leaves and Nb = |b| (line 1). [sent-130, score-0.192]
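A compact numpy sketch of this construction loop (Algorithm 1), with the randomized Kd variant omitted; the exact weighting and tie handling are our assumptions, not a verbatim transcription of the paper:

    import numpy as np

    def build_kdfern(P, L):
        """P: (N, n) array of database points. Returns the L shared (d, tau) pairs."""
        splits, leaves, N = [], [P], len(P)
        for _ in range(L):
            # line 1: dimension with maximal variance averaged over current leaves
            var = sum(len(b) / N * b.var(axis=0) for b in leaves)
            d = int(np.argmax(var))
            # line 3(b): keep the intermediate tree as balanced as possible by
            # maximizing the entropy of the leaf occupancies after the split
            def entropy(tau):
                sizes = [s for b in leaves for s in
                         (np.sum(b[:, d] <= tau), np.sum(b[:, d] > tau)) if s > 0]
                p = np.array(sizes, dtype=float) / N
                return float(-(p * np.log(p)).sum())
            cands = np.unique(np.concatenate([b[:, d] for b in leaves]))
            tau = max(cands, key=entropy)
            splits.append((d, tau))
            leaves = [part for b in leaves
                      for part in (b[b[:, d] <= tau], b[b[:, d] > tau]) if len(part)]
        return splits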
32 For each fragment r ∈ R the “fragment similarity map” ar (x, y) represents the appearance similarity of r to each (x, y) position in Is. [sent-143, score-0.522]
33 ar(x, y) is computed as the inner product between the 128-dimensional SIFT descriptor [15] of r and that of the image fragment at position (x, y). [sent-144, score-0.519]
34 Subsequent stages use a list of spatially sparse fragment detection locations Lr = {lk = (xk, yk)}, k = 1..K, computed by finding the K = 5 top local maxima in ar. The appearance score of each location l ∈ Lr is then ar(l). [sent-145, score-0.777]
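A sketch of this stage, assuming the dense per-location descriptors are already available as an (H, W, 128) float array; sift_map and the 8-neighborhood definition of a local maximum are our assumptions:

    import numpy as np

    def similarity_map(sift_map, frag_descr):
        # a_r(x, y): inner product of the fragment descriptor with the image
        # descriptor at every (x, y) position
        return sift_map @ frag_descr                    # (H, W)

    def top_local_maxima(a, K=5):
        """L_r: the K strongest locations that are local maxima of a_r."""
        H, W = a.shape
        pad = np.pad(a, 1, constant_values=-np.inf)
        is_max = np.ones_like(a, dtype=bool)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy or dx:
                    is_max &= a >= pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        ys, xs = np.nonzero(is_max)
        order = np.argsort(a[ys, xs])[::-1][:K]
        return list(zip(xs[order], ys[order]))          # [(x_k, y_k), ...]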
35 Each feature is a function f : Is → R, computed using the fragment detections Lr of one or more fragments r. [sent-147, score-0.636]
36 Such features represent location-sensitive part detection, attaining a high value when both the appearance score is high and the position is close to a preferred part location μr, similar to parts in a star-like model [11]. [sent-151, score-0.163]
37 The original FS uses image fragments r ∈ R with different sizes and aspect ratios, all represented by a 128-dimensional SIFT descriptor (4×4 spatial bins and 8 orientation bins); the spatial bin size Bx, By is therefore different for each fragment, equal to the fragment size |r|x, |r|y divided by 4. [sent-157, score-1.35]
38 In order to share the computation of the local gradient orientation histograms between many fragments, we use at most two different spatial bin sizes B{x,y} = b in our representation, but keep the different fragment sizes. [sent-158, score-0.84]
39 For orientation we use |ori| = 8 orientation bins. [sent-159, score-0.176]
40 The result is, for each fragment r, a variable-dimension descriptor SIFTb(r) with dimension k(r) = (|r|x/b) · (|r|y/b) · 8 (e.g., a 32×32 fragment with b = 16 gives k = 2 · 2 · 8 = 32, matching the coarse-level descriptors below). [sent-161, score-0.605]
41 An example of a selected fragment is illustrated in Figure 1(d). [sent-162, score-0.482]
42 We denote by C = (F, R, W) a classifier model as defined previously with this modified fragment descriptor. [sent-163, score-0.535]
43 It uses a trained coarse classifier C1 = (F1, R1, W1) to compute the classification score for a dense set of sub-windows sampled in scale and position space. [sent-167, score-0.181]
44 Around each such location a local region is defined and sub-windows are sampled in that region on a finer grid with stride s = s2 and processed by the second level with classifier C2 = (F2 , R2 , W2) to produce the final classification score. [sent-170, score-0.282]
45 The input to the first-level detection is the entire scale pyramid of Im, while the second-level detection receives only the candidate image regions. [sent-174, score-0.241]
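Schematically, the two-level flow looks as follows; the window size, the classifier interface, and the local region extent are placeholders of our own, not values from the paper:

    def sliding_grid(h, w, win, stride):
        """Top-left corners of win-by-win windows on a stride grid."""
        return [(x, y) for y in range(0, h - win + 1, stride)
                       for x in range(0, w - win + 1, stride)]

    def detect(pyramid, C1, C2, s1, s2, region, thr, win=64):
        """pyramid: list of (scale, image); C1/C2 expose score(image, x, y)."""
        out = []
        for scale, I in pyramid:                 # level 1: whole pyramid, stride s1
            h, w = I.shape[:2]
            for x, y in sliding_grid(h, w, win, s1):
                if C1.score(I, x, y) <= thr:
                    continue                     # coarse rejection
                for dy in range(-region, region + 1, s2):  # level 2: finer grid s2
                    for dx in range(-region, region + 1, s2):
                        out.append((scale, x + dx, y + dy,
                                    C2.score(I, x + dx, y + dy)))
        return out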
46 One-level detection using a classifier model C = (F, R, W) is composed of three sequential stages that compute the following intermediate results: local gradient orientation histograms, fragment similarity maps, and classification scores, as we describe next. [sent-178, score-0.769]
47 The first stage computes the local gradient orientation histograms of I over spatial bins of size b×b, corresponding to the bin size used to describe the fragments r ∈ R. [sent-180, score-0.424]
48 We then compute, for each orientation energy map Eθ, at each location on a grid with stride s, the energy sum in a spatial bin of size b×b. [sent-186, score-0.354]
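A numpy sketch of this first stage under common conventions (unsigned gradient orientation quantized into |ori| = 8 bins, per-pixel energy voted into its bin, then b×b box sums on a stride-s grid via integral images); the exact voting scheme is our assumption:

    import numpy as np

    def orientation_energy_maps(I, n_ori=8):
        gy, gx = np.gradient(I.astype(float))
        mag = np.hypot(gx, gy)
        theta = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation
        bins = np.minimum((theta / np.pi * n_ori).astype(int), n_ori - 1)
        E = np.zeros((n_ori,) + I.shape)
        for o in range(n_ori):
            E[o][bins == o] = mag[bins == o]            # per-orientation energy E_theta
        return E

    def binned_histograms(E, b, s):
        """Sum each E_theta over b-by-b spatial bins on a stride-s grid."""
        n_ori, H, W = E.shape
        ii = np.pad(E, ((0, 0), (1, 0), (1, 0))).cumsum(1).cumsum(2)  # integral images
        ys = np.arange(0, H - b + 1, s)[:, None]
        xs = np.arange(0, W - b + 1, s)[None, :]
        return (ii[:, ys + b, xs + b] - ii[:, ys + b, xs]
                - ii[:, ys, xs + b] + ii[:, ys, xs])    # (n_ori, len(ys), len(xs))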
49 In this stage we compute for each fragment r ∈ R with bin size b its similarity with the image in a dense set of locations. [sent-191, score-0.595]
50 Computing SIFTb(I([x, x + |r|x] , [y, y + |r|y] )) is made efficient using the gradient orientation histograms for bin size b computed in the previous stage. [sent-195, score-0.204]
51 It remains to get the pre-computed values for bin centers located in the rectangle corresponding to image positions [x, x + |r|x] , [y, y + |r|y] from each orientation map and concatenate them into one vector. [sent-196, score-0.17]
52 Denote by Rk the subset of fragments r ∈ R with SIFT dimension k. [sent-197, score-0.197]
53 The time complexity of this stage for all fragments r ∈ Rk is O(k · |Rk| · A/s²), with A the image area and s the grid stride. [sent-198, score-0.185]
54 We introduce a significant speedup at the first-level detection by computing ar(x, y) for each image location (x, y) only for fragments r which are the most similar to that image location, setting the score for the rest to zero. [sent-199, score-0.411]
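In code, the change amounts to replacing the dense loop over all fragments with a per-location candidate lookup; a sketch reusing kdfern_query from above, where the table stores fragment indices (our naming):

    import numpy as np

    def sparse_similarity_maps(sift_map, frag_descrs, splits, table, N=25):
        """Compute a_r(x, y) only for the N candidate fragments returned by the
        KD-Fern at each location; all other fragment scores remain zero."""
        H, W, _ = sift_map.shape
        a = np.zeros((len(frag_descrs), H, W))
        for y in range(H):
            for x in range(W):
                q = sift_map[y, x]
                for r in kdfern_query(q, splits, table)[:N]:
                    a[r, y, x] = q @ frag_descrs[r]
        return a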
55 For each part-based feature f relying on appearance fragment r we compute Lr from the map ar. [sent-207, score-0.482]
56 We obtain a significant reduction of considered part locations by using only |Lr | = K = 1 locally maximal fragment detections per window instead of K = 5. [sent-208, score-0.638]
57 This is a form of spatial inhibition in which the strongest fragment detection suppresses the nearby detections, producing a much sparser set of detections. [sent-209, score-0.599]
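With the local-maxima helper sketched earlier, this spatial inhibition is a one-line change (a_r as above):

    L_r = top_local_maxima(a_r, K=1)   # strongest detection suppresses its neighborhood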
58 The HoG-component features are not based on fragments and are fast to compute directly from the local gradient orientation histograms. [sent-210, score-0.276]
59 The computation can then be accomplished in time O(|R| · A/s²) for obtaining local detection maxima and O(|F| · A/s²) for computing the feature and classifier score. [sent-212, score-0.168]
60 To make the first level faster we therefore use a larger stride s, shorter fragment descriptors (smaller average dimension k̄, obtained by taking a larger spatial bin size b), and fewer features |F| in the coarse classifier C1. [sent-217, score-0.917]
61 The result is a coarse (large s, b) first-level detection running several orders of magnitude faster than the second-level detection, which uses a fine classifier C2 with parameters set to reach the best classification accuracy. [sent-218, score-0.366]
62 Experimental Results. To quantitatively evaluate the proposed object detection method we chose the pedestrian detection task, due to the high availability of benchmarks and tested methods [7, 3] and the practical need for real-time detection. [sent-223, score-0.416]
63 The AFS pedestrian detector used throughout the following experiments was trained on the INRIA pedestrians dataset [3]. [sent-224, score-0.32]
64 An initial fragment pool consisting of 40,000 fragments was used, with sizes ranging from 8×8 to 80×32 pixels. [sent-227, score-0.636]
65 Half of the fragments (the smaller ones) were represented using spatial bin size b = 4 pixels and the other half using b = 8. [sent-228, score-0.236]
66 The first-level coarse classifier C1 was trained with an initial pool of 20,000 fragments, all of size 32×32 pixels and with a single bin size b = 16. [sent-233, score-0.358]
67 To speedup the part-based feature computation we used the KD-Fern algorithm which computes the similarity of each fragment descriptor with only 25 candidates in each location. [sent-236, score-0.578]
68 We evaluate the final classifier AccFeat Synth_L2 using the per-window evaluation on the INRIA pedestrian dataset as specified in [3]. [sent-242, score-0.251]
69 This type of evaluation allows a fair one-to-one comparison of the performance of the AFS with the original FS, which is too slow to run on full images (the full-image FS results shown in [1] use another classifier as a first-level cascade and process the returned windows only). [sent-243, score-0.258]
70 5% miss rate at 10^-4 false alarms per window (FPPW), which is a small decrease in performance compared to the original FS (Feat Synth: 5. [sent-246, score-0.196]
71 See [1] for details on the evaluated methods. (b) Results on the full Caltech pedestrian test dataset. [sent-255, score-0.23]
72 In parentheses: the log-average of miss rates between 10^-2 and 10^0 false positives per image. (c-f) Caltech pedestrian test on several partitions. [sent-256, score-0.399]
73 The Caltech pedestrian benchmark [7] is divided into 10 different sessions containing movies taken from a moving vehicle. [sent-259, score-0.226]
74 This is the largest available set for pedestrian detection, containing over 100,000 frames of video with 155,000 instances of pedestrians in difficult real-world scenarios. [sent-261, score-0.404]
75 To reach the full range of annotated pedestrians in the dataset we used a 3× up-scaling of the images. [sent-262, score-0.154]
76 Using known camera calibration and assumptions on the height of pedestrians it is possible to significantly narrow the space of window locations searched. [sent-265, score-0.192]
77 This is not possible in the Caltech pedestrian dataset since the positioning of the camera in each session is slightly different. [sent-266, score-0.198]
78 However, since camera positions are roughly similar, we can obtain some bounds on possible pedestrian locations in the image. [sent-267, score-0.236]
79 We used the Caltech pedestrian training set to gather statistics on the height of each bounding box and its bottom y-axis image position, and fitted piece-wise linear bounds to this distribution. [sent-268, score-0.198]
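One way to realize such bounds (a sketch; numpy arrays assumed, and the segment knots and quantile envelopes are our own choices, not the paper's fitting procedure):

    import numpy as np

    def fit_height_bounds(bottom_y, height, knots, q=(0.01, 0.99)):
        """Piece-wise lower/upper envelopes of box height vs. bottom-y position,
        gathered from training-set statistics."""
        lo, hi = [], []
        for y0, y1 in zip(knots[:-1], knots[1:]):
            h = height[(bottom_y >= y0) & (bottom_y < y1)]
            lo.append(np.quantile(h, q[0]) if len(h) else 0.0)
            hi.append(np.quantile(h, q[1]) if len(h) else np.inf)
        return np.array(lo), np.array(hi)

    def plausible(y_bottom, h, knots, lo, hi):
        i = int(np.clip(np.searchsorted(knots, y_bottom) - 1, 0, len(lo) - 1))
        return lo[i] <= h <= hi[i]         # keep only windows inside the bounds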
80 The DET curves plot the miss rate as a function of the number of false positives per image (fppi) on a log-log scale. [sent-274, score-0.201]
81 , have a clear advantage for occluded pedestrians (ranked in places 5, 3, 1, respectively), an advantage that gradually decreases for partial occlusion and no occlusion. [sent-284, score-0.15]
82 This may suggest combining template-based methods for unoccluded pedestrians with part-based methods for handling occlusions. [sent-285, score-0.15]
83 Figure 3(a) shows the log-average miss rate versus running time of all the methods tested on the Caltech dataset on pedestrians over 100 pixels. [sent-289, score-0.286]
84 The AccFeat Synth is the fastest method, running in 105 milliseconds per frame, close to 10 frames per second (fps), with 38% log-average miss rate. [sent-291, score-0.299]
85 67 fps), which is a template-based method for pedestrian detection. [sent-295, score-0.198]
86 Table 1(right) provides a breakdown of the AFS average runtime using a single thread on the 640×480 Caltech test images. [sent-297, score-0.147]
87 5× faster than the provided implementation of the c-DPM, which is currently considered the fastest part-based detection implementation available. [sent-300, score-0.198]
88 In the AFS coarse level, the KD-Ferns search is used to reduce, for each image location, the number of candidate model fragments for which similarity is computed. [sent-306, score-0.387]
89 We construct a KD-Fern structure with N = 25 trees from the coarse-level database of 90 fragment descriptors, each of length 32. [sent-307, score-0.482]
90 At detection time, the descriptor at a single position serves as the input query to the KD-Ferns algorithm, which returns the N closest candidates according to the search algorithm described in Section 2. [sent-308, score-0.269]
91 We ran a separate experiment to compare KD-Ferns with existing approximate nearest neighbor (ANN) algorithms and with naive exhaustive search for searching our fragment descriptor database. [sent-310, score-0.695]
92 As the searched database we use our set of 90 fragment descriptors. [sent-316, score-0.562]
93 The test query set consists of 50,000 fragment descriptors densely sampled from Caltech dataset images, like the ones used in our detection system. [sent-317, score-0.839]
94 The optimal randomized kd-trees configuration uses 5 trees, and the hierarchical k-means tree uses the “gonzales” initialization and a branching factor of 6. [sent-324, score-0.158]
95 Each of these algorithms does reduce the number of query-to-database descriptor comparisons, but has an additional cost in traversing the trees and updating priority queues. [sent-327, score-0.152]
96 For small databases (up to several hundred fragment descriptors in our experiments), this additional cost is higher than the saved cost of comparing descriptors. [sent-328, score-0.52]
97 At test time, we optimize the running time of these two algorithms by limiting the number of visited leaves to the minimal number required to provide the … [sent-329, score-0.164]
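The comparison protocol described here is easy to reproduce with a small harness; below is a sketch (names and the recall definition are ours) that times any search function against the exhaustive exact baseline:

    import time
    import numpy as np

    def bench(queries, db, search_fn):
        """Mean per-query time of search_fn and its recall of the exact NN."""
        d2 = (db ** 2).sum(1)[None] - 2.0 * queries @ db.T   # exact NN via dot products
        exact = np.argmin(d2, axis=1)
        t0 = time.perf_counter()
        found = np.array([search_fn(q) for q in queries])
        dt = (time.perf_counter() - t0) / len(queries)
        return dt, float(np.mean(found == exact))

    # exhaustive baseline on a 90-descriptor, length-32 database D and queries Q:
    # dt, rec = bench(Q, D, lambda q: int(np.argmin(((D - q) ** 2).sum(1))))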
98 Runtime breakdown (average ms) of the AFS on Caltech 640×480 images, for each level in the cascade and each processing stage, using a single thread. [sent-352, score-0.163]
99 (a) Accuracy vs. runtime for pedestrians over 100 pixels; (b) accuracy vs. runtime for pedestrians over 50 pixels. [sent-357, score-0.309]
100 Accuracy vs. runtime for pedestrians over 50 pixels (same comparison methodology and compared methods). [sent-359, score-0.187]
wordName wordTfidf (topN-words)
[('fragment', 0.482), ('afs', 0.45), ('accfeat', 0.231), ('pedestrian', 0.198), ('synth', 0.173), ('caltech', 0.173), ('fragments', 0.154), ('fs', 0.135), ('pedestrians', 0.122), ('stride', 0.111), ('miss', 0.106), ('synthesis', 0.105), ('siftb', 0.105), ('splitting', 0.092), ('ann', 0.091), ('orientation', 0.088), ('dl', 0.087), ('kdferns', 0.084), ('detection', 0.084), ('bin', 0.082), ('lr', 0.081), ('fastest', 0.077), ('accelerated', 0.07), ('coarse', 0.069), ('runtime', 0.065), ('geomet', 0.063), ('families', 0.063), ('levi', 0.062), ('query', 0.062), ('tree', 0.062), ('speedup', 0.059), ('cascade', 0.058), ('running', 0.058), ('maximal', 0.057), ('strings', 0.056), ('nbn', 0.056), ('leaf', 0.056), ('rk', 0.054), ('classifier', 0.053), ('thread', 0.053), ('priority', 0.053), ('nearest', 0.051), ('benchmarks', 0.05), ('leafs', 0.049), ('search', 0.049), ('entropy', 0.048), ('parts', 0.047), ('string', 0.047), ('neighbor', 0.045), ('level', 0.045), ('dimension', 0.043), ('inria', 0.043), ('searched', 0.042), ('istrain', 0.042), ('queue', 0.042), ('randomized', 0.042), ('location', 0.042), ('node', 0.04), ('ar', 0.04), ('run', 0.038), ('fine', 0.038), ('descriptors', 0.038), ('locations', 0.038), ('ori', 0.037), ('descriptor', 0.037), ('faster', 0.037), ('positives', 0.037), ('dpm', 0.035), ('deformable', 0.035), ('bins', 0.035), ('shai', 0.035), ('fppw', 0.035), ('fps', 0.034), ('gradient', 0.034), ('ry', 0.033), ('inhibition', 0.033), ('fpdw', 0.033), ('window', 0.032), ('speed', 0.032), ('full', 0.032), ('score', 0.032), ('maxima', 0.031), ('doll', 0.031), ('stage', 0.031), ('exhaustive', 0.031), ('grid', 0.031), ('balanced', 0.03), ('traversed', 0.03), ('visited', 0.03), ('false', 0.029), ('per', 0.029), ('dan', 0.029), ('breakdown', 0.029), ('occluded', 0.028), ('processes', 0.028), ('stages', 0.028), ('candidate', 0.028), ('sessions', 0.028), ('uses', 0.027), ('minimal', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
2 0.24475673 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
3 0.22416383 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
4 0.21610251 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve
Author: Kilho Son, Eduardo B. Almeida, David B. Cooper
Abstract: This paper introduces a novel approach for reassembling pot sherds found at archaeological excavation sites, for the purpose of reconstructing clay pots that had been made on a wheel. These pots and the sherds into which they have broken are axially symmetric. The reassembly process can be viewed as 3D puzzle solving or generalized cylinder learning from broken fragments. The estimation exploits both local and semi-global geometric structure, thus making it a fundamental problem of geometry estimation from noisy fragments in computer vision and pattern recognition. The data used are densely digitized 3D laser scans of each fragment's outer surface. The proposed reassembly system is automatic and functions when the pile of available fragments is from one or multiple pots, and even when pieces are missing from any pot. The geometric structures used are curves on the pot along which the surface had broken and the silhouette of a pot with respect to an axis, called the axis-profile curve (APC). For reassembling multiple pots with or without missing pieces, our algorithm estimates the APC from each fragment, then reassembles into configurations the ones having distinctive APC. Further growth of configurations is based on adding remaining fragments such that their APC and break curves are consistent with those of a configuration. The method is novel, more robust and handles the largest numbers of fragments to date.
5 0.16733707 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
Author: Tak-Wai Hui, Ronald Chung
Abstract: We address the problem of recovering camera motion from video data, which does not require the establishment of feature correspondences or computation of optical flows but from normal flows directly. We have designed an imaging system that has a wide field of view by fixating a number of cameras together to form an approximate spherical eye. With a substantially widened visual field, we discover that estimating the directions of translation and rotation components of the motion separately are possible and particularly efficient. In addition, the inherent ambiguities between translation and rotation also disappear. Magnitude of rotation is recovered subsequently. Experimental results on synthetic and real image data are provided. The results show that not only the accuracy of motion estimation is comparable to those of the state-of-the-art methods that require explicit feature correspondences or optical flows, but also a faster computation time.
6 0.15393814 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
7 0.14179885 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
8 0.13170061 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
9 0.1276983 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
10 0.12404704 383 cvpr-2013-Seeking the Strongest Rigid Detector
11 0.10560046 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
12 0.096726149 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
13 0.093624681 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
14 0.086519547 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
15 0.086373098 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
16 0.085132241 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
17 0.083045386 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
18 0.080227047 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
19 0.077930316 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
20 0.07627359 311 cvpr-2013-Occlusion Patterns for Object Class Detection
topicId topicWeight
[(0, 0.196), (1, -0.028), (2, 0.011), (3, -0.034), (4, 0.069), (5, 0.012), (6, 0.082), (7, 0.008), (8, -0.014), (9, -0.03), (10, -0.12), (11, -0.038), (12, 0.134), (13, -0.167), (14, 0.103), (15, -0.017), (16, -0.076), (17, 0.024), (18, 0.017), (19, -0.008), (20, -0.018), (21, -0.039), (22, -0.104), (23, 0.129), (24, -0.026), (25, 0.012), (26, 0.018), (27, 0.06), (28, 0.016), (29, 0.106), (30, 0.034), (31, 0.07), (32, 0.037), (33, 0.014), (34, -0.057), (35, 0.002), (36, -0.029), (37, -0.026), (38, -0.088), (39, 0.008), (40, 0.032), (41, 0.044), (42, -0.001), (43, 0.05), (44, 0.089), (45, 0.092), (46, -0.032), (47, 0.019), (48, -0.032), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.90286267 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
2 0.86875844 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
3 0.8512978 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
4 0.80938011 383 cvpr-2013-Seeking the Strongest Rigid Detector
Author: Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van_Gool
Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called “components ”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients+linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detections, using a single rigid component. We provide experiments for a large design space, that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETHand Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
5 0.78701431 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty is added when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide useful mutual relationship for visibility estimation - the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%−8% improvements on multiple benchmark datasets.
6 0.73832244 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
7 0.70431626 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
8 0.65661418 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
9 0.59763288 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
10 0.56325078 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
11 0.55811334 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
12 0.55256832 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
13 0.53705937 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
14 0.5263235 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
15 0.5202378 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
16 0.51580149 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
17 0.51359802 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
18 0.50195962 382 cvpr-2013-Scene Text Recognition Using Part-Based Tree-Structured Character Detection
19 0.50144291 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve
20 0.49734569 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
topicId topicWeight
[(5, 0.176), (10, 0.089), (16, 0.047), (26, 0.058), (28, 0.016), (33, 0.238), (67, 0.129), (69, 0.051), (80, 0.015), (87, 0.1)]
simIndex simValue paperId paperTitle
1 0.91795403 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor1.
same-paper 2 0.85945153 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
Author: Dan Levi, Shai Silberstein, Aharon Bar-Hillel
Abstract: In this work we present a new part-based object detection algorithm with hundreds of parts performing real-time detection. Part-based models are currently state-of-the-art for object detection due to their ability to represent large appearance variations. However, due to their high computational demands such methods are limited to several parts only and are too slow for practical real-time implementation. Our algorithm is an accelerated version of the “Feature Synthesis” (FS) method [1], which uses multiple object parts for detection and is among the state-of-the-art methods on human detection benchmarks, but also suffers from a high computational cost. The proposed Accelerated Feature Synthesis (AFS) uses several strategies for reducing the number of locations searched for each part. The first strategy uses a novel algorithm for approximate nearest neighbor search which we developed, termed “KD-Ferns”, to compare each image location to only a subset of the model parts. Candidate part locations for a specific part are further reduced using spatial inhibition, and using an object-level “coarse-to-fine” strategy. In our empirical evaluation on pedestrian detection benchmarks, AFS maintains almost fully the accuracy of the original FS, while running more than 4× faster than existing part-based methods which use only several parts. AFS is to our best knowledge the first part-based object detection method achieving real-time running performance: nearly 10 frames per second on 640×480 images on a regular CPU.
3 0.84513199 342 cvpr-2013-Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso
Author: Yinghuan Shi, Shu Liao, Yaozong Gao, Daoqiang Zhang, Yang Gao, Dinggang Shen
Abstract: Accurate prostate segmentation in CT images is a significant yet challenging task for image guided radiotherapy. In this paper, a novel semi-automated prostate segmentation method is presented. Specifically, to segment the prostate in the current treatment image, the physician first takes a few seconds to manually specify the first and last slices of the prostate in the image space. Then, the prostate is segmented automatically by the proposed two steps: (i) The first step of prostate-likelihood estimation to predict the prostate likelihood for each voxel in the current treatment image, aiming to generate the 3-D prostate-likelihood map by the proposed Spatial-COnstrained Transductive LassO (SCOTO); (ii) The second step of multi-atlases based label fusion to generate the final segmentation result by using the prostate shape information obtained from the planning and previous treatment images. The experimental result shows that the proposed method outperforms several state-of-the-art methods on prostate segmentation in a real prostate CT dataset, consisting of 24 patients with 330 images. Moreover, it is also clinically feasible since our method just requires the physician to spend a few seconds on manual specification of the first and last slices of the prostate.
4 0.83970582 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
5 0.83899033 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
Author: Jianguo Li, Yimin Zhang
Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the wellknown Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the state-of-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.
6 0.83733767 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
9 0.83277678 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
10 0.83264494 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
11 0.83226889 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
12 0.82920963 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
13 0.82860935 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
14 0.82828045 190 cvpr-2013-Graph-Based Optimization with Tubularity Markov Tree for 3D Vessel Segmentation
15 0.82818097 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
16 0.82480556 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
17 0.82463372 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
18 0.82305443 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
19 0.82276005 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
20 0.82162082 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video