cvpr cvpr2013 cvpr2013-30 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, as well as the definition of a new loss function defined over the 3D space used in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. [sent-12, score-0.372]
2 Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, as well as the definition of a new loss function defined over the 3D space used in training. [sent-13, score-0.328]
3 Researchers have shown that the associated depth information can enhance detection performance [2, 3] and that, in general, the ability to reason in the 3D physical space provides critical contextual information that facilitates object detection [4, 5, 6]. [sent-19, score-0.293]
4 However, most of the existing approaches aim at localizing objects in the image and ignore the problem of estimating object location in the 3D space (we refer to this problem as 3D object localization) (Fig. [sent-20, score-0.334]
5 In this work we focus on the 3D object localization problem and propose a new method that is capable of jointly detecting objects in 2D images and the 3D physical space using RGB-D images. [sent-23, score-0.382]
6 detection methods [8, 9, 10] which identify object proposals in the image by means of bounding boxes. [sent-25, score-0.294]
7 Starting from these bounding box proposals, we introduce a novel framework that explores the compatibility between hypotheses of the object in the bounding box and the corresponding 3D map associated with the pixels within the bounding box. [sent-26, score-0.95]
8 These object hypotheses are generated from foreground-vs-background object segmentation hypotheses within the bounding box. [sent-27, score-0.769]
9 The intuition is that the ability to combine appearance and corresponding depth values within the HFMs allows constructing more discriminative features for 2D and 3D localization than if such features are extracted from bounding boxes only (Fig. [sent-29, score-0.588]
10 Object models are learnt using a latent max-margin formulation whereby the latent variables are the object part locations in 3D. [sent-31, score-0.293]
11 The deformation costs, or penalty costs, for the relative distance between object parts and the object root position, are calculated in 3D space, where a novel efficient 3D matching strategy is proposed. [sent-33, score-0.416]
12 (b) In this paper we argue that by using segmentation hypotheses for the foreground object (the HFMs), we have the opportunity to identify points in 3D that are only relevant to the foreground object and therefore enable much more accurate 3D localization capabilities. [sent-38, score-1.054]
13 expensive compared to object detection schemes based on sliding bounding cubes in 3D space. [sent-39, score-0.363]
14 [3] used 3D features and obtained improvement in detection performance, and [2, 7] used 3D features to achieve accurate 2D detection performance. [sent-43, score-0.164]
15 [13] proposed a method to detect and localize objects in 3D from an RGB or RGB-D image. [sent-45, score-0.18]
16 In this paper, we use foreground segments as initial hypotheses, generated efficiently as in [19], and find the optimal hypothesis using our novel formulation. [sent-49, score-0.401]
17 Our attempt to use a latent structural SVM formulation in 3D is clearly related to [8] as well as to recent work [10, 20] which proposes to model an object as a collection of 3D parts. [sent-52, score-0.205]
18 From each bounding box, multiple hypothetical object foreground masks are generated. [sent-64, score-0.723]
19 From these features, the object’s best foreground mask as well as its 3D location are estimated using our structural SVM formulation. [sent-66, score-0.449]
20 Our main contributions are four-fold: i) we introduce HFM to help extract more descriptive 3D features, leading to a more robust 3D localization (Sec. [sent-68, score-0.245]
21 1); ii) we propose a novel matching process in 3D, integrating responses from deformable parts in 3D (Sec. [sent-70, score-0.169]
22 1); iii) we use our structural SVM scheme for joint 3D object localization and selection of the best segmentation hypothesis; finally, iv) we provide annotations for 3D object locations on top of existing RGB-D datasets (Sec. [sent-73, score-0.646]
23 Accurate 3D Object Localization with Hypothetical Foreground Masks: In this section, we introduce our framework for accurate object detection and localization in 3D with RGB-D data from a single view. [sent-77, score-0.422]
24 Bounding boxes have been widely used to generate hypotheses of object location in 2D from which features such as HOG can be extracted [8, 21]. [sent-82, score-0.424]
25 The fact that a bounding box contains not only the foreground object but also portions of the background scene is not necessarily an issue when it comes to object detection in 2D. [sent-83, score-0.674]
26 The reason is that the appearance of the background is often correlated with the foreground object (think of a cow sitting on grass), and therefore the combination of the two can enhance object detection. [sent-84, score-0.352]
27 In such a case, … (Figure caption: Hypothetical Masks. The remaining columns show the top-K foreground segmentation hypotheses, or masks, when K = 10.) [sent-86, score-0.283]
28 The hypotheses highlighted with green lines indicate the segmentation hypothesis that is closest to the ground truth. [sent-87, score-0.332]
29 the 3D content associated with portions of a bounding box outside the foreground object can be fairly uncorrelated with the object and scattered in 3D space depending on the geometry of the background region (See Fig. [sent-88, score-0.658]
30 In this paper we propose to associate each bounding box hypothesis (an HBB) with a set of hypotheses for the foreground object segment (or mask) - the HFM. [sent-91, score-0.72]
31 Specifically, each 2D HBB yb,2D with height H and width W is associated with an HFM ym ∈ {0, 1}^(H·W), which is a set of binary variables over all pixels, where 1 indicates a foreground pixel and 0 a background pixel. [sent-92, score-0.348]
32 If the mask ym tightly covers the object itself, we can map the mask into 3D space as shown in Fig. [sent-93, score-0.455]
33 To resolve the problem, we narrow down the search space for ym using the top-K segmentation hypotheses (masks) provided by a state-of-the-art segmentation approach such as [19]. [sent-96, score-0.428]
34 To this end, we introduce an auxiliary indexing variable im, where y^m_{im} denotes the im-th mask among the K masks. [sent-99, score-0.207]
35 We localize the object in 3D space by projecting the pixels within the HFM into 3D points, producing accurate localization results. [sent-110, score-0.418]
36 2 (a) and (b) show localization results from an estimated HBB and HFM, respectively. [sent-112, score-0.245]
37 As the figure shows, when the correct HFM ym is used, the corresponding 3D point cloud enables much more accurate localization than an HBB used in isolation. [sent-113, score-0.435]
38 5, we quantitatively and qualitatively show that the proposed scheme significantly improves the 3D localization performance. [sent-116, score-0.245]
39 …res for M components of the mixture model, which encode 2D and 3D appearance cues, 3D distances between root and part filters, and an offset value. [sent-124, score-0.193]
40 1 3D Matching: The procedure used to estimate the root and part locations in 2D is referred to as matching [8], which takes into account the 2D Euclidean distance between filter locations [24]. [sent-131, score-0.335]
41 In contrast, our framework searches for the best 3D root and part locations, and this process is referred to as 3D matching. [sent-132, score-0.159]
42 By looking at the 3D distance between root and part filters, this process suppresses false alarms in object part localization when the 3D distance between root and part is large, even if they are close in the 2D image. [sent-133, score-0.771]
43 Then, we define a score function obtained as the summation of the root and part responses, discounted by their deformation costs in 3D. [sent-139, score-0.192]
44 λ is the scale difference between root and part filters. [sent-145, score-0.159]
45 (a)) can be suppressed if 3D distances between root and part filters are large. [sent-163, score-0.193]
46 For 3D matching, part responses are first mapped into 3D space, and a 3D distance transform is applied to efficiently compute the deformation costs between root and part filters. [sent-164, score-0.365]
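To make the 3D matching step concrete, the following is a minimal brute-force sketch (our illustration, with hypothetical names; the paper instead uses a 3D distance transform to compute the same maximization efficiently):

```python
import numpy as np

def match_3d(root_pos, root_resp, parts, cand_pos):
    """Brute-force 3D matching score for one root hypothesis.

    root_pos  : (3,) root filter position in 3D.
    root_resp : scalar root filter response at root_pos.
    parts     : list of (resp, offset, d): resp is an (N,) array of a part
                filter's responses at the N candidate 3D positions, offset
                is the part's ideal (3,) displacement from the root, and d
                is its quadratic deformation weight.
    cand_pos  : (N, 3) candidate 3D positions, shared by all parts here.
    """
    total, placements = root_resp, []
    for resp, offset, d in parts:
        ideal = root_pos + offset
        # Penalize squared 3D distance from the ideal location, so a part
        # far from the root in 3D is suppressed even if it looks close to
        # the root in the 2D image.
        cost = d * np.sum((cand_pos - ideal) ** 2, axis=1)
        best = int(np.argmax(resp - cost))
        total += resp[best] - cost[best]
        placements.append(cand_pos[best])
    return total, placements
```

The distance transform replaces the inner argmax with a precomputed lower envelope, which is what makes the matching efficient in practice.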
47 Once the root location is found in 3D, the part locations can also be found by looking up the optimal displacements, similar to the 2D case [8]. [sent-179, score-0.282]
48 This can improve the precision of the decision boundaries of the trained classifier, since it penalizes inaccurate 3D localization predictions during the training process. [sent-183, score-0.278]
49 In the following, we describe how the labeling space in 3D is formulated, and also introduce a loss function that penalizes inaccurate 3D localization predictions. [sent-184, score-0.297]
50 Our training data is equipped with the object class label yl and the object foreground mask ym, i.e. [sent-186, score-0.56]
51 To help associate the mask with 3D locations, we use ys, which is equivalent to ym under a different parametrization. [sent-189, score-0.36]
52 ys = {(u1, v1), . . . , (uS, vS)} indicates the pixels of the object foreground mask where ym(u, v) = 1. [sent-192, score-0.238]
53 S is the number of pixels belonging to the foreground region. [sent-193, score-0.17]
54 On top of that, we obtain the 3D object location by projecting to the point cloud ys,3D as follows: ys,3D = g(ys, Depth, Camera). [sent-209, score-0.273]
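As a concrete, hypothetical instance of the projection g, one can back-project the mask pixels through a pinhole camera model; the intrinsics fx, fy, cx, cy below are assumed, not given in the text:

```python
import numpy as np

def project_mask_to_3d(pixels, depth, fx, fy, cx, cy):
    """Back-project foreground mask pixels into a 3D point cloud.

    pixels : (S, 2) integer array of (u, v) coordinates, as in
             ys = {(u1, v1), ..., (uS, vS)}.
    depth  : (H, W) depth map aligned with the RGB image, in meters.
    fx, fy, cx, cy : assumed pinhole camera intrinsics.
    Returns ys_3d, an (S, 3) array of points in the camera frame.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v, u]                    # depth map is indexed (row, column)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```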
55 We use 3D ellipsoids in order to describe the point cloud ys,3D, which identifies an object in 3D space. [sent-222, score-0.358]
56 1, ellipsoids are more convenient (than bounding cubes) for annotating objects in 3D. [sent-225, score-0.433]
57 3D ellipsoids are characterized by 9 parameters as follows: yb,3D = Ellipsoid(ys,3D) = [cx, cy, cz, v1, v2, v3, d1, d2, d3], (5) where {cx, cy, cz} is the center, {v1, v2, v3} are the 3 major axes, and {d1, d2, d3} are the radii of the ellipsoid. [sent-226, score-0.231]
58 Note that, since y contains information about the 3D ellipsoid location, [sent-247, score-0.172]
59 it is able to take the 3D localization accuracy into account when designing the loss function Δ(yi, ȳ) for the training process. [sent-248, score-0.297]
60 Obtaining the most violating sample ȳ is computationally inefficient if we infer ȳm over 2^(H·W) binary variables, where H and W are the bounding box height and width, respectively. [sent-250, score-0.298]
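Because the search space for the mask is already restricted to the K hypotheses, the most violating sample can be found by brute force over the K candidates; a minimal sketch, where score_fn and loss_fn stand in for the learned model score and the loss Δ:

```python
def most_violating_mask(masks, score_fn, loss_fn, y_true):
    """Loss-augmented inference restricted to the K mask hypotheses.

    Rather than searching all 2**(H*W) binary masks, only the K candidate
    masks are scored; score_fn and loss_fn are assumed given.
    """
    # Margin-rescaled most violating constraint: maximize score + loss.
    return max(masks, key=lambda y: score_fn(y) + loss_fn(y_true, y))
```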
61 We design the loss function Δ(yi , y¯) depending on both 2D and 3D localization accuracies. [sent-256, score-0.297]
62 To take into account both 2D and 3D localization accuracy, we propose to use the following intersection-over-union measure, re-weighted over 2D and 3D. [sent-259, score-0.278]
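One plausible instantiation of such a re-weighted loss (the exact weighting in the paper may differ; alpha is an assumed mixing parameter):

```python
def delta(iou_2d, iou_3d, alpha=0.5):
    """Loss penalizing inaccurate localization in both 2D and 3D.

    iou_2d : intersection over union of the 2D bounding boxes.
    iou_3d : volume intersection over union of the 3D ellipsoids.
    alpha  : assumed mixing weight between the 2D and 3D terms.
    """
    return 1.0 - (alpha * iou_2d + (1.0 - alpha) * iou_3d)
```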
63 To provide accurate ground truth 3D locations of objects for both training and testing, we propose an annotation procedure that allows us to efficiently annotate an object foreground mask and the associated 3D ellipsoid (Sec. [sent-266, score-0.874]
64 [2, 3] annotated locations of objects. (Footnote 1: See the supplementary material [26] for the method of computing the intersecting volume between two ellipsoids.) [sent-272, score-0.233]
65 [27] includes range data along with accurate locations given as 3D cubes for outdoor scenes. [sent-275, score-0.179]
66 3D ellipsoids are well suited to capture the size of the object via its 3 major axes, and also describe the object's location in 3D space accurately. [sent-277, score-0.456]
67 Also, as described next, they can be used to provide ground truth annotations of 3D object location and pose more accurately and efficiently than bounding boxes do. [sent-278, score-0.402]
68 Using this tool, the annotator simply draws a polygon capturing the object foreground; the 3D points corresponding to the pixels enclosed by the polygon are then used to calculate the centroid and the principal axes of the ellipsoid tightly enclosing such 3D points. [sent-280, score-0.446]
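A minimal sketch of this fitting step, producing the 9-parameter form of Eq. (5) from the centroid, the principal axes (PCA), and the point extents; this is our illustration of the described procedure, not the authors' tool:

```python
import numpy as np

def fit_ellipsoid(points):
    """Fit a tightly enclosing ellipsoid to an (N, 3) point cloud.

    Center: centroid of the points. Axes: principal directions from the
    covariance. Radii: half-extents of the points along each axis.
    """
    center = points.mean(axis=0)                  # (cx, cy, cz)
    centered = points - center
    _, axes = np.linalg.eigh(np.cov(centered.T))  # columns: (v1, v2, v3)
    proj = centered @ axes                        # axis-frame coordinates
    radii = np.abs(proj).max(axis=0)              # (d1, d2, d3)
    return center, axes, radii
```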
69 Statistics related to our annotated ellipsoids, and a comparison with other statistics, can be found in the supplementary material [26]. [sent-282, score-0.312]
70 In our framework, the overlap ratio between ground truth and estimated ellipsoids is used to calculate the loss function for training the StLSVM model, as well as for evaluating 3D localization performance. [sent-287, score-0.64]
71 Implementation Details: As for the experiments with the B3DO dataset, we concatenated HOG features calculated from deformable parts [8] with 3D features proposed in [14]. [sent-290, score-0.201]
72 Foreground Mask Accuracy: There is a trade-off between the computational complexity and the number of hypothetical masks. [sent-294, score-0.185]
73 This is at the expense of the added computational time that is required to calculate features and apply the object model. [sent-296, score-0.162]
74 Thus, we set the number of hypothetical masks K to 10 for the experiments. [sent-300, score-0.306]
75 3D localization was not tested in [2], so we propose several baseline methods in Sec. [sent-306, score-0.288]
76 1 3D Detection Performance: Similar to the Pascal Challenge criteria in 2D [29], the 3D localization is counted as correct if the overlapping volume between the estimated ellipsoid and the ground truth ellipsoid is more than a threshold. [sent-313, score-0.695]
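The exact intersection volume is computed as described in the supplementary material [26]; purely as an illustration, a simple Monte Carlo approximation of this overlap test could look as follows (the paper's precise overlap ratio may be defined differently):

```python
import numpy as np

def inside(p, center, axes, radii):
    """Boolean mask of which points p lie inside the given ellipsoid."""
    q = (p - center) @ axes / radii     # coordinates in the ellipsoid frame
    return (q ** 2).sum(axis=1) <= 1.0

def overlap_correct(e1, e2, thresh=0.25, n=100_000, seed=0):
    """Monte Carlo estimate of the ellipsoid overlap criterion.

    Estimates the intersection volume as a fraction of e1's volume by
    sampling uniformly inside e1 and counting points also inside e2.
    """
    rng = np.random.default_rng(seed)
    c1, a1, r1 = e1
    q = rng.uniform(-1.0, 1.0, size=(n, 3))
    q = q[(q ** 2).sum(axis=1) <= 1.0]  # uniform samples in the unit ball
    pts = (q * r1) @ a1.T + c1          # map the samples into ellipsoid e1
    return inside(pts, *e2).mean() >= thresh
```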
77 For a detected 2D bounding box, we project all the pixels inside that bounding box into 3D. [sent-318, score-0.312]
78 The ellipsoids are generated so as to enclose all the corresponding 3D points. [sent-319, score-0.231]
79 Among K hypothetical masks, we choose the top-ranked mask from [19] as a foreground mask. [sent-321, score-0.502]
80 The 3D location and the size of the object are estimated based on statistics for each object category. [sent-324, score-0.253]
81 Specifically, the center of the 3D location for the object is set to the mean depth value inside the bounding box. [sent-325, score-0.395]
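A sketch of this DPM+SizePrior baseline under an assumed pinhole camera (the intrinsics and the helper name are ours, not the paper's):

```python
import numpy as np

def size_prior_baseline(depth, box, fx, fy, cx0, cy0, mean_size):
    """Center from the mean depth in the box, size from class statistics.

    depth     : (H, W) depth map in meters.
    box       : (u0, v0, u1, v1) detected 2D bounding box.
    mean_size : class-average (width, height, thickness) from training data.
    """
    u0, v0, u1, v1 = box
    z = float(np.nanmean(depth[v0:v1, u0:u1]))  # mean depth inside the box
    uc, vc = (u0 + u1) / 2.0, (v0 + v1) / 2.0   # box center in pixels
    center = np.array([(uc - cx0) * z / fx,     # back-projected box center
                       (vc - cy0) * z / fy,
                       z])
    radii = np.asarray(mean_size) / 2.0         # half the average size
    return center, radii
```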
82 The size of the object in 3D (width, height, thickness) is set to the average size of objects for the object category collected from the training set. [sent-326, score-0.228]
83 1 shows the average precisions of the 3D localization results of the proposed method. [sent-330, score-0.362]
84 Typical 3D localization results can be found in Fig. [sent-338, score-0.245]
85 While there are remarkable improvements from using features from HFMs and the 3D loss function, the … (Footnote 3: While 2D detection often uses 50% as a threshold, 3D localization is more challenging and 25% is a reasonable threshold for evaluation.) [sent-342, score-0.375]
86 Figure 6: Average precisions of 3D object localization for 8 classes in the B3DO dataset. [sent-344, score-0.453]
87 7 shows the average precisions of various detection results in 2D using the B3DO dataset. [sent-365, score-0.164]
88 The first method is called pruning, where detected results are pruned out if the approximated object size (bounding box diagonal times mean depth) differs from the statistics of the dataset. [sent-367, score-0.163]
89 Note that in … (Figure 7: Average precisions of 2D object localization obtained using DPM, the methods proposed in [2], and our method on the B3DO dataset.) [sent-376, score-0.163]
90 Figure 8: Average precisions of 3D object localization in the WRGBD dataset. [sent-378, score-0.453]
91 order to make the comparison with [3] fair, the features are extracted from RGB, depth map as well as estimated object size as in [3]. [sent-380, score-0.199]
92 We notice that the objects in this dataset have small variance in their size and pose, so that the baseline DPM+SizePrior already achieves a 3D localization accuracy of 32. [sent-387, score-0.334]
93 Al- Figure 9: Average precisions of 2D object localization from DPM (with features proposed in [3]) and our method on the WRGBD dataset. [sent-396, score-0.484]
94 We explored the idea of using segmentation hypotheses for the foreground object to guide the process of accurately localizing the object in 3D. [sent-401, score-0.635]
95 Directions for future work include the ability to integrate segmentation hypotheses in both 2D and 3D. [sent-403, score-0.248]
96 (Figure 10 column labels: Ground Truth (2D), Ground Truth (3D), DPM+FillMask, DPM+1stMask, DPM+SizePrior, Ours, Failure cases.) Figure 10: This figure shows typical examples of object localization in 3D obtained using the proposed model and baseline methods. [sent-440, score-0.379]
97 Each column represents ground truth bounding boxes in 2D, ground truth bounding boxes in 3D, 3D localization results using 3 baseline methods, and 3D localization results using our method, respectively. [sent-441, score-1.085]
98 The localization results are drawn with black ellipsoids and green is used for ground truths. [sent-442, score-0.512]
99 Notice that the ellipsoids estimated by our framework are very close to the ground truth ellipsoids, whereas the baseline methods give less well localized ellipsoids. [sent-444, score-0.346]
100 Fei-Fei, “Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes,” in ICCV, 2007. [sent-502, score-0.164]
wordName wordTfidf (topN-words)
[('hfm', 0.359), ('localization', 0.245), ('hfms', 0.239), ('ellipsoids', 0.231), ('hypothetical', 0.185), ('hypotheses', 0.183), ('hbb', 0.179), ('wrgbd', 0.179), ('dpm', 0.176), ('ellipsoid', 0.172), ('foreground', 0.17), ('bounding', 0.156), ('mask', 0.147), ('root', 0.121), ('masks', 0.121), ('sizeprior', 0.12), ('precisions', 0.117), ('ym', 0.115), ('savarese', 0.091), ('object', 0.091), ('ymim', 0.09), ('alarms', 0.079), ('depth', 0.077), ('dz', 0.077), ('box', 0.072), ('location', 0.071), ('violating', 0.07), ('cubes', 0.069), ('segmentation', 0.065), ('axes', 0.063), ('clouds', 0.062), ('michigan', 0.061), ('yl', 0.061), ('structural', 0.061), ('stlsvm', 0.06), ('yimm', 0.06), ('yli', 0.06), ('arbor', 0.059), ('locations', 0.058), ('umi', 0.058), ('dy', 0.057), ('responses', 0.055), ('lsvm', 0.053), ('latent', 0.053), ('loss', 0.052), ('ys', 0.049), ('annotation', 0.048), ('hypothesis', 0.048), ('boxes', 0.048), ('intersecting', 0.048), ('detection', 0.047), ('portions', 0.047), ('matching', 0.047), ('objects', 0.046), ('washington', 0.043), ('baseline', 0.043), ('localize', 0.043), ('cz', 0.042), ('material', 0.042), ('pepik', 0.041), ('cup', 0.04), ('polygon', 0.04), ('calculate', 0.04), ('accurate', 0.039), ('costs', 0.039), ('ann', 0.039), ('supplementary', 0.039), ('fox', 0.038), ('concatenated', 0.038), ('part', 0.038), ('yi', 0.037), ('darrell', 0.036), ('cloud', 0.036), ('ground', 0.036), ('berkeley', 0.036), ('truth', 0.036), ('deformable', 0.035), ('saenko', 0.035), ('stark', 0.035), ('localizing', 0.035), ('annotations', 0.035), ('lai', 0.034), ('counted', 0.034), ('calculated', 0.034), ('dx', 0.034), ('transform', 0.034), ('filters', 0.034), ('precision', 0.033), ('union', 0.033), ('fritz', 0.033), ('explores', 0.033), ('cx', 0.033), ('schiele', 0.033), ('associating', 0.033), ('rgbd', 0.032), ('cy', 0.032), ('width', 0.032), ('parts', 0.032), ('associated', 0.031), ('features', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, as well as the definition of a new loss function defined over the 3D space used in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
2 0.2268821 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends ” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
3 0.18410152 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
4 0.17784101 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
Author: Tao Wang, Xuming He, Nick Barnes
Abstract: We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments. First, we extend the Hough hypothesis space to include both object location and its visibility pattern, and design a new score function that accumulates votes for object detection and occlusion prediction. In addition, we explore the correlation between objects and their environment, building a depth-encoded object-context model based on RGB-D data. Particularly, we design a layered context representation and allow image patches from both objects and backgrounds to vote for the object hypotheses. We demonstrate that using a data-driven 2.1D representation we can learn visual codebooks with better quality, and more interpretable detection results in terms of the spatial relationship between objects and viewer. We test our algorithm on two challenging RGB-D datasets with significant occlusion and intraclass variation, and demonstrate the superior performance of our method.
5 0.16415219 364 cvpr-2013-Robust Object Co-detection
Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object codetection method and the generic object detection methods without co-detection formulations.
6 0.1638166 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
7 0.13070184 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
8 0.12905611 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
9 0.12522145 311 cvpr-2013-Occlusion Patterns for Object Class Detection
10 0.1185801 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
11 0.11677863 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
12 0.11366483 325 cvpr-2013-Part Discovery from Partial Correspondence
13 0.11141816 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
14 0.1092109 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
15 0.1049605 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
16 0.10472471 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
17 0.10198469 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
18 0.10152113 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
19 0.10125292 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
20 0.095706306 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
topicId topicWeight
[(0, 0.239), (1, -0.005), (2, 0.059), (3, -0.052), (4, 0.092), (5, 0.013), (6, 0.089), (7, 0.133), (8, -0.017), (9, -0.027), (10, -0.082), (11, -0.073), (12, 0.046), (13, -0.006), (14, 0.017), (15, -0.082), (16, -0.01), (17, 0.048), (18, -0.08), (19, 0.035), (20, 0.002), (21, 0.008), (22, 0.112), (23, -0.031), (24, 0.134), (25, -0.077), (26, -0.019), (27, -0.007), (28, -0.036), (29, -0.026), (30, 0.0), (31, 0.01), (32, 0.035), (33, -0.045), (34, 0.006), (35, 0.002), (36, 0.075), (37, -0.068), (38, 0.004), (39, 0.061), (40, 0.02), (41, 0.015), (42, 0.033), (43, 0.005), (44, -0.03), (45, -0.047), (46, 0.072), (47, 0.029), (48, 0.058), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.96194893 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-theart as methods well as a number of baseline approaches for both 3D and 2D object recognition tasks.
2 0.86699283 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends ” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
3 0.81554669 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
Author: Tao Wang, Xuming He, Nick Barnes
Abstract: We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments. First, we extend the Hough hypothesis space to include both object location and its visibility pattern, and design a new score function that accumulates votes for object detection and occlusion prediction. In addition, we explore the correlation between objects and their environment, building a depth-encoded object-context model based on RGB-D data. Particularly, we design a layered context representation and allow image patches from both objects and backgrounds to vote for the object hypotheses. We demonstrate that using a data-driven 2.1D representation we can learn visual codebooks with better quality, and more interpretable detection results in terms of the spatial relationship between objects and viewer. We test our algorithm on two challenging RGB-D datasets with significant occlusion and intraclass variation, and demonstrate the superior performance of our method.
4 0.81422621 364 cvpr-2013-Robust Object Co-detection
Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object codetection method and the generic object detection methods without co-detection formulations.
5 0.77823216 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
Author: Luca Del_Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of re-usable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain. (Figure caption: 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.)
6 0.76754344 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.76342607 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
8 0.73981094 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
9 0.71074241 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
10 0.70810825 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
11 0.70609784 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition
12 0.69516659 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
13 0.68532199 325 cvpr-2013-Part Discovery from Partial Correspondence
14 0.68334979 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
15 0.68043101 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
16 0.67851615 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
17 0.66833854 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
18 0.65645343 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
19 0.62854195 417 cvpr-2013-Subcategory-Aware Object Classification
20 0.62810302 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
topicId topicWeight
[(10, 0.142), (16, 0.034), (26, 0.049), (28, 0.02), (33, 0.202), (39, 0.223), (67, 0.086), (69, 0.071), (80, 0.011), (87, 0.095)]
simIndex simValue paperId paperTitle
1 0.88674647 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. "a chair to sit on"; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.
2 0.86253536 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu
Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under the latent structural SVM framework. In experiments, our method is tested on PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.
3 0.84836715 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation
Author: Samuele Salti, Alessandro Lanza, Luigi Di_Stefano
Abstract: The paper conjectures and demonstrates that repeatable keypoints based on salient symmetries at different scales can be detected by a novel analysis grounded on the wave equation rather than the heat equation underlying traditional Gaussian scale-space theory. While the image structures found by most state-of-the-art detectors, such as blobs and corners, occur typically on planar highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those related to untextured objects, which are hardly dealt with by current feature-based recognition pipelines. We provide experimental results on standard datasets and also contribute with a new dataset focused on untextured objects. Based on the positive experimental results, we hope to foster further research on the promising topic of scale invariant analysis through the wave equation.
same-paper 4 0.8121236 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D, as well as the definition of a new loss function defined over the 3D space used in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
5 0.7757504 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu
Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.
6 0.77350843 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models
7 0.76328146 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.76288497 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
9 0.76224262 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
10 0.76089329 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera
12 0.75200981 414 cvpr-2013-Structure Preserving Object Tracking
13 0.75149381 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
14 0.74935466 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
15 0.7480042 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
16 0.74531019 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
17 0.74517649 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
18 0.74516273 325 cvpr-2013-Part Discovery from Partial Correspondence
19 0.74512839 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
20 0.74421883 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection