cvpr cvpr2013 cvpr2013-154 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
Reference: text
sentIndex sentText sentNum sentScore
1 This is particularly true for more geometrically expressive 3D object class representations. [sent-3, score-0.174]
2 While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. [sent-4, score-0.649]
3 In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. [sent-5, score-0.724]
4 Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. [sent-6, score-0.981]
5 Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. [sent-7, score-0.654]
6 We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions. [sent-8, score-0.604]
7 Introduction In recent years there has been a renewed interest in 3D object (class) models for recognition and detection. [sent-10, score-0.178]
8 This trend has led to a fruitful confluence of ideas from object detection on one side and 3D computer vision on the other side. [sent-11, score-0.132]
9 State-of-the-art methods are not only capable of view-point invariant object categorization, but also give an estimate of the object’s 3D pose [28, 21], and the locations of its parts [20, 26]. [sent-12, score-0.313]
10 Some go as far as estimating 3D wireframe models and continuous pose from single images [40, 19, 39]. [sent-13, score-0.213]
11 Here, we focus on the problem of (partial) occlusion by other scene parts. [sent-15, score-0.402]
12 The occlusion pattern of an object is valuable information both for the object detector itself and for higher-level scene models that use the object class model. [sent-18, score-0.315]
13 In fact, 3D object detection under severe occlusions is still a largely open problem. [sent-19, score-0.188]
14 Most detectors [5, 9] break down at occlusion levels of ≈ 20%. [sent-20, score-0.482]
15 However, when working with an explicit 3D representation of an object class, it should in principle be possible to estimate that pattern. [sent-21, score-0.17]
16 Addressing self-occlusion is rather straight-forward with a 3D representation [37, 40], since it is fully determined by the object shape and pose. [sent-22, score-0.16]
17 On the other hand, inter-object occlusion is much harder to model, because it introduces relatively many additional unknowns (the occlusion states of all individual regions/parts of the object). [sent-23, score-0.804]
18 Some part-based models resort to a data-driven strategy: every individual part can be occluded or unoccluded, and that latent state is estimated together with the object shape and pose [20, 12]. [sent-24, score-0.414]
19 Such a model has two weaknesses: first, it does not make any assumptions about the nature of the occluder, and can therefore lead to rather unlikely occlusion patterns (e. [sent-25, score-0.402]
20 The latter is due to the tendency to simply label any individual part as occluded whenever it does not fit the evidence, and the associated brittle trade-off between the likelihood of occlusion and the uncertainty of the image evidence. [sent-29, score-0.62]
21 We argue that in many scenarios a per-part occlusion model is unnecessarily general. [sent-30, score-0.402]
22 Rather, one can put a strong prior on the co-occurrence of part occlusions, because most occluders are compact objects, and all one needs to know about them is the (also compact) projection of their outline onto the image plane. [sent-31, score-0.441]
23 We therefore propose to restrict the possible occluders to a small finite set that can be explicitly enumerated, and to estimate the type of occluder and its location during inference. [sent-32, score-0.564]
24 The very simple, but powerful intuition behind this is that when restricted to compact regions inside the object’s bounding box, the number of possible occlusion patterns is in fact very small. [sent-33, score-0.532]
25 Still such an occluder model is more general than one that only truncates the bounding box from left, right, above or below (e. [sent-34, score-0.434]
26 The contribution described in this paper is a viewpoint-invariant method for detailed reconstruction of severely occluded objects in monocular images. [sent-43, score-0.154]
27 To obtain a complete framework for detection and reconstruction, the novel method is initialized with a variant of the poselets framework [2] adapted to the needs of our 3D object model. [sent-44, score-0.205]
28 Experiments on images with strong occlusions show that the model can correctly infer even large occluders, and enables monocular 3D modeling in situations where representations without an occlusion model fail. [sent-46, score-0.512]
29 Related work In the early days of computer vision, 3D object models with a lot of geometric detail [27, 3, 22, 30] commanded a lot of interest, but unfortunately failed to tackle challenging real world imagery. [sent-48, score-0.185]
30 Most current object class detectors provide coarse outputs in the form of 2D or 3D bounding boxes along with classification into a discrete set of viewpoints [38, 28, 21, 9, 24, 29, 33, 25, 13]. [sent-49, score-0.484]
31 Recently, there has been renewed interest in providing geometrically more detailed outputs, with different degrees of geometric consistency across viewpoints [20, 40, 37, 26, 15, 39]. [sent-50, score-0.304]
32 Unfortunately, occlusion, which is one of the most challenging impediments to visual object class modeling, has largely remained untouched in the context of such fine-grained object models. [sent-54, score-0.315]
33 Fixed global object models have been known to give good results for fully visible object recognition [5], often outperforming part-based models. [sent-56, score-0.257]
34 Part-based 3D object models with strong geometric constraints such as [20, 40] are thus strong candidates for part-level occlusion reasoning: they can cope with locally missing evidence, but still ensure that the relative part placement always corresponds to a plausible global shape. [sent-58, score-0.748]
35 On the downside, these are computationally fairly expensive models, therefore their evaluation on images in [20] is limited to a small bounding box around the object of interest. [sent-59, score-0.235]
36 Note that the two layers go together well, since spatially compact occluders will leave configurations of adjacent object parts (“poselets”) visible. [sent-61, score-0.778]
37 Model We propose to split 3D object detection and modeling into two layers. [sent-63, score-0.169]
38 The first layer is a representation in the spirit of the poselet framework [2], i. [sent-64, score-0.402]
39 a collection of viewpoint-dependent part configurations tied together by relatively loose geometric constraints. [sent-66, score-0.397]
40 The purpose of this layer is to find, in a large image, approximate 2D bounding boxes with rough initial estimates of the objects’ pose. [sent-67, score-0.385]
41 The part-based structure enables the model to deal with partial occlusion, and provides evidence for visible configurations that can be used in the second layer. [sent-68, score-0.395]
42 The second layer is a 3D active shape model based on local parts, augmented with a collection of explicit occlusion masks. [sent-69, score-0.924]
43 The ASM tightly constrains the object geometry to plausible shapes, and thus can more robustly predict object shape when parts are occluded, respectively predict the locations of the occluded parts. [sent-70, score-0.54]
44 The model also includes the activations of the configurations from the first layer as additional evidence, tying the two layers together. [sent-71, score-0.512]
45 Parts and part configurations We start the explanation with the local appearance model. [sent-74, score-0.304]
46 (a) Two larger part configurations comprising multiple smaller parts, as well as their relative distributions, (b) a few example occlusion masks. [sent-77, score-0.768]
47 The classifier is viewpoint-invariant, meaning that one class label includes views of a part over all poses in which the part is visible [39]. [sent-79, score-0.306]
48 The basic unit of the first layer are larger part configurations ranging in size from 25% to 60% of the full object extent. [sent-87, score-0.726]
49 These are defined in the spirit of poselets: Small sets of the local parts described above are chosen and clustered (with standard k-means) according to the parts’ spatial layout. [sent-88, score-0.204]
50 The advantage of this clustering is that it discovers when object portions have high variability in appearance, e. [sent-89, score-0.184]
51 The first layer follows the philosophy of the ISM/poselet method. [sent-98, score-0.295]
52 For each configuration the mean offset from the object centroid as well as the mean relative scale are stored during training, and at test time detected configurations cast a vote for the object center and scale. [sent-99, score-0.564]
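The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cast_vote`, the tuple layouts, and the convention that the stored mean offset is expressed in units of the configuration's detected scale are all assumptions made for the example.

```python
def cast_vote(det_x, det_y, det_scale, mean_offset, mean_rel_scale):
    """Turn one detected configuration into a vote (cx, cy, s) for the object.

    mean_offset    -- (dx, dy): mean offset from the configuration to the
                      object centroid, in units of the configuration scale
                      (assumed convention, learned during training).
    mean_rel_scale -- mean ratio object scale / configuration scale
                      (assumed convention, learned during training).
    """
    dx, dy = mean_offset
    return (det_x + det_scale * dx,
            det_y + det_scale * dy,
            det_scale * mean_rel_scale)


# Two hypothetical configurations detected on the same car: both votes
# land on the same object center and scale.
votes = [
    cast_vote(100.0, 50.0, 2.0, (10.0, 5.0), 3.0),
    cast_vote(130.0, 70.0, 2.0, (-5.0, -5.0), 3.0),
]
```

Votes that agree, as the two above do, reinforce a single object hypothesis, which is what lets partially occluded objects be found from their visible configurations alone.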
53 After non-maximum suppression, the output of the first layer consists of a set of approximate 2D bounding boxes, each with a coarse pose estimate. (Footnote 1: Also, training is two orders of magnitude faster.) [sent-101, score-0.49]
54 The second layer utilizes a more explicit representation of global object geometry that is better suited for estimating detailed 3D object shape and pose. [sent-103, score-0.719]
55 In the tradition of active shape models we learn a deformable 3D wireframe from annotated 3D CAD models, like in [40, 39]. [sent-104, score-0.29]
56 The wireframe model is defined through an ordered collection of n vertices in 3D-space, chosen at salient points on the object surface in a fixed topological layout. [sent-105, score-0.305]
57 Following standard point-based shape analysis [4] the object shape and variability are represented as the sum of a mean wireframe μ and deformations along r principal component directions pj . [sent-106, score-0.43]
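As a rough sketch of the shape representation just described, a wireframe instance can be written as the mean μ plus a weighted sum of the r principal directions pj. The code below assumes that any per-component scaling (for example by the standard deviation along each direction) is folded into the weights; the names `asm_shape`, `mu` and `p1` are hypothetical.

```python
def asm_shape(mean, components, coeffs):
    """Return mean + sum_k coeffs[k] * components[k], vertex by vertex.

    mean       -- list of n (x, y, z) vertices (the mean wireframe)
    components -- list of r principal directions, each a list of n (x, y, z)
    coeffs     -- list of r shape weights
    """
    shape = [list(vertex) for vertex in mean]
    for s_k, p_k in zip(coeffs, components):
        for i, direction in enumerate(p_k):
            for axis in range(3):
                shape[i][axis] += s_k * direction[axis]
    return [tuple(vertex) for vertex in shape]


# Toy wireframe with n = 2 vertices and r = 1 direction that moves the
# second vertex along the x axis.
mu = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
p1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
stretched = asm_shape(mu, [p1], [0.5])
```

Setting all weights to zero reproduces the mean wireframe, so the weights can be read as the deviation of a specific instance from the class average.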
58 The parts (≈ 10% in size of the full object) cover the full extent of the represented object class, thus they allow for fine-grained estimation of 3D geometry and continuous pose, as well as for detailed reasoning about occlusion relations. [sent-113, score-0.817]
59 We point out once more that these parts are viewpoint-independent, i. [sent-114, score-0.156]
60 Explicit occluder representation While the first layer contains only implicit information about occluders (in the form of supposedly visible, but undetected configurations), the second layer includes an explicit occluder representation. [sent-119, score-1.599]
61 Due to the object being modeled as a sparse collection of parts, occluders can only be distinguished if the visibility of at least one part changes, which further reduces the space of possible occluders. [sent-121, score-0.487]
62 Thus, one can well approximate the set of all occluders by a discrete set of occlusion masks a (for convenience we denote the empty mask which leaves the object fully visible by a0). [sent-122, score-1.053]
63 With that set, we aim to explicitly recover the occlusion pattern during second-layer inference, by selecting one of the masks. [sent-125, score-0.402]
64 All parts falling inside the occlusion mask are considered occluded, and consequently their detection scores are not considered in the objective function (Sec. [sent-126, score-0.702]
65 Instead, they are assigned a fixed low score, corresponding to a weak uniform prior that prefers parts to be visible and counters the bias to “hide behind the occluder”. [sent-129, score-0.308]
66 Occlusion of parts is modeled by indicator functions oj (s, θ, a), where j represents the part index, s represents the object geometry (3. [sent-130, score-0.557]
67 The set of masks ai acts as a prior that specifies which part occlusions can co-occur. [sent-132, score-0.278]
68 For completeness we mention that object self-occlusion is modeled with the same indicator variables, but does not require separate treatment, since it is completely determined by shape and pose. [sent-133, score-0.562]
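The effect of selecting one occlusion mask can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's objective function: parts covered by the mask receive a fixed low score instead of their detection score, self-occluded parts are skipped, and the sum is normalized by the number of parts that are not self-occluded. The names `masked_evidence` and `best_mask`, the value `-1.0` for the fixed score, and the toy data are all hypothetical.

```python
def masked_evidence(part_scores, self_occluded, mask, occluded_score=-1.0):
    """Average per-part evidence under one occlusion mask.

    part_scores    -- detection score of each part at its predicted location
    self_occluded  -- booleans, True if shape and pose hide the part
    mask           -- set of part indices covered by the occluder
    occluded_score -- fixed low score given to masked parts (assumed value)
    """
    total = 0.0
    counted = 0
    for j, score in enumerate(part_scores):
        if self_occluded[j]:
            continue  # fully determined by shape and pose, not scored here
        counted += 1
        total += occluded_score if j in mask else score
    return total / counted if counted else 0.0


def best_mask(part_scores, self_occluded, masks, occluded_score=-1.0):
    """Return the index of the mask with the highest evidence."""
    return max(masks, key=lambda a: masked_evidence(
        part_scores, self_occluded, masks[a], occluded_score))


# Part 2 has a very poor detection score; part 3 is self-occluded.
scores = [2.0, 1.0, -3.0, 0.5]
hidden = [False, False, False, True]
masks = {0: set(), 1: {2}, 2: {0, 1, 2}}  # mask 0 is the empty mask a0
chosen = best_mask(scores, hidden, masks)
```

In this toy case the mask covering only part 2 wins: hiding the badly matching part helps, while hiding well-detected parts as well is penalized by the fixed low score, which is the counterweight to "hiding behind the occluder" mentioned above.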
69 Shape, pose, and occlusion estimation During inference, we attempt to find instances of the 3D shape model and of the occlusion mask that best explain the observed image evidence. [sent-136, score-1.013]
70 Recall that we wish to estimate an object’s 3D pose (5 parameters, assuming no in-plane rotation), geometric shape (7 ASM shape parameters), and an occluder index (1 parameter). [sent-137, score-0.558]
71 We therefore first cut down the search space in the first layer with a simpler and more robust object detection step, and then fit the full model locally at a small number of (candidate) detections. [sent-139, score-0.466]
72 First layer inference starts by detecting instances of our part configurations in the image with the corresponding DPM detectors. [sent-140, score-0.636]
73 Each detected configuration casts an associated vote for the full object 2D location and scale q = (qx , qy , qs), and for the pose θ = (θaz , θel). [sent-141, score-0.369]
74 At this point, the azimuth angle is restricted to a small set of discrete steps and the elevation angle is fixed, both to be refined in the second layer. [sent-142, score-0.153]
75 The votes are clustered with a greedy agglomerative clustering scheme as in [2] to obtain detection hypotheses H, each with a list of contributing configurations {l1 . [sent-143, score-0.131]
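The details of the clustering scheme of [2] are not given here, so the following is only a simplified greedy grouping that stands in for it: votes are visited in order of decreasing score, and each vote either joins the first existing cluster whose seed lies within a fixed radius or starts a new cluster. The function name `greedy_cluster`, the `(x, y, score)` vote layout, and the radius criterion are assumptions for illustration.

```python
def greedy_cluster(votes, radius):
    """Group (x, y, score) votes greedily; strongest votes seed clusters."""
    clusters = []  # each cluster: {"center": (x, y), "members": [votes]}
    for x, y, score in sorted(votes, key=lambda v: -v[2]):
        for cluster in clusters:
            cx, cy = cluster["center"]
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                cluster["members"].append((x, y, score))
                break
        else:
            # No existing cluster is close enough: start a new hypothesis.
            clusters.append({"center": (x, y), "members": [(x, y, score)]})
    return clusters


toy_votes = [(0.0, 0.0, 0.9), (1.0, 1.0, 0.5), (10.0, 10.0, 0.8), (11.0, 10.0, 0.3)]
hypotheses = greedy_cluster(toy_votes, radius=3.0)
```

Each resulting cluster keeps the list of votes that contributed to it, mirroring the way each detection hypothesis retains its contributing configurations for use in the second layer.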
76 configurations are made up of multiple parts confined to a specific layout with little spatial variability (Sec. [sent-149, score-0.463]
77 1), their detected instances li already provide some information about the part locations in image space. [sent-151, score-0.164]
78 The means μij and covariances σij² of the parts’ locations within a bounding box are estimated from the training data, and vij are binary flags indicating which parts j are found within the configuration li. [sent-152, score-0.448]
79 2(a) illustrates two such larger configurations, whose detection can be used to predict the location of the constituent parts as gaussian distributions with the respective means and covariances relative to the bounding box of the configuration. [sent-154, score-0.438]
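A minimal sketch of how a detected configuration could predict a constituent part's location as a Gaussian is given below. It assumes that the stored mean and variance are expressed relative to the configuration's bounding box (as fractions of its width and height) and that the covariance is diagonal; both conventions, and the names `part_prior` and `log_gaussian`, are assumptions for this example.

```python
import math


def part_prior(box, rel_mean, rel_var):
    """Map a part's box-relative Gaussian to image coordinates.

    box      -- (x0, y0, width, height) of the detected configuration
    rel_mean -- (mx, my) mean location as a fraction of width and height
    rel_var  -- (vx, vy) variances in the same relative units
    """
    x0, y0, w, h = box
    mean = (x0 + rel_mean[0] * w, y0 + rel_mean[1] * h)
    var = (rel_var[0] * w * w, rel_var[1] * h * h)
    return mean, var


def log_gaussian(point, mean, var):
    """Log density of an axis-aligned 2D Gaussian at `point`."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (p - m) ** 2 / (2.0 * v)
               for p, m, v in zip(point, mean, var))


mean, var = part_prior((100.0, 200.0, 50.0, 20.0), (0.5, 0.25), (0.01, 0.04))
at_mean = log_gaussian(mean, mean, var)
off_mean = log_gaussian((130.0, 205.0), mean, var)
```

Candidate part locations close to the predicted mean receive a higher log density than those further away, so a detected configuration acts as soft evidence for where its parts should be.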
80 After evaluating the first layer of the model we are left with a sparse set of (putative) detections, such that we can afford to evaluate a relatively expensive objective function. [sent-156, score-0.295]
81 We denote an object instance by h = (s, f, θ, q, a), comprising shape parameters s (eqn. [sent-157, score-0.222]
82 Σ_{j=1}^{m} oj (s, θ, a0) normalizes for the varying number of self-occluded parts. [sent-171, score-0.188]
83 For each potentially visible part there are three terms: Lv is the evidence Sj (ς, xj) for part j if it is visible, found by looking up the detection score at image location xj and scale ς. [sent-174, score-0.421]
84 Unlike the remaining parameters, the mask indices a are discrete and have no obvious ordering. [sent-193, score-0.139]
85 The inference is initialized at the location, scale and pose returned by the first layer, while the initial shape parameters are chosen randomly and the occlusion mask is set to a0. [sent-198, score-0.643]
86 Experiments In the following, we evaluate the performance of our approach in detail, focusing on its ability to recover finegrained, part-level accurate object shape and accompanying occlusion estimates. [sent-200, score-0.562]
87 5), and to estimate occluded object portions (in the form of part occlusion labels), for varying levels of occlusion (Sect. [sent-205, score-1.119]
88 The set of 288 occlusion masks has been generated automatically and pruned manually to exclude very unlikely masks. [sent-209, score-0.468]
89 It consists of 101 images of resolution 2 mega-pixels, showing street scenes with cars, with occlusions ranging from 0% to > 60% of the bounding box as well as the parts. [sent-213, score-0.252]
90 Although there are several publicly available car datasets, none of them is suitable for our purposes, since we found that part detector performance deteriorates significantly for objects smaller than 60 pixels in height. [sent-214, score-0.138]
91 (ii) the ASM model of [40, 39], which corresponds to the second layer of our model without any form of occlusion reasoning (i. [sent-223, score-0.774]
92 assuming that all parts are visible except for self-occlusions), and without using the part configurations from the first layer. [sent-225, score-0.541]
93 (iii) the proposed model, including prediction of occluders, but not using the configurations during second-layer inference. [sent-226, score-0.253]
94 (iv) our full model with occluder prediction and leveraging additional evidence from configurations for second-layer inference. [sent-227, score-0.64]
95 a combination of DPM configuration detectors and poselet-style voting, is competitive with alternative algorithms for detecting objects in 2D. [sent-232, score-0.177]
96 We also include the deformable part model (DPM, [9]), both trained on the same 1000 car images (using default parameters), as well as the pre-trained model (on Pascal VOC [8]), as a popular state-of-the-art reference. [sent-240, score-0.138]
97 We follow the classical object detection protocol of Pascal VOC [8], plotting precision vs. [sent-243, score-0.132]
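For reference, the Pascal VOC detection protocol counts a detection as correct when the intersection-over-union overlap between the detected and the ground-truth bounding box exceeds 0.5. The sketch below illustrates that criterion with continuous box coordinates `(x0, y0, x1, y1)`; it ignores the pixel-inclusive convention of the official development kit, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detection, ground_truth, threshold=0.5):
    """VOC-style match test: overlap must exceed the threshold."""
    return iou(detection, ground_truth) > threshold


half_shifted = iou((0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0))
```

A box shifted by half its width overlaps the original by only one third, so it would not count as a correct detection under this criterion.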
98 Our first layer outperforms both by a significant margin, achieving 88% AP, which we consider a solid basis for the subsequent 3D inference. [sent-250, score-0.295]
99 In particular we point out that the combination of a strong part detector with Hough-style voting reaches high recall (up to 95%) at reasonable precision. [sent-251, score-0.124]
100 The fact that only few instances are irrevocably lost in the first layer confirms that splitting into a coarse detection layer and a detailed modeling layer is a viable approach (see Tab. [sent-252, score-1.095]
wordName wordTfidf (topN-words)
[('occlusion', 0.402), ('layer', 0.295), ('occluder', 0.287), ('occluders', 0.277), ('configurations', 0.217), ('oj', 0.188), ('parts', 0.156), ('wireframe', 0.144), ('mask', 0.1), ('occluded', 0.098), ('evidence', 0.097), ('renewed', 0.09), ('bounding', 0.09), ('object', 0.088), ('part', 0.087), ('xj', 0.086), ('dpm', 0.085), ('explicit', 0.082), ('visible', 0.081), ('detectors', 0.08), ('reasoning', 0.077), ('asm', 0.076), ('poselets', 0.073), ('shape', 0.072), ('pose', 0.069), ('cars', 0.066), ('masks', 0.066), ('viewpoints', 0.065), ('comprising', 0.062), ('configuration', 0.061), ('poselet', 0.059), ('geometric', 0.058), ('azimuth', 0.057), ('elevation', 0.057), ('box', 0.057), ('lc', 0.056), ('detailed', 0.056), ('occlusions', 0.056), ('variability', 0.054), ('voc', 0.054), ('cad', 0.054), ('finegrained', 0.052), ('class', 0.051), ('car', 0.051), ('street', 0.049), ('lv', 0.049), ('agglomerative', 0.048), ('covariances', 0.048), ('spirit', 0.048), ('pascal', 0.045), ('detection', 0.044), ('jm', 0.044), ('constituent', 0.043), ('portions', 0.042), ('compact', 0.04), ('detected', 0.04), ('lo', 0.04), ('unfortunately', 0.039), ('discrete', 0.039), ('full', 0.039), ('votes', 0.039), ('placement', 0.039), ('fixed', 0.038), ('geometry', 0.038), ('active', 0.038), ('instances', 0.037), ('modeling', 0.037), ('strong', 0.037), ('vote', 0.036), ('coarse', 0.036), ('afforded', 0.036), ('clusions', 0.036), ('exemplary', 0.036), ('maxh', 0.036), ('massively', 0.036), ('houghstyle', 0.036), ('flags', 0.036), ('secondlayer', 0.036), ('acuracy', 0.036), ('wtedit', 0.036), ('asserting', 0.036), ('cific', 0.036), ('commence', 0.036), ('evi', 0.036), ('ihfe', 0.036), ('poseletstyle', 0.036), ('qy', 0.036), ('supposedly', 0.036), ('tradition', 0.036), ('unsurprisingly', 0.036), ('untouched', 0.036), ('ap', 0.036), ('internet', 0.036), ('collection', 0.035), ('gaussians', 0.035), ('outputs', 0.035), ('geometrically', 0.035), ('offset', 0.034), ('counters', 0.033), ('brittle', 
0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
2 0.52930915 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
3 0.20944169 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
Author: Tao Wang, Xuming He, Nick Barnes
Abstract: We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments. First, we extend the Hough hypothesis space to include both object location and its visibility pattern, and design a new score function that accumulates votes for object detection and occlusion prediction. In addition, we explore the correlation between objects and their environment, building a depth-encoded object-context model based on RGB-D data. Particularly, we design a layered context representation and allow image patches from both objects and backgrounds to vote for the object hypotheses. We demonstrate that using a data-driven 2.1D representation we can learn visual codebooks with better quality, and more interpretable detection results in terms of spatial relationship between objects and viewer. We test our algorithm on two challenging RGB-D datasets with significant occlusion and intraclass variation, and demonstrate the superior performance of our method.
4 0.20488469 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
5 0.17537458 357 cvpr-2013-Revisiting Depth Layers from Occlusions
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues are superior to using either feature independently, and using a naïve combination of the features.
6 0.15476106 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
7 0.15219492 325 cvpr-2013-Part Discovery from Partial Correspondence
8 0.14605141 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
9 0.13615738 364 cvpr-2013-Robust Object Co-detection
10 0.13413855 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
11 0.13311875 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
12 0.13070184 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
13 0.12709795 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
14 0.12629555 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
15 0.12558809 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
16 0.12475965 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
17 0.1213851 335 cvpr-2013-Poselet Conditioned Pictorial Structures
18 0.11959613 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
19 0.11684521 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
20 0.11665951 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
topicId topicWeight
[(0, 0.268), (1, -0.01), (2, 0.054), (3, -0.084), (4, 0.093), (5, 0.001), (6, 0.143), (7, 0.148), (8, 0.058), (9, -0.059), (10, -0.147), (11, -0.016), (12, 0.02), (13, -0.075), (14, 0.058), (15, 0.022), (16, -0.017), (17, 0.101), (18, -0.061), (19, 0.046), (20, 0.012), (21, -0.084), (22, 0.149), (23, -0.049), (24, 0.093), (25, -0.09), (26, 0.031), (27, -0.089), (28, 0.008), (29, -0.004), (30, 0.137), (31, -0.098), (32, 0.026), (33, -0.045), (34, -0.035), (35, 0.008), (36, 0.048), (37, -0.041), (38, 0.234), (39, -0.01), (40, -0.126), (41, 0.045), (42, 0.068), (43, -0.018), (44, -0.173), (45, 0.13), (46, -0.151), (47, -0.056), (48, -0.048), (49, 0.17)]
simIndex simValue paperId paperTitle
same-paper 1 0.95952171 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
2 0.95613027 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
3 0.71244121 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
Author: Tao Wang, Xuming He, Nick Barnes
Abstract: We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments. First, we extend the Hough hypothesis space to include both object location and its visibility pattern, and design a new score function that accumulates votes for object detection and occlusion prediction. In addition, we explore the correlation between objects and their environment, building a depth-encoded object-context model based on RGB-D data. Particularly, we design a layered context representation and allow image patches from both objects and backgrounds to vote for the object hypotheses. We demonstrate that using a data-driven 2.1D representation we can learn visual codebooks with better quality, and more interpretable detection results in terms of spatial relationship between objects and viewer. We test our algorithm on two challenging RGB-D datasets with significant occlusion and intraclass variation, and demonstrate the superior performance of our method.
4 0.60956568 357 cvpr-2013-Revisiting Depth Layers from Occlusions
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that the depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues is superior to using either feature independently, and to using a naïve combination of the features.
5 0.59251642 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
6 0.57418072 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
7 0.57379574 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
8 0.56008273 364 cvpr-2013-Robust Object Co-detection
9 0.55822015 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
10 0.54192293 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
11 0.51824719 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
12 0.50715411 96 cvpr-2013-Correlation Filters for Object Alignment
13 0.50536311 325 cvpr-2013-Part Discovery from Partial Correspondence
14 0.50515962 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
15 0.50457817 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
16 0.50285339 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
17 0.50235397 330 cvpr-2013-Photometric Ambient Occlusion
18 0.4997651 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
19 0.4974933 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
20 0.47052199 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
topicId topicWeight
[(10, 0.514), (26, 0.038), (33, 0.206), (67, 0.061), (69, 0.048), (87, 0.066)]
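The topic-weight list above is a sparse vector: each pair is (topicId, topicWeight), and topics that are not listed carry zero weight. The page does not say how the simValue scores are derived from such vectors, so the following is only a minimal sketch under the assumption that two papers are compared by the cosine of their sparse topic vectors. The function name `sparse_cosine` and the second vector `other_paper` are hypothetical and exist only for illustration.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors given as (id, weight) lists."""
    wa, wb = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * wb[t] for t, w in wa.items() if t in wb)
    na = math.sqrt(sum(w * w for w in wa.values()))
    nb = math.sqrt(sum(w * w for w in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Topic weights copied from the listing above.
this_paper = [(10, 0.514), (26, 0.038), (33, 0.206),
              (67, 0.061), (69, 0.048), (87, 0.066)]

# Hypothetical topic weights for some other paper (not taken from the page).
other_paper = [(10, 0.48), (33, 0.25), (51, 0.1)]

print(sparse_cosine(this_paper, other_paper))
```

Storing each vector as a dictionary keeps the comparison proportional to the number of non-zero topics rather than to the total number of topics in the model.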
simIndex simValue paperId paperTitle
1 0.93598169 295 cvpr-2013-Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior
Author: Haichao Zhang, David Wipf, Yanning Zhang
Abstract: This paper presents a robust algorithm for estimating a single latent sharp image given multiple blurry and/or noisy observations. The underlying multi-image blind deconvolution problem is solved by linking all of the observations together via a Bayesian-inspired penalty function which couples the unknown latent image, blur kernels, and noise levels together in a unique way. This coupled penalty function enjoys a number of desirable properties, including a mechanism whereby the relative-concavity or shape is adapted as a function of the intrinsic quality of each blurry observation. In this way, higher quality observations may automatically contribute more to the final estimate than heavily degraded ones. The resulting algorithm, which requires no essential tuning parameters, can recover a high quality image from a set of observations containing potentially both blurry and noisy examples, without knowing a priori the degradation type of each observation. Experimental results on both synthetic and real-world test images clearly demonstrate the efficacy of the proposed method.
same-paper 2 0.9249202 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
3 0.92369688 307 cvpr-2013-Non-uniform Motion Deblurring for Bilayer Scenes
Author: Chandramouli Paramanand, Ambasamudram N. Rajagopalan
Abstract: We address the problem of estimating the latent image of a static bilayer scene (consisting of a foreground and a background at different depths) from motion blurred observations captured with a handheld camera. The camera motion is considered to be composed of in-plane rotations and translations. Since the blur at an image location depends both on camera motion and depth, deblurring becomes a difficult task. We initially propose a method to estimate the transformation spread function (TSF) corresponding to one of the depth layers. The estimated TSF (which reveals the camera motion during exposure) is used to segment the scene into the foreground and background layers and determine the relative depth value. The deblurred image of the scene is finally estimated within a regularization framework by accounting for blur variations due to camera motion as well as depth.
4 0.91600442 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?
Author: Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà, Andrea Torsello
Abstract: Traditional camera models are often the result of a compromise between the ability to account for non-linearities in the image formation model and the need for a feasible number of degrees of freedom in the estimation process. These considerations led to the definition of several ad hoc models that best adapt to different imaging devices, ranging from pinhole cameras with no radial distortion to the more complex catadioptric or polydioptric optics. In this paper we propose the use of an unconstrained model even in standard central camera settings dominated by the pinhole model, and introduce a novel calibration approach that can deal effectively with the huge number of free parameters associated with it, resulting in a higher precision calibration than what is possible with the standard pinhole model with correction for radial distortion. This effectively extends the use of general models to settings that traditionally have been ruled by parametric approaches out of practical considerations. The benefit of such an unconstrained model to quasi-pinhole central cameras is supported by an extensive experimental validation.
5 0.91102183 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation
Author: Dohyung Seo, Jeffrey Ho, Baba C. Vemuri
Abstract: In this paper, we introduce a novel framework for computing a path of diffeomorphisms between a pair of input diffeomorphisms. Direct computation of a geodesic path on the space of diffeomorphisms Diff(Ω) is difficult, and it can be attributed mainly to the infinite dimensionality of Diff(Ω). Our proposed framework, to some degree, bypasses this difficulty using the quotient map of Diff(Ω) to the quotient space Diff(M)/Diff(M)μ obtained by quotienting out the subgroup of volume-preserving diffeomorphisms Diff(M)μ. This quotient space was recently identified as the unit sphere in a Hilbert space in the mathematics literature, a space with well-known geometric properties. Our framework leverages this recent result by computing the diffeomorphic path in two stages. First, we project the given diffeomorphism pair onto this sphere and then compute the geodesic path between these projected points. Second, we lift the geodesic on the sphere back to the space of diffeomorphisms, by solving a quadratic programming problem with bilinear constraints using the augmented Lagrangian technique with penalty terms. In this way, we can estimate the path of diffeomorphisms, first, staying in the space of diffeomorphisms, and second, preserving shapes/volumes in the deformed images along the path as much as possible. We have applied our framework to interpolate intermediate frames of frame-sub-sampled video sequences. In the reported experiments, our approach compares favorably with the popular Large Deformation Diffeomorphic Metric Mapping framework (LDDMM).
6 0.89776742 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
7 0.87850058 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
8 0.8763392 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
9 0.84314007 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
10 0.84151942 198 cvpr-2013-Handling Noise in Single Image Deblurring Using Directional Filters
12 0.79378301 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
13 0.78228778 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation
14 0.77483982 314 cvpr-2013-Online Object Tracking: A Benchmark
15 0.77044314 131 cvpr-2013-Discriminative Non-blind Deblurring
16 0.76089442 414 cvpr-2013-Structure Preserving Object Tracking
17 0.75629801 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
18 0.74544263 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
19 0.74127084 248 cvpr-2013-Learning Collections of Part Models for Object Recognition