Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling by mining distinctive, recurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation, we derive insights that can aid further developments in tackling the occlusion challenge.

1 In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling by mining distinctive, recurring occlusion patterns from annotated training data. [sent-2, score-1.902]

2 These patterns are then used as training data for dedicated detectors of varying sophistication. [sent-3, score-0.315]

3 In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. [sent-4, score-0.237]

4 In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge. [sent-5, score-0.773]

5 Despite these achievements towards more accurate object hypotheses, partial occlusion still poses a major challenge to state-of-the-art detectors [4, 7], as becomes apparent when analyzing the results of current benchmark datasets [6]. [sent-9, score-0.874]

6 Curiously, these approaches also have in common that they focus entirely on the occluded object (the occludee), without any explicit notion of the cause of occlusion. [sent-11, score-0.446]

7 Figure 1. (Left) True positive detections by our occluded-object detector. [sent-13, score-0.243]

8 In this paper we therefore follow a different route, by treating the occluder as a first-class citizen in the occlusion problem. [sent-18, score-1.045]

9 In particular, we start from the observation that certain types of occlusions are more likely than others: consider a street scene with cars parked on either side of the road (as in Fig. [sent-19, score-0.183]

10 Clearly, the visible and occluded portions of cars tend to form patterns that repeat numerous times, providing valuable visual cues about both the presence of individual objects and the layout of the scene as a whole. [sent-21, score-0.386]

11 Based on this observation, we chose to explicitly model these occlusion patterns by leveraging fine-grained 3D annotations of a recent dataset of urban street scenes [9]. [sent-22, score-0.942]

12 In particular, we mine recurring spatial arrangements of objects observed from a specific viewpoint, and model their distinctive appearance by an array of specialized detectors. [sent-23, score-0.266]

13 As baselines, we include a standard state-of-the-art object class detector [7] as well as a recently proposed double-person detector [19] in the evaluation, with sometimes surprising results (Sect. [sent-25, score-0.195]

14 First, we approach the challenging problem of partial occlusions in object class recognition from a different angle than most recent attempts, by treating causes of occlusions as first-class citizens in the model. [sent-28, score-0.42]

15 Third, in an extensive experimental study we evaluate and compare these different techniques, providing insights that we believe to be helpful in tackling the partial occlusion challenge in a principled manner. [sent-30, score-0.782]

16 Related work: Sensitivity to partial occlusion has so far mostly been considered a lack of robustness, essentially treating occlusion as “noise rather than signal”¹. [sent-32, score-1.424]

17 Among the most successful implementations are integrated models of detection and segmentation using structured prediction and branch-and-bound [8], latent occlusion variables in a max-margin framework [20], and boosting [21]. [sent-36, score-0.841]

18 Only recently, [19] leveraged the joint appearance of multiple people for robust people detection and tracking by training a double-person detector [7] on pairs of people rather than single humans. [sent-39, score-0.279]

19 While our evaluation includes their model as a baseline, we systematically evaluate and contrast different ways of modelling occluders as first-class citizens, and propose a more expressive, hierarchical model of occluder/occludee pairs that outperforms their model in certain configurations. [sent-40, score-0.28]

20 In the realm of deformable part models, [10] considered part-level occlusion in the form of dedicated “occlusion” candidate parts that represent generic occlusion features (such as a visible occlusion edge). [sent-41, score-2.168]

21 At the scene level, occlusion has been tackled with considerable success in terms of recognition performance by drawing evidence from partial object detections in probabilistic scene models [22, 14]. [sent-47, score-0.897]

22 While these models can in principle reason about occluder/occludee relationships, their level of detail is limited by the chosen object class representation: in both cases, standard 2D bounding-box-based detectors [7] are used, which fail to capture interactions between objects that are not box-shaped. [sent-48, score-0.43]

23 Occlusion patterns: Our approach to modelling partial occlusions is based on the notion of occlusion patterns, i. [sent-51, score-1.066]

24 Specifically, we limit ourselves to pairs of objects, giving rise to occlusion patterns on the level of single objects (occludees) and double objects (occluder-occludee pairs). [sent-55, score-1.043]

25 Mining occlusion patterns: We mine occlusion patterns from training data by leveraging fine-grained annotations in the form of 3D object bounding boxes and camera projection matrices that are readily available as part of the KITTI dataset [9]. [sent-58, score-1.947]

26 We use these annotations to define a joint feature space that represents both the relative layout of two objects taking part in an occlusion and the viewpoint from which this arrangement is observed by the camera. [sent-59, score-0.863]

27 We then perform clustering on this joint feature space, resulting in an assignment of object pairs to clusters that we use as training data for the components of mixture models, as detailed in Sec. [sent-60, score-0.189]

28 We use the following properties of occlusion patterns as features in our clustering: i) occluder left/right of occludee in image space, ii) occluder and occludee orientation in 3D object coordinates, iii) occluder is/is not itself occluded, iv) degree of occlusion of occludee. [sent-63, score-2.806]
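To make the joint feature space concrete, the four properties can be encoded as one numeric vector per occluder/occludee pair and then clustered. The sketch below rests on assumptions not stated in the text: orientations are encoded as (cos, sin) pairs so that angles wrap around, binary properties become 0/1, and plain k-means (a small hand-written Lloyd iteration with deterministic initialisation) stands in for whatever clustering method was actually used. The `OcclusionPair` record and its field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OcclusionPair:
    """Hypothetical record for one annotated occluder/occludee pair."""
    occluder_left: bool        # i) occluder left of occludee in image space
    occluder_yaw: float        # ii) occluder orientation in object coords (radians)
    occludee_yaw: float        # ii) occludee orientation in object coords (radians)
    occluder_occluded: bool    # iii) occluder is itself occluded
    occlusion_degree: float    # iv) occluded fraction of the occludee, in [0, 1]


def pair_features(p: OcclusionPair) -> List[float]:
    """Encode the four properties as one numeric vector.

    Orientations become (cos, sin) pairs so that angles near 0 and 2*pi
    end up close to each other in feature space.
    """
    return [
        1.0 if p.occluder_left else 0.0,
        math.cos(p.occluder_yaw), math.sin(p.occluder_yaw),
        math.cos(p.occludee_yaw), math.sin(p.occludee_yaw),
        1.0 if p.occluder_occluded else 0.0,
        p.occlusion_degree,
    ]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd iterations, initialised with the first k points."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c, p=p: sqdist(p, centres[c])) for p in points]
        # Update step: move each non-empty cluster's centre to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each resulting cluster label then identifies one candidate occlusion pattern whose members serve as training data for one mixture component.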

29 based on assigning the viewing angle of the occluder to one of a fixed number [sent-67, score-0.283]

30 Figure 2. Occlusion patterns span a wide range of occluder-occludee arrangements: the resulting appearance can be well aligned (leftmost columns) or diverging (rightmost columns); note that occluders are sometimes themselves occluded. [sent-70, score-0.253]

31 Figure 2 visualizes a selection of occlusion patterns mined from the KITTI dataset [9]. [sent-72, score-0.884]

32 As shown by the average images over cluster members (row (2)), some occlusion patterns are quite well aligned, which is a prerequisite for learning reliable detectors from them (Sec. [sent-73, score-1.02]

33 Occlusion pattern detectors: In the following, we introduce three different models for the detection of occlusion patterns, each based on the well-known and well-tested deformable part model (DPM [7]) framework. [sent-77, score-0.973]

34 2) focuses on individual occluded objects, by dedicating distinct mixture components to different single-object occlusion patterns. [sent-80, score-0.954]

35 The DPM is a mixture of C star-shaped log-linear conditional random fields (CRFs), all of which have a root p_0 and a number of latent parts p_i, i = 1, … [sent-89, score-0.221]
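For readers unfamiliar with the DPM, the score of one star-shaped component can be sketched as the root filter response plus, for each latent part, the best part response minus a quadratic deformation cost; the detection score is the maximum over the C components. This is a simplified toy under stated assumptions: filter responses are given as precomputed dictionaries over discrete 2-D positions, and the names `component_score` and `dpm_score` are hypothetical. It omits the feature pyramid, per-component bias and distance-transform speed-ups of the actual framework [7].

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def component_score(root_resp: Dict[Pos, float], parts: List[dict], root_pos: Pos) -> float:
    """Score of one star-shaped component with its root at root_pos.

    Each part is a dict with
      'resp'   : precomputed part filter responses over positions,
      'anchor' : ideal displacement (dx, dy) relative to the root,
      'def'    : quadratic deformation weights (a, b) for dx^2 and dy^2.
    In a star-shaped model the parts only depend on the root, so each
    latent part placement can be maximised independently.
    """
    score = root_resp[root_pos]
    for part in parts:
        ax = root_pos[0] + part['anchor'][0]
        ay = root_pos[1] + part['anchor'][1]
        a, b = part['def']
        score += max(r - (a * (x - ax) ** 2 + b * (y - ay) ** 2)
                     for (x, y), r in part['resp'].items())
    return score


def dpm_score(components: List[dict], root_pos: Pos) -> float:
    """Mixture of C components: the detection score is the best component."""
    return max(component_score(c['root'], c['parts'], root_pos) for c in components)
```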

36 …, N of pairs of images I_n and object annotations y_n, consisting of bounding boxes (l_n, r_n, t_n, b_n) and coarse viewpoint estimates. [sent-119, score-0.271]

37 Single-object occlusion patterns (OC-DPM): We experiment with the following extension of the DPM [7]. [sent-122, score-0.829]

38 …, C_visible that represent the appearances of instances of an object class of interest, we introduce additional mixture components dedicated to representing the distinctive appearance of occluded instances of that class. [sent-126, score-0.386]

39 In particular, we reserve a distinct mixture component for the occludee members of each cluster resulting from our occlusion pattern mining step (Sec. [sent-127, score-1.077]
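The resulting OC-DPM component layout can be illustrated with a small helper that maps a training example to its mixture component: visible instances go to one of the C_visible components, and occluded instances go to the dedicated component of their occlusion-pattern cluster. The uniform viewpoint binning for visible instances and the function name `oc_dpm_component` are hypothetical choices for illustration only.

```python
import math
from typing import Optional


def oc_dpm_component(viewpoint: float, cluster_id: Optional[int], c_visible: int) -> int:
    """Return the mixture component index for one training example.

    viewpoint  : coarse viewpoint angle in radians (wrapped to [0, 2*pi))
    cluster_id : occlusion-pattern cluster of an occluded instance, or None if visible
    c_visible  : number of components reserved for visible instances

    Components 0 .. c_visible-1 model visible instances (assumed here to be
    split by uniform viewpoint bins); component c_visible + k is the
    dedicated component for the occludees of cluster k.
    """
    if cluster_id is None:
        angle = viewpoint % (2 * math.pi)
        # The final modulo guards against rounding that lands exactly on 2*pi.
        return int(angle / (2 * math.pi) * c_visible) % c_visible
    return c_visible + cluster_id
```

With c_visible = 6, for example, an occludee from cluster 4 is assigned to component 10.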

40 Double-object occlusion patterns: While the single-object occlusion model of Sec. [sent-131, score-1.501]

41 2 has the potential to represent distinctive occlusion patterns in the data, modelling occluder and corresponding occludee jointly suggests a further improvement: intuitively, the strong evidence of the occluder should provide strong cues as to where to look for the occludee. [sent-133, score-1.793]

42 In the following we capture this intuition by designing two variants of a hierarchical occlusion model based on the DPM [7] framework. [sent-134, score-0.7]

43 In these models occluder and occludee are allowed to move w. [sent-135, score-0.519]

44 1 Double objects with joint root (Sym-DPM): The first double-object occlusion pattern detector is graphically depicted in Fig. [sent-143, score-0.9]

45 The idea is to join two star-shaped CRFs, one for the occluding object (with root p_0^occluder) and one for the occluded object (with root p_0^occludee), by an extra common root part p_0 = (l, r, t, b). [sent-145, score-0.342]

46 As annotation for the root part we use the tightest rectangle around the union of the two objects, see the green bounding boxes in Fig. [sent-146, score-0.317]
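The common-root annotation is simply the smallest axis-aligned box enclosing both object boxes. A minimal sketch, assuming boxes in the (l, r, t, b) order used in the text and image coordinates in which t < b; the names `Box` and `union_box` are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, right, top, bottom)


def union_box(a: Box, b: Box) -> Box:
    """Tightest rectangle around the union of two boxes (common-root annotation)."""
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))
```

For instance, an occluder box (40, 120, 30, 90) and an occludee box (10, 50, 20, 80) yield the common root (10, 120, 20, 90).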

47 The inclusion of this common root part introduces three new terms into the energy: an appearance term for the common root, … [sent-148, score-0.203]
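As an illustration only, assuming (as in a standard star model) that the other two new terms are deformation costs tying the occluder root and the occludee root to the common root, the Sym-DPM score for a given common-root placement can be sketched as below. The precomputed per-object scores (each object's own star-model score at a candidate root position), the quadratic deformation form and all names are hypothetical, not the paper's exact formulation.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]


def quad_def(pos: Pos, anchor: Pos, w: Tuple[float, float]) -> float:
    """Quadratic deformation cost of placing pos relative to its anchor."""
    return w[0] * (pos[0] - anchor[0]) ** 2 + w[1] * (pos[1] - anchor[1]) ** 2


def sym_dpm_score(common_resp: Dict[Pos, float],
                  occluder_scores: Dict[Pos, float],
                  occludee_scores: Dict[Pos, float],
                  p0: Pos,
                  occluder_offset: Pos, occludee_offset: Pos,
                  w_occluder: Tuple[float, float],
                  w_occludee: Tuple[float, float]) -> float:
    """Common-root appearance plus the best occluder and occludee placements,
    each penalised by its deformation with respect to the common root p0."""
    a_oc = (p0[0] + occluder_offset[0], p0[1] + occluder_offset[1])
    a_ee = (p0[0] + occludee_offset[0], p0[1] + occludee_offset[1])
    best_oc = max(s - quad_def(q, a_oc, w_occluder) for q, s in occluder_scores.items())
    best_ee = max(s - quad_def(q, a_ee, w_occludee) for q, s in occludee_scores.items())
    return common_resp[p0] + best_oc + best_ee
```

Because both object roots hang off the common root, the two sub-models are treated symmetrically, which motivates the name Sym-DPM.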

48 2 Double objects without joint root (Asym-DPM): The second double-object model is a variation of Sym-DPM in which the common root part is omitted (Fig. [sent-162, score-0.243]

49 This relationship is asymmetric, which is why we refer to this model as Asym-DPM; it follows the intuition that the occluder can typically be trusted more, because it provides unhampered image evidence. [sent-165, score-0.283]
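Under similar assumptions, dropping the common root and anchoring the occludee directly to the occluder gives the asymmetric variant: the occluder is placed first, and the occludee root is searched around an anchor defined relative to the occluder root. This is a hypothetical brute-force sketch over precomputed per-object scores, not the authors' implementation; the offset and quadratic deformation weights are illustrative parameters.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]


def asym_dpm_score(occluder_scores: Dict[Pos, float],
                   occludee_scores: Dict[Pos, float],
                   offset: Pos,
                   w: Tuple[float, float]) -> Tuple[float, Optional[Pos], Optional[Pos]]:
    """Best joint score and placements when the occludee hangs off the occluder.

    For every occluder root q, the occludee root r is scored by its own
    score minus a quadratic cost for deviating from q + offset.
    Returns (score, occluder_root, occludee_root).
    """
    best: Tuple[float, Optional[Pos], Optional[Pos]] = (float("-inf"), None, None)
    for q, s_oc in occluder_scores.items():
        # Anchor for the occludee root, defined relative to the occluder root q.
        anchor = (q[0] + offset[0], q[1] + offset[1])
        for r, s_ee in occludee_scores.items():
            cost = w[0] * (r[0] - anchor[0]) ** 2 + w[1] * (r[1] - anchor[1]) ** 2
            total = s_oc + s_ee - cost
            if total > best[0]:
                best = (total, q, r)
    return best
```

The asymmetry is visible in the loop structure: the occluder's placement determines where the occludee is expected, but not the other way round.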

50 For the models considered here, β denotes all their parameters (v, w, w, w) for all components c, y the bounding-box annotations per example (one or two boxes), and h the latent part placements. [sent-187, score-0.32]

51 The latter step involves detecting high scoring bounding boxes and latent part assignments (y? [sent-191, score-0.192]
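As a hedged illustration, assuming the training objective follows the standard latent structural SVM form with β, y and h as defined above, the per-example slack compares the best loss-augmented score over all outputs and latent placements (the detection step just described) with the best score achievable on the ground-truth annotation. Candidates are enumerated explicitly here; the feature vectors, candidate lists and the function name are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product beta . phi."""
    return sum(a * b for a, b in zip(u, v))


def latent_ssvm_slack(beta: Sequence[float],
                      candidates: List[Tuple[object, object, Sequence[float]]],
                      gt_candidates: List[Tuple[object, Sequence[float]]],
                      y_gt: object,
                      loss: Callable[[object, object], float]) -> float:
    """Slack of one training example in a latent structural SVM.

    candidates    : (y, h, phi) triples over all outputs y and latent placements h
    gt_candidates : (h, phi) pairs with the output fixed to the annotation y_gt
    Returns max_{y,h}[beta.phi + loss(y_gt, y)] - max_h beta.phi(y_gt, h),
    which is >= 0 whenever the ground-truth pairs are among the candidates
    and loss(y_gt, y_gt) = 0.
    """
    loss_augmented = max(dot(beta, phi) + loss(y_gt, y) for y, _h, phi in candidates)
    best_on_gt = max(dot(beta, phi) for _h, phi in gt_candidates)
    return loss_augmented - best_on_gt
```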

52 We use the standard intersection-over-union loss Δ_VOC for a pair of bounding boxes. [sent-206, score-0.204]

53 In case the model predicts only a single bounding box y (decided through the choice of the component), the loss is the intersection-over-union loss Δ(y_n, y) in case there is one annotation, and Δ(y_n, y) in case of an occlusion annotation. [sent-213, score-0.955]

54 When two bounding boxes y, y are predicted, the loss is computed either as Δ(y_n, y) in case there is a single annotation, or as the average 0. [sent-215, score-0.214]

55 Our implementation of the loss function captures both single- and double-object detections simultaneously. [sent-226, score-0.185]
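A minimal sketch of such a combined loss follows, with Δ taken as 1 − IoU. Two details are assumptions where the text above is incomplete: the two-versus-two case pairs boxes in order (occluder with occluder, occludee with occludee) and averages the two losses with weight 0.5 each, and whenever one side has a single box, it is compared with the best-matching box on the other side. The function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, right, top, bottom)


def area(x: Box) -> float:
    """Area of an axis-aligned box."""
    return (x[1] - x[0]) * (x[3] - x[2])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def delta_voc(a: Box, b: Box) -> float:
    """VOC-style loss: 1 - IoU."""
    return 1.0 - iou(a, b)


def occlusion_loss(gt: List[Box], pred: List[Box]) -> float:
    """Loss for one or two annotated boxes vs. one or two predicted boxes.

    Two-box lists are ordered (occluder, occludee). Mismatched cardinalities
    use the best-matching box (an assumption, see the lead-in).
    """
    if len(gt) == len(pred) == 2:
        return 0.5 * (delta_voc(gt[0], pred[0]) + delta_voc(gt[1], pred[1]))
    # At least one side holds a single box: take the best-matching pair.
    return min(delta_voc(g, p) for g in gt for p in pred)
```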

56 Experimental evaluation: In the following, we give a detailed analysis of the various methods based on the notion of occlusion patterns that we introduced in Sect. [sent-231, score-0.871]

57 In a series of experiments we consider both results according to classical 2D bounding box-based localization measures, as well as a closer look at specific occlusion cases. [sent-233, score-0.752]

58 We commence by confirming the ability of our models to detect occlusion patterns in isolation (Sect. 5.2), [sent-234, score-0.954]

59 and then move on to the task of object class detection in an unconstrained setting, comprising both unoccluded and occluded objects of varying difficulty (Sect. 5. [sent-235, score-0.323]

60 The KITTI dataset [9] is a rich source of challenging occlusion cases, as shown in Tab. [sent-251, score-0.672]

61 In all our experiments on Car (Pedestrian), we train our occlusion models with 6 (6) components for visible objects and 16 (15)² components for occlusion patterns. [sent-261, score-1.496]

62 We obtain these numbers by keeping only the occlusion pattern clusters that have at least 30 positive training examples. [sent-262, score-0.721]
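The selection rule is easy to state in code: count the members of each mined cluster and keep only those with at least 30 positive examples, each surviving cluster becoming one occlusion-pattern component. A minimal sketch; the renumbering of surviving clusters to consecutive component indices is an assumption for illustration.

```python
from collections import Counter
from typing import Dict, List


def keep_large_clusters(labels: List[int], min_size: int = 30) -> Dict[int, int]:
    """Map each sufficiently large cluster id to a consecutive component index.

    Clusters with fewer than min_size members are dropped; the surviving
    clusters are renumbered 0, 1, 2, ... in order of their original ids.
    """
    counts = Counter(labels)
    kept = sorted(c for c, n in counts.items() if n >= min_size)
    return {c: i for i, c in enumerate(kept)}
```

The number of entries in the returned mapping is the number of occlusion-pattern components to train.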

63 Detecting occlusion patterns: We commence by evaluating the ability of our models to reliably detect occlusion patterns in isolation, since this constitutes the basis for handling occlusion cases in a realistic detection setting (Sect. [sent-265, score-2.502]

64 We first consider the joint detection of occlusion patterns in the form of object pairs (occluder and occludee). [sent-270, score-0.989]

65 images that contain occlusion pairs, which we determine from the available fine-grained annotations (we run the occlusion pattern mining of Sect. [sent-273, score-1.503]

66 This targeted evaluation is essential in order to separate concerns, and to draw meaningful conclusions about the role of different variants of occlusion modelling from the results. [sent-275, score-0.7]

67 Based on the setup of the previous experiment, we turn to evaluating our occlusion pattern detectors on the level of individual objects (this comprises both occluders and occludees from the double-object occlusion patterns). [sent-285, score-1.738]

68 To that end, we add our single-object detectors to the comparison, namely, our Asym-DPM (orange), our OC-DPM (cyan), and the deformable part model [7] baseline (green). [sent-286, score-0.251]

69 Clearly, all explicit means of modelling occlusion improve over the DPM [7] baseline (53. [sent-289, score-0.815]

70 As concerns the relative performance of the different occlusion models, we observe a different ordering compared to the double-object occlusion pattern case: the double-object baseline [19] (blue, 61% AP) performs slightly better than our double-resolution Sym-DPM (red, 57. [sent-294, score-1.449]

71 To summarize, we conclude that detecting occlusion patterns in images is in fact feasible, achieving both sufficiently high recall (over 90% for both single- and double-object occlusion patterns) and reasonable AP (up to 74% for single-object occlusion patterns). [sent-300, score-2.173]

72 We take this result as evidence that occlusion pattern detectors have the potential to aid recognition in the case of occlusion (which we examine and verify in Sect. [sent-301, score-1.608]

73 Furthermore, careful and explicit modelling of occluder and occludee characteristics helps for the joint detection of double-object patterns (our hierarchical Sym-DPM model outperforms the flat baseline [19]). [sent-304, score-0.882]

74 Occlusion patterns for object class detection: In this section, we apply our findings from the isolated evaluation of occlusion pattern detectors to the more realistic setting of unconstrained object class detection, again considering the KITTI dataset [9] as a testbed. [sent-308, score-1.296]

75 Since the focus is again on occlusion, we consider a series of increasingly difficult scenarios for comparing performance, corresponding to increasing levels of occlusion (which we measure based on 3D annotations and the given camera parameters). [sent-309, score-0.778]
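The occlusion levels used in this evaluation are 20%-wide bins, from [0, 20]% up to [80, 100]%. Assuming each object's occlusion percentage has already been computed from the 3D annotations and camera parameters, the binning itself can be sketched as follows; the boundary convention (upper edge inclusive, 0% in the first bin) and the function name are assumptions.

```python
def occlusion_level(percent: float) -> int:
    """Map an occlusion percentage in [0, 100] to level 0..4.

    Level 0 is [0, 20] %, level 1 is (20, 40] %, ..., level 4 is (80, 100] %.
    Integer bin edges avoid floating-point surprises at the boundaries.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("occlusion percentage must lie in [0, 100]")
    for level, upper in enumerate((20, 40, 60, 80)):
        if percent <= upper:
            return level
    return 4
```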

76 8 (a)), the data set restricted to at most 20% occluded objects (Fig. [sent-311, score-0.182]

77 In order to enable detection of occluded as well as unoccluded object instances, we augment our various occlusion pattern detectors with additional mixture components for unoccluded objects. [sent-318, score-1.216]

78 8 (a)) we observe that the trends from the isolated evaluation of occlusion patterns (Sect. [sent-321, score-0.853]

79 2) transfer to the more realistic object class detection setting: while the double-object occlusion pattern detectors are comparable in terms of AP (Asym-DPM, orange, 52. [sent-323, score-0.981]

80 4%), improving over the next best double-object occlusion pattern detector Sym-DPM by a significant margin of 10. [sent-326, score-0.772]

81 8% AP) beats all double-object occlusion pattern detectors, but is in turn outperformed by our OC-DPM (cyan, 64. [sent-329, score-0.721]

82 All double-object detectors have proven to be very sensitive to the non-maximum suppression scheme used, and suffer from score incomparability between the double- and single-object components. [sent-334, score-0.197]
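For reference, the standard greedy non-maximum suppression commonly applied to detector outputs can be sketched as follows. It makes the sensitivity noted above concrete: suppression is driven purely by raw scores, so when double- and single-object components produce scores on different scales, one family can systematically suppress the other. The 0.5 overlap threshold and the function names are assumptions; the exact scheme used in the experiments is not specified here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, right, top, bottom)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = iw * ih
    union = (a[1] - a[0]) * (a[3] - a[2]) + (b[1] - b[0]) * (b[3] - b[2]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Tuple[Box, float]], thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Keep the highest-scoring detection, drop all remaining detections that
    overlap it by more than thresh IoU, and repeat on what is left."""
    remaining = sorted(dets, key=lambda d: d[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(d[0], best[0]) <= thresh]
    return kept
```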

83 2%), confirming the benefit of our occlusion modelling, while Sym-DPM (31. [sent-339, score-0.703]

84 We proceed by examining the results for increasing levels of occlusion (Fig. [sent-343, score-0.703]

85 First, we observe that the relative ordering among double-object and single-object occlusion pattern detectors is stable across occlusion levels: our OC-DPM (cyan) outperforms all double-object occlusion pattern detectors, namely, Sym-DPM (blue) and Asym-DPM (orange). [sent-345, score-2.233]

86 Second, the DPM [7] baseline (green) excels at low levels of occlusion (77. [sent-346, score-0.733]

87 2% AP for up to 20% occlusion, 37% AP for 20 to 40% occlusion), performing better than the double-object occlusion pattern detectors for all occlusion levels. [sent-347, score-1.512]

88 But third, the DPM [7] is outperformed by our OC-DPM for all occlusion levels above 40% by significant margins (12. [sent-348, score-0.703]

89 We conclude that occlusion pattern detectors can in fact aid detection in presence of occlusion, and the benefit increases with increasing occlusion level. [sent-360, score-1.6]

90 To our surprise, we found that double-object occlusion pattern detectors were not competitive with [7]; our simpler, single-object occlusion pattern detector (OC-DPM), however, improved performance under occlusion by a significant margin. [sent-361, score-2.326]

91 From our experience, the poor performance of double-object occlusion detectors on the KITTI dataset [9] (Sect. [sent-366, score-0.791]

92 3), which contrasts with the findings of [19] for people detection, can be explained by the distribution of occlusion patterns: it seems biased towards extremely challenging “occluded occluder” cases. [sent-368, score-0.747]

93 row of cars parked on the side of the road), where the occluder is itself occluded; these cases are not correctly represented by occluder-occludee models. [sent-371, score-0.532]

94 cases, it proves less robust to combine possibly conflicting pairwise detections (Asym-DPM, Sym-DPM) into a consistent interpretation than to aggregate single-object occlusion patterns (OC-DPM). [sent-375, score-0.946]

95 We also found that the KITTI dataset [9] contains a significant number of occluded objects that are not annotated, presumably because they lie in the Lidar shadow and hence lack the 3D ground-truth evidence needed for annotation. [sent-378, score-0.238]

96 Conclusions: We have considered the long-standing problem of partial occlusion by making occluders first-class citizens in the modelling. [sent-388, score-0.946]

97 [Figure caption: Detection performance for class Car on (a) the full dataset and (b)-(f) increasing occlusion levels from [0, 20]% to [80, 100]%.] [sent-437, score-0.762]

98 [Figure caption: Valid detections on unannotated objects.] … models for detecting distinctive, recurring occlusion patterns, mined from annotated training data. [sent-441, score-0.92]

99 Using these detectors, we improved over the performance of a current state-of-the-art object class detector on an entire dataset of challenging urban street scenes, and even more so for increasingly difficult cases in terms of occlusion. [sent-442, score-0.331]

100 Our most important findings are: i) recurring occlusion patterns can be mined automatically and detected reliably, ii) they can aid object detection, and iii) occlusion remains challenging, also in terms of dataset annotation. [sent-443, score-1.781]

