
269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

Source: pdf

Author: Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu

Abstract: Occlusion presents a challenge for detecting objects in real-world applications. To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at the semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). This paper focuses on car detection in street scenes. Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. The model parameters are learned from real images under the latent structural SVM (LSSVM) framework. In inference, an efficient dynamic programming (DP) algorithm is utilized. In experiments, we test our method on both car detection and car view estimation. Experimental results show that (i) our CAD simulation strategy is capable of generating occlusion patterns for real scenarios, (ii) the proposed AND-OR structure model is effective for modeling occlusions, outperforming the deformable part-based model (DPM) [6, 10] in car detection on both our self-collected street-parking dataset and the Pascal VOC 2007 car dataset [4], and (iii) the learned model is on par with state-of-the-art methods on car view estimation tested on two public datasets.

reference text

[1] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In ECCV, 2008.

[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[3] G. Duan, H. Ai, and S. Lao. A structural filter approach to human detection. In ECCV, 2010.

[4] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.

[5] M. Everingham, A. Zisserman, C. K. I. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 (VOC2006) Results. http://www.pascal-network.org/challenges/VOC/voc2006/results.pdf.

[6] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 2010.

[7] T. Gao, B. Packer, and D. Koller. A segmentation-aware object detection model with occlusion handling. In CVPR, 2011.

[8] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, 2012.

[9] R. Girshick, P. Felzenszwalb, and D. McAllester. Object detection with grammar models. In NIPS, 2011.

[10] R. B. Girshick, P. F. Felzenszwalb, and D. McAllester. Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/.

[11] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, 2011.

[12] C. Gu and X. Ren. Discriminative Mixture-of-Templates for Viewpoint Classification. In ECCV, 2010.

[13] M. Hejrati and D. Ramanan. Analyzing 3d objects in cluttered images. In NIPS, 2012.

[14] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural computation, 2006.

[15] D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing error in object detectors. In ECCV, 2012.

[16] E. Hsiao and M. Hebert. Occlusion reasoning for object detection under arbitrary viewpoint. In CVPR, 2012.

[17] W. Hu. Learning 3d object templates by hierarchical quantization of geometry and appearance spaces. In CVPR, 2012.

[18] B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. In CVPR, 2003.

[19] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.

[20] J. Liebelt, C. Schmid, and K. Schertler. Viewpoint-independent object class detection using 3d feature maps. In CVPR, 2008.

[21] R. J. Lopez-Sastre, T. T., and S. Savarese. Deformable part models revisited: A performance evaluation for object category pose estimation. In ICCV-WS CORP, 2011.

[22] A. Opelt and A. Pinz. Object Localization with Boosting and Weak Supervision for Generic Object Recognition. In SCIA, 2005.

[23] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.

[24] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Teaching 3d geometry to deformable part models. In CVPR, 2012.

[25] B. C. Russell, A. B. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. IJCV, 2008.

[26] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.

[27] Z. Si and S.-C. Zhu. Learning and-or templates for object recognition and detection. PAMI, 2013.

[28] M. Sun, H. Su, S. Savarese, and F. fei Li. A multi-view probabilistic model for 3D object classes. In CVPR, 2009.

[29] X. Wang, T. Han, and S. Yan. An hog-lbp human detector with partial occlusion handling. In ICCV, 2009.

[30] B. Wu and R. Nevatia. Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. IJCV, 2007.

[31] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.

[32] C. Yu and T. Joachims. Learning structural svms with latent variables. In ICML, 2009.

[33] A. L. Yuille and A. Rangarajan. The Concave-Convex Procedure (CCCP). In NIPS, 2001.

[34] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010.

[35] S.-C. Zhu and D. Mumford. A stochastic grammar of images. Found. Trends. Comput. Graph. Vis., 2006.