
3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model (NIPS 2012)


Source: pdf

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].
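
The notion of an aspect (the set of cuboid faces visible from a given viewpoint) can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard back-face test on an oriented box. It assumes a pinhole camera at the origin, camera axes with x to the right, y up and z forward, and a box that rotates only about the vertical axis. The face names and the function `visible_faces` are hypothetical and chosen for illustration.

```python
import math

# Assumed convention: camera at the origin, x right, y up, z forward.
# Hypothetical face names mapped to outward normals in the box's own frame.
FACE_NORMALS = {
    "front": (0.0, 0.0, -1.0),
    "back": (0.0, 0.0, 1.0),
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "top": (0.0, 1.0, 0.0),
    "bottom": (0.0, -1.0, 0.0),
}


def rotate_y(v, yaw):
    """Rotate vector v by angle yaw (radians) about the vertical (y) axis."""
    x, y, z = v
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def visible_faces(center, size, yaw):
    """Return the set of face names whose outward normal faces the camera.

    center: (x, y, z) box center in camera coordinates.
    size:   box extents along its local x, y and z axes.
    yaw:    rotation of the box about the vertical axis, in radians.
    """
    visible = set()
    for name, n_local in FACE_NORMALS.items():
        n = rotate_y(n_local, yaw)
        # Half of the box extent along this face's local normal direction.
        half = sum(abs(a) * e / 2.0 for a, e in zip(n_local, size))
        face_center = tuple(c + half * ni for c, ni in zip(center, n))
        # The viewing ray runs from the camera (origin) to the face center.
        # The face is visible when its normal points against that ray.
        if sum(ni * pi for ni, pi in zip(n, face_center)) < 0.0:
            visible.add(name)
    return frozenset(visible)


# A box straight ahead shows one face; rotating it or lowering it
# relative to the camera reveals more faces, giving a different aspect.
frontal = visible_faces((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0)
rotated = visible_faces((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), math.pi / 4)
from_above = visible_faces((0.0, -3.0, 10.0), (2.0, 2.0, 2.0), math.pi / 4)
```

In this sketch each distinct returned set plays the role of one aspect. A detector in the spirit of the abstract would score only the faces in the aspect implied by each box hypothesis, but how the paper does this is not specified in the text above.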


References

[1] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2010) Object detection with discriminatively trained part based models. IEEE TPAMI, 32, 1627–1645.

[2] Hedau, V., Hoiem, D., and Forsyth, D. (2010) Thinking inside the box: Using appearance models and context based on room geometry. ECCV, vol. 6, pp. 224–237.

[3] Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., and Navab, N. (2010) Dominant orientation templates for real-time detection of texture-less objects. CVPR.

[4] Schneiderman, H. and Kanade, T. (2000) A statistical method for 3D object detection applied to faces and cars. CVPR, pp. 1746–1759.

[5] Torralba, A., Murphy, K. P., and Freeman, W. T. (2007) Sharing visual features for multiclass and multiview object detection. IEEE TPAMI, 29, 854–869.

[6] Gu, C. and Ren, X. (2010) Discriminative mixture-of-templates for viewpoint classification. ECCV, pp. 408–421.

[7] Lowe, D. (1991) Fitting parameterized three-dimensional models to images. IEEE TPAMI, 13, 441–450.

[8] Liebelt, J., Schmid, C., and Schertler, K. (2008) Viewpoint-independent object class detection using 3D feature maps. CVPR.

[9] Yan, P., Khan, S. M., and Shah, M. (2007) 3D model based object class detection in an arbitrary view. ICCV.

[10] Glasner, D., Galun, M., Alpert, S., Basri, R., and Shakhnarovich, G. (2011) Viewpoint-aware object detection and pose estimation. ICCV.

[11] Savarese, S. and Fei-Fei, L. (2007) 3D generic object categorization, localization and pose estimation. ICCV.

[12] Su, H., Sun, M., Fei-Fei, L., and Savarese, S. (2009) Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. ICCV.

[13] Pepik, B., Stark, M., Gehler, P., and Schiele, B. (2012) Teaching 3D geometry to deformable part models. CVPR.

[14] Dalal, N. and Triggs, B. (2005) Histograms of oriented gradients for human detection. CVPR.

[15] Koenderink, J. and van Doorn, A. (1976) The singularities of the visual mappings. Bio. Cyber., 24, 51–59.

[16] Geiger, A., Lenz, P., and Urtasun, R. (2012) Are we ready for autonomous driving? CVPR.

[17] Kushal, A., Schmid, C., and Ponce, J. (2007) Flexible object models for category-level 3D object recognition. CVPR.

[18] Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., and Gool, L. V. (2006) Toward multi-view object class detection. CVPR.

[19] Hoiem, D., Rother, C., and Winn, J. (2007) 3D LayoutCRF for multi-view object class recognition and segmentation. CVPR.

[20] Sun, M., Su, H., Savarese, S., and Fei-Fei, L. (2009) A multi-view probabilistic model for 3D object classes. CVPR.

[21] Payet, N. and Todorovic, S. (2011) Probabilistic pose recovery using learned hierarchical object models. ICCV.

[22] Stark, M., Goesele, M., and Schiele, B. (2010) Back to the future: Learning shape models from 3D CAD data. British Machine Vision Conference.

[23] Brooks, R. A. (1983) Model-based three-dimensional interpretations of two-dimensional images. IEEE TPAMI, 5, 140–150.

[24] Dickinson, S. J., Pentland, A. P., and Rosenfeld, A. (1992) 3-D shape recovery using distributed aspect matching. IEEE TPAMI, 14, 174–198.

[25] Sun, M., Bradski, G., Xu, B.-X., and Savarese, S. (2010) Depth-encoded Hough voting for coherent object detection, pose estimation, and shape recovery. ECCV.

[26] Xiang, Y. and Savarese, S. (2012) Estimating the aspect layout of object categories. CVPR.

[27] Yu, C.-N. and Joachims, T. (2009) Learning structural SVMs with latent variables. ICML.

[28] Schwing, A., Hazan, T., Pollefeys, M., and Urtasun, R. (2012) Efficient structured prediction for 3D indoor scene understanding. CVPR.