
Analyzing 3D Objects in Cluttered Images (NIPS 2012, paper 40)


Source: pdf

Authors: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different from sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.
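The second stage combines a morphable 3D shape with a weak-perspective camera. The following is a minimal sketch of that combination under our own assumptions, not the authors' implementation. We assume the 3D keypoints are a mean shape plus a linear basis, that viewpoint is restricted to a rotation about the vertical axis, and that 2D keypoint estimates are already available (for example, from a first-stage detector). With the view fixed, folding the scale into the shape coefficients makes the fit a linear least-squares problem, so a discrete set of candidate azimuths can be scored by reprojection error. All function names are hypothetical.

```python
import numpy as np


def rotation_y(azimuth):
    """3x3 rotation about the vertical (y) axis by `azimuth` radians."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def project(mean, basis, alpha, azimuth, scale, t):
    """Weak-perspective projection of a morphable 3D shape (assumed linear model).

    mean:  (N, 3) mean 3D keypoints
    basis: (K, N, 3) shape basis; 3D shape = mean + sum_k alpha[k] * basis[k]
    Returns (N, 2) image points: scale * (first two rows of R) @ X + t.
    """
    shape3d = mean + np.tensordot(alpha, basis, axes=1)  # (N, 3)
    R2 = rotation_y(azimuth)[:2]                          # drop the depth row
    return scale * shape3d @ R2.T + t


def fit_fixed_view(x2d, mean, basis, azimuth):
    """Least-squares fit of scale, shape coefficients, and translation for one view.

    Substituting beta0 = scale and beta_k = scale * alpha_k makes the model linear.
    """
    n_points, n_basis = mean.shape[0], basis.shape[0]
    R2 = rotation_y(azimuth)[:2]
    # Columns: projected mean, projected basis shapes, then x and y translation.
    # Each column is flattened as [x0, y0, x1, y1, ...] to match x2d.ravel().
    cols = [(mean @ R2.T).ravel()]
    cols += [(basis[k] @ R2.T).ravel() for k in range(n_basis)]
    cols += [np.tile([1.0, 0.0], n_points), np.tile([0.0, 1.0], n_points)]
    A = np.column_stack(cols)  # (2N, K + 3)
    b = x2d.ravel()
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    scale = sol[0]
    alpha = sol[1:n_basis + 1] / scale  # undo the scale folding (assumes scale != 0)
    t = sol[n_basis + 1:]
    residual = float(np.sum((A @ sol - b) ** 2))
    return scale, alpha, t, residual


def fit_over_views(x2d, mean, basis, azimuths):
    """Fit every candidate azimuth and keep the one with the smallest residual.

    Returns (azimuth, scale, alpha, t, residual).
    """
    fits = [(az,) + fit_fixed_view(x2d, mean, basis, az) for az in azimuths]
    return min(fits, key=lambda f: f[-1])
```

As a usage example, projecting a synthetic shape at one azimuth on a 30-degree grid and calling `fit_over_views` with that grid should recover the azimuth, scale, coefficients, and translation up to floating-point error. Searching a discrete set of views keeps each fit to a single least-squares solve. Real images with occluded keypoints would likely also need robust weighting or outlier handling, which this sketch omits.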


References

[1] M. Arie-Nachimson and R. Basri. Constructing implicit 3D shape models for pose estimation. In ICCV, 2009.

[2] T. Binford. Survey of model-based image analysis systems. The International Journal of Robotics Research, 1(1):18–64, 1982.

[3] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 187–194. ACM Press/Addison-Wesley Publishing Co., 1999.

[4] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, pages 1365–1372, 2009.

[5] K. Bowyer and C. Dyer. Aspect graphs: An introduction and survey of recent results. International Journal of Imaging Systems and Technology, 2(4):315–328, 1990.

[6] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.

[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[8] C. Desai and D. Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV, 2012.

[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html.

[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE PAMI, 32(9):1627–1645, 2010.

[11] R. Girshick, P. Felzenszwalb, and D. McAllester. Object detection with grammar models. In NIPS, 2011.

[12] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, pages 1275–1282. IEEE, 2011.

[13] C. Gu and X. Ren. Discriminative mixture-of-templates for viewpoint classification. In ECCV, pages 408–421, 2010.

[14] B. Horn. Robot vision. The MIT Press, 1986.

[15] S. Ioffe and D. Forsyth. Mixtures of trees for object recognition. In CVPR, 2001.

[16] T. Joachims, T. Finley, and C. Yu. Cutting plane training of structural SVMs. Machine Learning, 2009.

[17] M. Jones and P. Viola. Fast multi-view face detection. In CVPR, 2003.

[18] Y. Li, L. Gu, and T. Kanade. A robust shape model for multi-view car alignment. In CVPR, 2009.

[19] R. Lopez-Sastre, T. Tuytelaars, and S. Savarese. Deformable part models revisited: A performance evaluation for object category pose estimation. In ICCV Workshops, 2011.

[20] M. Meila and M. Jordan. Learning with mixtures of trees. JMLR, 1:1–48, 2001.

[21] P. Ott and M. Everingham. Shared parts for deformable part-based models. In CVPR, 2011.

[22] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Teaching 3D geometry to deformable part models. In CVPR, 2012.

[23] S. Savarese and L. Fei-Fei. 3D generic object categorization, localization and pose estimation. In ICCV, pages 1–8. IEEE, 2007.

[24] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In CVPR, volume 1, pages 746–751. IEEE, 2000.

[25] M. Sun, H. Su, S. Savarese, and L. Fei-Fei. A multi-view probabilistic model for 3D object classes. In CVPR, pages 1247–1254. IEEE, 2009.

[26] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, B. Schiele, and L. Van Gool. Towards multi-view object class detection. In CVPR, volume 2, pages 1589–1596. IEEE, 2006.

[27] A. Torralba, K. Murphy, and W. Freeman. Sharing visual features for multiclass and multiview object detection. PAMI, 29(5):854–869, 2007.

[28] L. Torresani, A. Hertzmann, and C. Bregler. Learning non-rigid 3D shape from 2D motion. In NIPS, volume 16, 2003.

[29] L. Torresani, D. Yang, E. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, volume 1, pages I–493. IEEE, 2001.

[30] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, volume 1, pages I–511. IEEE, 2001.

[31] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.

[32] L. Zhu, Y. Chen, A. Torralba, W. Freeman, and A. Yuille. Part and appearance sharing: Recursive compositional models for multi-view multi-object detection. Pattern Recognition, 2010.

[33] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.

[34] M. Zia, M. Stark, B. Schiele, and K. Schindler. Revisiting 3D geometric models for accurate object shape and pose. In ICCV Workshops, pages 569–576. IEEE, 2011.