46 iccv-2013-Allocentric Pose Estimation


Source: pdf

Author: M. José Antonio, Luc De Raedt, Tinne Tuytelaars

Abstract: The task of object pose estimation has been a challenge since the early days of computer vision. To estimate the pose (or viewpoint) of an object, people have mostly looked at object-intrinsic features, such as shape or appearance. Surprisingly, informative features provided by other, external elements in the scene have so far mostly been ignored. At the same time, contextual cues have been shown to be of great benefit for related tasks such as object detection or action recognition. In this paper, we explore how information from other objects in the scene can be exploited for pose estimation. In particular, we look at object configurations. We show that, starting from noisy object detections and pose estimates, exploiting the estimated pose and location of other objects in the scene can help to estimate the objects’ poses more accurately. We explore both a camera-centered and an object-centered representation for relations. Experiments on the challenging KITTI dataset show that object configurations can indeed be used as a complementary cue to appearance-based pose estimation. In addition, object-centered relational representations can also assist object detection.
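
The difference between the two relation representations mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes each object is described by a ground-plane position (`x`, `z`) and a yaw angle in camera coordinates, and all function names and dictionary keys are hypothetical. The camera-centered relation measures the offset along the camera axes, while the object-centered (allocentric) relation re-expresses the same offset in the frame of the reference object, which makes it independent of the camera viewpoint.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def camera_centered_relation(ref, other):
    """Offset and yaw difference measured along the camera axes (assumed layout)."""
    dx = other["x"] - ref["x"]
    dz = other["z"] - ref["z"]
    return dx, dz, wrap_angle(other["yaw"] - ref["yaw"])


def object_centered_relation(ref, other):
    """Same offset, rotated by -yaw into the reference object's own frame.

    The first component is the offset along the reference object's heading,
    the second is the lateral offset (assumption: yaw is measured from the
    x axis toward the z axis).
    """
    dx = other["x"] - ref["x"]
    dz = other["z"] - ref["z"]
    c, s = math.cos(ref["yaw"]), math.sin(ref["yaw"])
    along = c * dx + s * dz
    lateral = -s * dx + c * dz
    return along, lateral, wrap_angle(other["yaw"] - ref["yaw"])


def rotate_scene(obj, phi):
    """Simulate a different camera viewpoint by rotating an object about the origin."""
    c, s = math.cos(phi), math.sin(phi)
    return {
        "x": c * obj["x"] - s * obj["z"],
        "z": s * obj["x"] + c * obj["z"],
        "yaw": obj["yaw"] + phi,
    }


# Two hypothetical detections (positions in metres, yaw in radians).
ref = {"x": 1.0, "z": 2.0, "yaw": 0.3}
other = {"x": 4.0, "z": 6.0, "yaw": 1.0}

print(camera_centered_relation(ref, other))
print(object_centered_relation(ref, other))
```

Passing both detections through `rotate_scene` with the same angle changes the camera-centered offset but leaves the object-centered relation unchanged, which is the property that makes an object-centered description attractive for reasoning about object configurations.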


reference text

[1] S. Y. Bao, M. Bagra, Y.-W. Chao, and S. Savarese. Semantic structure from motion with points, regions, and objects. In CVPR, 2012. 2

[2] S. Y. Bao and S. Savarese. Semantic structure from motion. In CVPR, 2011. 2

[3] S. Y.-Z. Bao, M. Sun, and S. Savarese. Toward coherent object detection and scene layout understanding. In CVPR, 2010. 4

[4] R. G. Cinbis and S. Sclaroff. Contextual object detection using set-based classification. In ECCV, 2012. 2

[5] C. Desai, D. Ramanan, and C. C. Fowlkes. Discriminative models for multi-class object layout. IJCV, 2011. 1, 2

[6] S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert. An empirical study of context in object detection. In CVPR, June 2009. 2

[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 Results. 3, 5, 6

[8] D. A. Forsyth, J. Malik, M. M. Fleck, H. Greenspan, T. Leung, S. Belongie, C. Carson, and C. Bregler. Finding pictures of objects in large collections of images. In ECCV, 1996. 2

[9] C. Galleguillos, B. McFee, S. Belongie, and G. R. G. Lanckriet. Multi-class object localization by combining local contextual interactions. In CVPR, 2010. 2

[10] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012. 1, 5

[11] A. Geiger, M. Roser, and R. Urtasun. Efficient large-scale stereo matching. In ACCV, 2010. 4

[12] A. Geiger, C. Wojek, and R. Urtasun. Joint 3d estimation of objects and scene layout. In NIPS, 2011. 2, 4, 5, 6, 7

[13] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In CVPR, 2011. 5

[14] G. Heitz and D. Koller. Learning spatial context: Using stuff to find things. In ECCV, 2008. 1

[15] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. In CVPR, 2006. 4

[16] D. Hoiem, C. Rother, and J. M. Winn. 3d layoutcrf for multiview object class recognition and segmentation. In CVPR, 2007. 1, 2

[17] L. G. Hulbert, M. L. Corrêa da Silva, and G. Adegboyega. Cooperation in social dilemmas and allocentrism: a social values approach. European Journal of Social Psychology, 2001. 2

[18] A. Jain, A. Gupta, and L. S. Davis. Learning what and how of contextual models for scene labeling. In ECCV, 2010. 2

[19] C. Li, D. Parikh, and T. Chen. Automatic discovery of groups of objects for scene understanding. In CVPR, 2012. 2

[20] J. Liebelt and C. Schmid. Multi-view object class detection with a 3d geometric model. In CVPR, 2010. 1, 2, 5

[21] R. J. Lopez-Sastre, T. Tuytelaars, and S. Savarese. Deformable part models revisited: A performance evaluation for object category pose estimation. In ICCV WS, 2011. 1, 2, 4, 5, 6, 7

[22] F. Lv, T. Zhao, and R. Nevatia. Camera calibration from video of a walking human. TPAMI, 2006. 4

[23] S. A. Macskassy and F. J. Provost. Classification in networked data: A toolkit and a univariate case study. JMLR, 2007. 1, 3

[24] T. Malisiewicz and A. A. Efros. Beyond categories: The visual memex model for reasoning about object relationships. In NIPS, 2009. 2

[25] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 2001. 1

[26] P. Felzenszwalb, R. Girshick, and D. McAllester. Cascade object detection with deformable part models. In CVPR, 2010. 4

[27] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Teaching 3d geometry to deformable part models. In CVPR, 2012. 1, 2, 5

[28] R. Perko and A. Leonardis. A framework for visual-context-aware object detection in still images. CVIU, 2010. 2, 4

[29] M. Ristin, J. Gall, and L. van Gool. Local context priors for object proposal generation. In ACCV, 2012. 2

[30] M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011. 2

[31] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007. 1, 2, 5

[32] P. Sen, G. Namata, M. Bilgic, and L. Getoor. Collective classification. In Encyclopedia of Machine Learning. 2010. 1

[33] A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In NIPS, 2004. 2

[34] H. C. Triandis and E. M. Suh. Cultural influences on personality. Annual Review of Psychology, 2002. 2

[35] M. Wand and M. Jones. Kernel smoothing, 1995. Chapman & Hall CRC. 5

[36] X. Wang and E. Grimson. Spatial latent dirichlet allocation. In NIPS, 2007. 2

[37] C. Wojek, S. Walk, S. Roth, K. Schindler, and B. Schiele. Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes. TPAMI, 2013. 2