
60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation


Source: pdf

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models cannot capture large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how can tree models be used effectively in human pose estimation? and 3) how should combined parts be used together with single parts efficiently? Assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. Surprisingly, we find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims to approximate the joint distribution of body part locations using a minimal tree structure. This suggests that one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both when the training images come from the same dataset and when they come from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
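
The idea of approximating a joint distribution with a tree can be illustrated with the classic Chow-Liu procedure (reference [12] below): score every pair of variables by empirical mutual information and keep a maximum-weight spanning tree. The sketch below is only that baseline, not the latent tree learning of [11] that the abstract refers to. It assumes part locations have already been discretized into a few symbols, and the function names `mutual_information` and `chow_liu_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chow_liu_tree(samples):
    """Return the edges of a maximum-mutual-information spanning tree.

    samples: list of equal-length tuples, one discrete value per variable
    (an assumed input format, chosen for illustration).
    """
    num_vars = len(samples[0])
    columns = [[row[i] for row in samples] for i in range(num_vars)]
    # Score every pair of variables, highest mutual information first.
    scored = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    # Kruskal's algorithm with a small union-find to avoid cycles.
    parent = list(range(num_vars))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Calling `chow_liu_tree` on rows of discretized part positions returns a list of `(i, j)` index pairs, one per tree edge. Pairs with little statistical dependence are left unconnected, which is the sense in which a learned tree can go beyond physical connections between parts.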


reference text

[1] Y. Tian, C. L. Zitnick, and S. G. Narasimhan, “Exploring the spatial hierarchy of mixture models for human pose estimation,” in ECCV (5), 2012, pp. 256–269.

[2] Y. Wang, D. Tran, and Z. Liao, “Learning hierarchical poselets for human parsing,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 1705–1712.

[3] S. Johnson and M. Everingham, “Clustered pose and nonlinear appearance models for human pose estimation,” in Proceedings of the British Machine Vision Conference, 2010, doi: 10.5244/C.24.12.

[4] Y. Yang and D. Ramanan, “Articulated pose estimation with flexible mixtures-of-parts,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 1385–1392.

[5] P. F. Felzenszwalb and D. P. Huttenlocher, “Pictorial structures for object recognition,” Int. J. Comput. Vision, vol. 61, no. 1, pp. 55–79, Jan. 2005.

[6] D. Ramanan, “Learning to parse images of articulated bodies,” Advances in Neural Information Processing Systems, vol. 19, pp. 1129, 2007.

[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results,” http://www.pascalnetwork.org/challenges/VOC/voc2009.

[8] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, vol. 1, pp. 886–893.

[9] L. Bourdev, S. Maji, T. Brox, and J. Malik, “Detecting people using mutually consistent poselet activations,” Computer Vision–ECCV 2010, pp. 168–181, 2010.

[10] M. Sun and S. Savarese, “Articulated part-based model for joint object detection and pose estimation,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 723–730.

[11] M. J. Choi, V. Tan, A. Anandkumar, and A. S. Willsky, “Learning latent tree graphical models,” J. Mach. Learn. Res., vol. 12, pp. 1771–1812, July 2011.

[12] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees,” Information Theory, IEEE Transactions on, vol. 14, no. 3, pp. 462–467.

[13] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 32, no. 9, pp. 1627–1645, 2010.

[14] S. Divvala, A. Efros, and M. Hebert, “How important are deformable parts in the deformable parts model?,” CoRR, vol. abs/1206.3714, 2012.

[15] B. Sapp, A. Toshev, and B. Taskar, “Cascaded models for articulated pose estimation,” in ECCV (2), 2010.

[16] V. Ferrari, M. J. Marín-Jiménez, and A. Zisserman, “Progressive search space reduction for human pose estimation,” in CVPR, 2008.

[17] S. Johnson and M. Everingham, “Learning effective human pose estimation from inaccurate annotation,” in CVPR, 2011, pp. 1465–1472.