Learning to parse images of articulated bodies (NIPS 2006, paper 122)

Source: pdf

Author: Deva Ramanan

Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following an established line of research, pose estimation is framed as inference in a probabilistic model. In our experience, however, the success of many approaches often lies in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state of the art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database.
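
The iterative parsing idea in the abstract (start from a rough parse, learn image-specific features from it, then parse again) can be illustrated with a small sketch. This is not the paper's implementation: the structured inference step is replaced here by a per-pixel sigmoid, the features are plain foreground/background color histograms, and every name (`color_histogram`, `region_log_odds`, `iterative_parse`, `n_bins`, `n_iters`) is hypothetical. It assumes the image has already been quantized into integer color labels and that an initial soft foreground estimate is available.

```python
import numpy as np

def color_histogram(labels, weights, n_bins):
    """Weighted, normalized histogram over quantized color labels."""
    hist = np.bincount(labels.ravel(), weights=weights.ravel(), minlength=n_bins)
    hist = hist + 1e-6  # small smoothing term so no bin has zero mass
    return hist / hist.sum()

def region_log_odds(labels, fg_prob, n_bins):
    """Per-pixel log-odds of foreground under image-specific color models."""
    fg = color_histogram(labels, fg_prob, n_bins)        # foreground color model
    bg = color_histogram(labels, 1.0 - fg_prob, n_bins)  # background color model
    return np.log(fg[labels]) - np.log(bg[labels])

def iterative_parse(labels, initial_fg_prob, n_bins, n_iters=3):
    """Alternate between learning color models and re-estimating the soft mask.

    Assumption: a per-pixel sigmoid stands in for the structured
    (probabilistic-model) inference step described in the abstract.
    """
    fg_prob = initial_fg_prob
    for _ in range(n_iters):
        log_odds = region_log_odds(labels, fg_prob, n_bins)
        fg_prob = 1.0 / (1.0 + np.exp(-log_odds))
    return fg_prob

# Toy 1-D "image": color 1 is the figure, color 0 is the background.
labels = np.array([0, 0, 1, 1, 1, 1, 0, 0])
# A rough initial parse that only finds half of the figure.
initial = np.array([0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])
refined = iterative_parse(labels, initial, n_bins=2)
```

In this toy case the color model learned from the half-correct initial estimate is enough to pull the two missed figure pixels above 0.5, which is the sense in which features tuned to a particular image can improve on a generic first pass.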

reference text

[1] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In ECCV, 2002.

Figure 7: Sample results. We show the original image, the initial edge-based parse, and the final region-based parse. We are able to capture some extreme articulations. In many cases the posterior is ambiguous because the image itself is (i.e., multiple people are present). In particular, it may be surprising that both people in the bottom-right pair are recognized by the region model; this suggests that the inter-region dissimilarity learned by the color histograms is much stronger than the foreground similarity. We quantify results in Table 1.

[2] M. Bray, P. Kohli, and P. Torr. PoseCut: simultaneous segmentation and 3D pose estimation of humans using dynamic graph-cuts. In ECCV, 2006.

[3] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. Int. J. Computer Vision, 61(1), January 2005.

[4] G. Hua, M.-H. Yang, and Y. Wu. Learning to estimate human pose with data driven belief propagation. In CVPR, 2005.

[5] M. Kumar, P. Torr, and A. Zisserman. Objcut. In CVPR, 2005.

Figure 8: Sample results for horses. Our results tend to be quite good across the entire dataset of 300 images. Even though the horse model is fairly simplistic (a collection of rectangles similar to Fig. 6), the posterior can capture rich non-rigid deformations of body parts. The Weizmann set of horses seems to be easier than our people dataset; we quantify this with a perplexity score in Table 1.

[6] M. Lee and I. Cohen. Proposal maps driven MCMC for estimating human body pose in static images. In CVPR, 2004.

[7] G. Mori, X. Ren, A. Efros, and J. Malik. Recovering human body configurations: Combining segmentation and recognition. In CVPR, 2004.

[8] D. Ramanan, D. Forsyth, and A. Zisserman. Strike a pose: Tracking people by finding stylized poses. In CVPR, June 2005.

[9] D. Ramanan and C. Sminchisescu. Training deformable models for localization. In CVPR, 2006.

[10] X. Ren, A. C. Berg, and J. Malik. Recovering human body configurations using pairwise constraints between parts. In ICCV, 2005.

[11] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, chapter 23, pages 835–836. Prentice Hall, 2nd edition, 2003.

[12] J. Zhang, J. Luo, R. Collins, and Y. Liu. Body localization in still images using hierarchical models and hybrid search. In CVPR, 2006.