
204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary


Source: pdf

Author: Jungseock Joo, Shuo Wang, Song-Chun Zhu

Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize human attributes from body parts, it is important to detect the parts reliably. This is a challenging task due to geometric variation, such as articulation and viewpoint changes, as well as appearance variation of the parts arising from varied clothing types. Prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but appearance variation has been relatively neglected in these works. This paper explores the importance of appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms existing approaches.
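
The abstract mentions decomposing the image lattice into overlapping windows at multiple scales. The following is a minimal sketch of that one step only, not the authors' implementation. The function name `multiscale_windows`, the square window shape, the window sizes, and the 50% overlap are all assumptions made for illustration; the abstract does not specify them, and the iterative refinement of appearance templates is not shown.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def multiscale_windows(
    width: int,
    height: int,
    sizes: List[int],
    overlap: float = 0.5,
) -> Iterator[Window]:
    """Yield square windows over a width x height lattice at each size.

    Hypothetical helper: neighbouring windows of the same size overlap by
    roughly `overlap` (a fraction in [0, 1)) of their side length.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    for size in sizes:
        if size > width or size > height:
            continue  # a window of this size does not fit in the lattice
        # Step between window origins; smaller steps mean more overlap.
        stride = max(1, int(round(size * (1.0 - overlap))))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, size, size)


# Example: a 64 x 128 person bounding box, three assumed window sizes.
windows = list(multiscale_windows(64, 128, sizes=[16, 32, 64]))
```

Each yielded window would be a candidate location for a local appearance template. How candidates are described, clustered, and refined is left to the paper itself.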


reference text

[1] R. Benenson, M. Mathias, T. Tuytelaars, and L. Van Gool. Seeking the strongest rigid detector. In CVPR, 2013.

[2] L. Bourdev, S. Maji, and J. Malik. Describing people: Poselet-based attribute classification. In ICCV, 2011.

[3] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, 2009.

[4] L. Cao, M. Dikmen, Y. Fu, and T. S. Huang. Gender recognition from body. In ACM MM, 2008.

[5] H. Chen, A. Gallagher, and B. Girod. Describing clothing by semantic attributes. In ECCV, 2012.

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[7] I. Endres, K. J. Shih, J. Jiaa, and D. Hoiem. Learning collections of part models for object recognition. In CVPR, 2013.

[8] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32: 1627–1645, 2010.

[9] B. A. Golomb, D. T. Lawrence, and T. J. Sejnowski. Sexnet: A neural network identifies sex from human faces. In NIPS, 1990.

[10] S. Gutta, H. Wechsler, and P. J. Phillips. Gender and ethnic classification of face images. In FG, 1998.

[11] Y. Jia, C. Huang, and T. Darrell. Beyond spatial pyramids: Receptive field learning for pooled image features. In CVPR, 2012.

[12] N. Kumar, A. Berg, P. N. Belhumeur, and S. Nayar. Describable visual attributes for face verification and image search. TPAMI, 33(10): 1962–1977, 2011.

[13] Y. H. Kwon and N. da Vitoria Lobo. Age classification from facial images. CVIU, 1999.

[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[15] G. Sharma and F. Jurie. Learning discriminative spatial representation for image classification. In BMVC, 2011.

[16] G. Sharma, F. Jurie, and C. Schmid. Expanded parts model for human attribute and action recognition in still images. In CVPR, 2013.

[17] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV, 2012.

[18] X. Song, T. Wu, Y. Jia, and S.-C. Zhu. Discriminatively trained and-or tree models for object detection. In CVPR, 2013.

[19] D. A. Vaquero, R. S. Feris, D. Tran, L. M. G. Brown, A. Hampapur, and M. Turk. Attribute-based people search in surveillance environments. In WACV, 2009.

[20] S. Wang, J. Joo, Y. Wang, and S.-C. Zhu. Weakly supervised learning for attribute localization in outdoor scenes. In CVPR, 2013.

[21] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.

[22] B. Yao, X. Yang, L. Lin, M. W. Lee, and S. C. Zhu. I2T: Image parsing to text description. Proceedings of the IEEE, pages 1485–1508, 2010.