A Deformable Mixture Parsing Model with Parselets (ICCV 2013)

Source: pdf

Authors: Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan

Abstract: In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often treat human pose estimation as a prerequisite for human parsing. We argue that these approaches cannot obtain optimal pixel-level parsing because the targets of the two tasks are inconsistent. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments that can generally be obtained by low-level over-segmentation algorithms and that bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing that simultaneously handles the deformation and multi-modality of Parselets. The proposed model has two unique characteristics: (1) the potentially numerous modalities of Parselet ensembles are represented as the “And-Or” structure of sub-trees; (2) to handle the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus solves human parsing directly, by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
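The abstract describes the DMPM only at a high level. As a rough illustration of the inference idea, and not the authors' formulation, the sketch below scores an And-Or tree over a pool of Parselet hypotheses: an Or node picks one modality (alternative sub-tree), an And node sums the scores of its parts, and a leaf picks its best hypothesis or, where allowed, an "invisible" state that models occlusion or absence. Everything concrete here is a hypothetical assumption: the class names (`Hypothesis`, `Leaf`, `AndNode`, `OrNode`), the function `best_config`, the quadratic deformation penalty around a fixed `anchor`, the `invisible_score` bias, and all numbers in the toy example. The paper's actual features, deformation terms, and learned weights are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical Parselet hypothesis: a segment id, an appearance score, and its centre (x, y).
@dataclass
class Hypothesis:
    seg_id: int
    score: float
    center: Tuple[float, float]


@dataclass
class Leaf:
    name: str
    hypotheses: List[Hypothesis]
    anchor: Tuple[float, float]              # assumed expected position of this part
    deform_weight: float = 0.01              # assumed quadratic deformation penalty
    invisible_score: Optional[float] = None  # if set, the part may be absent/occluded


@dataclass
class AndNode:
    name: str
    children: List["Node"]  # parts that must all be present in this configuration


@dataclass
class OrNode:
    name: str
    children: List["Node"]  # alternative modalities; exactly one is chosen


Node = Union[Leaf, AndNode, OrNode]


def best_config(node: Node) -> Tuple[float, dict]:
    """Return (best score, {leaf name: chosen seg_id, or None if invisible})."""
    if isinstance(node, Leaf):
        best_score, best_choice = float("-inf"), None
        for h in node.hypotheses:
            dx = h.center[0] - node.anchor[0]
            dy = h.center[1] - node.anchor[1]
            # Appearance score minus a quadratic penalty for drifting from the anchor.
            s = h.score - node.deform_weight * (dx * dx + dy * dy)
            if s > best_score:
                best_score, best_choice = s, h.seg_id
        # The invisible state competes with every visible hypothesis.
        if node.invisible_score is not None and node.invisible_score > best_score:
            best_score, best_choice = node.invisible_score, None
        return best_score, {node.name: best_choice}
    if isinstance(node, AndNode):
        total, config = 0.0, {}
        for child in node.children:
            s, c = best_config(child)
            total += s
            config.update(c)
        return total, config
    # OrNode: keep only the highest-scoring alternative sub-tree.
    return max((best_config(child) for child in node.children), key=lambda sc: sc[0])


# Toy example with made-up scores: upper body is either one coat or a shirt plus a vest.
upper = OrNode("upper", [
    Leaf("coat", [Hypothesis(1, 2.0, (50, 40))], anchor=(50, 40)),
    AndNode("shirt+vest", [
        Leaf("shirt", [Hypothesis(2, 1.0, (50, 40))], anchor=(50, 40)),
        Leaf("vest", [Hypothesis(3, 0.5, (52, 42))], anchor=(50, 40)),
    ]),
])
body = AndNode("body", [
    Leaf("hat", [Hypothesis(4, -0.5, (50, 5))], anchor=(50, 5), invisible_score=0.0),
    upper,
])
score, config = best_config(body)
print(score, config)  # 2.0 {'hat': None, 'coat': 1}
```

In this toy case the weak hat hypothesis loses to the invisible state, and the single-coat modality (2.0) beats the shirt-plus-vest modality (about 1.42). Because siblings under an And node are scored independently here, one bottom-up pass gives the exact maximum; adding pairwise deformation terms between parts would turn that step into a max-sum over pairs, which is where techniques such as generalized distance transforms are usually applied.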

reference text

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI, 2012.

[2] P. Arbeláez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Semantic segmentation using regions and parts. In CVPR, 2012.

[3] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI, 2011.

[4] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV, 2012.

[5] J. Carreira and C. Sminchisescu. CPMC: Automatic object segmentation using constrained parametric min-cuts. TPAMI, 2012.

[6] K. Chatfield, V. Lempitsky, and A. Vedaldi. The devil is in the details: an evaluation of recent feature encoding methods. In BMVC, 2011.

[7] H. Chen, A. Gallagher, and B. Girod. Describing clothing by semantic attributes. In ECCV, 2012.

[8] S.-C. Zhu and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2006.

[9] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[10] I. Endres and D. Hoiem. Category independent object proposals. In ECCV, 2010.

[11] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.

[12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part-Based Models. TPAMI, 2010.

[13] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. IJCV, 2005.

[14] P. F. Felzenszwalb and D. P. Huttenlocher. Distance transforms of sampled functions. Theory of Computing, 2012.

[15] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010.

[16] A. C. Gallagher and T. Chen. Clothing cosegmentation for recognizing people. In CVPR, 2008.

[17] C. Gu, J. J. Lim, P. Arbeláez, and J. Malik. Recognition using regions. In CVPR, 2009.

[18] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 2009.

[19] P. Kohli, J. Rihan, M. Bray, and P. H. Torr. Simultaneous segmentation and pose estimation of humans using dynamic graph cuts. IJCV, 2008.

[20] S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yan. Hi, magic closet, tell me what to wear! In ACM MM, 2012.

[21] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR, 2012.

[22] X. Liu, L. Lin, S.-C. Zhu, and H. Jin. Trajectory parsing by cluster sampling in spatio-temporal graph. In CVPR, 2009.

[23] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.

[24] B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006.

[25] M. Sun and S. Savarese. Articulated part-based model for joint object detection and pose estimation. In ICCV, 2011.

[26] P. H. Torr and A. Zisserman. Human pose estimation using a joint pixel-wise and part-wise formulation. 2013.

[27] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Segmentation as selective search for object recognition. In ICCV, 2011.

[28] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.

[29] J. Wang, J. Yang, K. Yu, F. Lv, and T. Huang. Locality-constrained linear coding for image classification. In CVPR, 2010.

[30] X. Wang and T. Zhang. Clothes search in consumer photos via color matching and attribute learning. In ACM MM, 2011.

[31] K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg. Parsing clothing in fashion photographs. In CVPR, 2012.

[32] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.

[33] L. Zhu, Y. Chen, Y. Lu, C. Lin, and A. Yuille. Max margin and/or graph learning for parsing the human body. In CVPR, 2008.

[34] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.