iccv iccv2013 iccv2013-344 iccv2013-344-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang
Abstract: Human action can be recognised from a single still image by modelling human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of the human and the object, and on estimation of human pose. They are thus sensitive to large variations in human pose, occlusion, and unsatisfactory detection of small objects. To overcome this limitation, a novel exemplar-based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing, in a probabilistic way, how a person interacts spatially with a manipulated object for different activities. A representation based on our HOI exemplars thus has great potential for being robust to errors in human/object detection and pose estimation. A new framework, consisting of the proposed exemplar-based HOI descriptor and an activity-specific matching model that learns the parameters, is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.
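To make the idea of a spatial interaction exemplar as a density function concrete, here is a minimal sketch. It is not the authors' formulation: the isotropic Gaussian form, the height-normalised offset, and the names `gaussian_density`, `exemplar_responses`, and `EXEMPLARS` are all assumptions made for illustration. It only shows how scoring a detection against a density, rather than against a hard location, lets the response vary smoothly when the detected object position is slightly off.

```python
import math

def gaussian_density(offset, mean, sigma):
    """Isotropic 2-D Gaussian density evaluated at `offset` (assumed form)."""
    dx, dy = offset[0] - mean[0], offset[1] - mean[1]
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm

def exemplar_responses(person_center, person_height, object_center, exemplars):
    """Score one person/object detection pair against every exemplar."""
    # Object offset relative to the person, normalised by person height
    # so that the score does not depend on image scale.
    offset = ((object_center[0] - person_center[0]) / person_height,
              (object_center[1] - person_center[1]) / person_height)
    return [gaussian_density(offset, mean, sigma) for mean, sigma in exemplars]

# Hypothetical exemplars as (mean offset, sigma); values are illustrative only.
EXEMPLARS = [
    ((0.0, -0.4), 0.15),  # object above the body centre, near the head
    ((0.3, 0.0), 0.15),   # object to the side, at torso height
]

responses = exemplar_responses(
    person_center=(100, 200), person_height=200,
    object_center=(100, 120), exemplars=EXEMPLARS)
```

In this sketch the list `responses` plays the role of a descriptor: one entry per exemplar, with the largest value for the exemplar whose density best explains the observed person-object layout.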
[1] C. S. A. Prest and J. Malik. Weakly supervised learning of interactions between humans and objects. TPAMI, 34(3):601–614, 2012. 1, 2, 4, 5, 6
[2] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. ICCV, 2007. 4
[3] V. Delaitre, I. Laptev, and J. Sivic. Recognizing human actions in still images: a study of bag-of-features and part-based representations. In Proc. BMVC, 2010. 1
[4] V. Delaitre, J. Sivic, I. Laptev, et al. Learning person-object interactions for action recognition in still images. In NIPS, 2011. 1
[5] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for static human-object interactions. In Workshop on Structured Models in Computer Vision, 2010. 1, 5, 6
[6] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32(9):1627–1645, 2010. 6
[7] R. Filipovych and E. Ribeiro. Recognizing primitive interactions by exploring actor-object states. In CVPR, 2008. 1
[8] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972–976, 2007. 2
[9] A. Gupta, A. Kembhavi, and L. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. TPAMI, 31(10):1775–1789, 2009. 2, 5, 6
[10] J. H. J. Xiao, K. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. CVPR, 2010. 2
[11] H. Kjellström, J. Romero, and D. Kragić. Visual object-action recognition: Inferring object affordances from human demonstration. CVIU, 115(1):81–90, 2011. 1
[12] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 6, 7
[13] S. Maji, L. Bourdev, and J. Malik. Action recognition from a distributed representation of pose and appearance. CVPR, 2011. 1, 2
[14] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-svms for object detection and beyond. ICCV, 2011. 2
[15] G. Mori and J. Malik. Recovering 3d human body configurations using shape contexts. TPAMI, 28(7):1052–1062, 2006. 2
[16] A. Prest, V. Ferrari, and C. Schmid. Explicit modeling of human-object interactions in realistic videos. TPAMI, 35(4):835–848, 2013. 1
[17] M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele. A database for fine grained activity detection of cooking activities. In CVPR, 2012. 1
[18] M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011. 1, 2
[19] G. Sharma, F. Jurie, and C. Schmid. Discriminative spatial saliency for image classification. In CVPR, 2012. 1
[20] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. TPAMI, 34(3):480–492, 2012. 4
[21] B. Yao and L. Fei-Fei. Grouplet: A structured image representation for recognizing human and object interactions. CVPR, 2010. 1, 2, 5, 6, 7
[22] B. Yao and L. Fei-Fei. Action recognition with exemplar based 2.5d graph matching. In ECCV, 2012. 2
[23] B. Yao and L. Fei-Fei. Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. TPAMI, 34(9):1691–1703, 2012. 1, 2, 5, 6, 7
[24] B. Yao, A. Khosla, and L. Fei-Fei. Combining randomization and discrimination for fine-grained image categorization. In CVPR, 2011. 1
[25] W.-S. Zheng, S. Gong, and T. Xiang. Quantifying and transferring contextual information in object detection. TPAMI, 34(4):762–777, 2012. 5