cvpr cvpr2013 cvpr2013-40 cvpr2013-40-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chunyu Wang, Yizhou Wang, Alan L. Yuille
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state-of-the-art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the “best” one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors in joint estimation. Experimental results first show that our approach localizes body joints more accurately than existing methods. Next we show that it outperforms state-of-the-art action recognizers on the UCF Sport, Keck Gesture and MSR-Action3D datasets.
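The abstract does not say how the "best" estimate is selected from the K candidates. As an illustration only, the sketch below shows one common way to combine a per-frame cost with a temporal smoothness constraint: Viterbi-style dynamic programming over the candidates. The function names, the `unary` cost (standing in for a detector score combined with a segmentation cue), and the `smoothness` weight are all assumptions made for this example, not the authors' formulation.

```python
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # one (x, y) location per joint


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding joints of two poses."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def select_best_poses(candidates: List[List[Pose]],
                      unary: List[List[float]],
                      smoothness: float = 1.0) -> List[int]:
    """Choose one candidate index per frame by dynamic programming.

    candidates[t][k] is the k-th pose hypothesis in frame t.
    unary[t][k] is a hypothetical per-frame cost (lower is better).
    The pairwise cost penalises joint displacement between consecutive frames.
    """
    cost = [list(unary[0])]          # cost[t][k]: cheapest path ending at (t, k)
    back: List[List[int]] = []       # back[t-1][k]: best predecessor of (t, k)
    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for k, pose in enumerate(candidates[t]):
            # Total cost of reaching this pose from each candidate in frame t-1.
            totals = [cost[-1][j] + smoothness * pose_distance(prev, pose)
                      for j, prev in enumerate(candidates[t - 1])]
            best_j = min(range(len(totals)), key=totals.__getitem__)
            row_cost.append(totals[best_j] + unary[t][k])
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [k]
    for row_back in reversed(back):
        k = row_back[k]
        path.append(k)
    path.reverse()
    return path
```

With `smoothness=0` the selection reduces to taking the lowest-cost candidate in each frame independently; larger values favour sequences whose joints move little between consecutive frames.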
[1] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In ICCV, volume 2, pages 1395–1402. IEEE, 2005.
[2] A. Bobick and J. Davis. The recognition of human movement using temporal templates. PAMI, IEEE Transactions on, 23(3):257–267, 2001.
[3] I. Bülthoff, H. Bülthoff, P. Sinha, et al. Top-down influences on stereoscopic depth-perception. Nature neuroscience, 1(3):254–257, 1998.
[4] L. Campbell and A. Bobick. Recognition of human body motion using phase space constraints. In ICCV, pages 624–630. IEEE, 1995.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on IST, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[6] G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In ACM SIGKDD, pages 43–52. ACM, 1999.
[7] A. Efros, A. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV, pages 726–733. IEEE, 2003.
[8] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, pages 1–8. IEEE, 2008.
[9] A. Gilbert, J. Illingworth, and R. Bowden. Scale invariant action recognition using compound features mined from dense spatio-temporal corners. ECCV, pages 222–233, 2008.
[10] N. Ikizler and P. Duygulu. Human action recognition using distribution of oriented rectangular patches. Human Motion–Understanding, Modeling, Capture and Animation, pages 271–284, 2007.
[11] Z. Jiang, Z. Lin, and L. Davis. Recognizing human actions by learning and matching shape-motion prototype trees. PAMI, 34(3):533–547, 2012.
[12] V. Kazemi and J. Sullivan. Using richer models for articulated pose estimation of footballers.
[13] A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood feature for human action recognition. In CVPR, pages 2046–2053. IEEE, 2010.
[14] I. Laptev. On space-time interest points. IJCV, 64(2):107–123, 2005.
[15] W. Li, Z. Zhang, and Z. Liu. Action recognition based on a bag of 3d points. In CVPR workshop, pages 9–14. IEEE, 2010.
[16] S. Maji, L. Bourdev, and J. Malik. Action recognition from a distributed representation of pose and appearance. In CVPR, pages 3177–3184. IEEE, 2011.
[17] M. Rodriguez, J. Ahmed, and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, pages 1–8, June 2008.
[18] S. Sadanand and J. Corso. Action bank: A high-level representation of activity in video. In CVPR, pages 1234–1241. IEEE, 2012.
[19] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local svm approach. In ICPR, volume 3, pages 32–36. IEEE, 2004.
[20] H. Wang, A. Klaser, C. Schmid, and C. Liu. Action recognition by dense trajectories. In CVPR, pages 3169–3176. IEEE, 2011.
[21] J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In CVPR, pages 1290–1297. IEEE, 2012.
[22] L. Wang, Y. Wang, T. Jiang, and W. Gao. Instantly telling what happens in a video sequence using simple features. In CVPR, pages 3257–3264. IEEE, 2011.
[23] S. Wu, O. Oreifej, and M. Shah. Action recognition in videos acquired by a moving camera using motion decomposition of lagrangian particle trajectories. In ICCV, pages 1419–1426. IEEE, 2011.
[24] X. Wu, D. Xu, L. Duan, and J. Luo. Action recognition using context and appearance distribution features. In CVPR, pages 489–496. IEEE, 2011.
[25] R. Xu, P. Agarwal, S. Kumar, V. Krovi, and J. Corso. Combining skeletal pose with local motion for human activity recognition. Articulated Motion and Deformable Objects, pages 114–123, 2012.
[26] Y. Yacoob and M. Black. Parameterized modeling and recognition of activities. In ICCV, pages 120–127. IEEE, 1998.
[27] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392. IEEE, 2011.