cvpr cvpr2013 cvpr2013-205 cvpr2013-205-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Simon Hadfield, Richard Bowden
Abstract: Action recognition in unconstrained situations is a difficult task, suffering from massive intra-class variations. It is made even more challenging when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent emergence of 3D data, both in broadcast content and commercial depth sensors, provides the possibility to overcome this issue. This paper presents a new dataset for benchmarking action recognition algorithms in natural environments, while making use of 3D information. The dataset contains around 650 video clips, across 14 classes. In addition, two state-of-the-art action recognition algorithms are extended to make use of the 3D data, and five new interest point detection strategies are also proposed that extend to the 3D data. Our evaluation compares all 4 feature descriptors, using 7 different types of interest point, over a variety of threshold levels, for the Hollywood3D dataset. We make the dataset, including stereo video, estimated depth maps and all code required to reproduce the benchmark results, available to the wider community.
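To make the evaluation protocol in the abstract concrete, the following is a minimal sketch in Python (not the authors' released code) of the benchmark grid: every combination of the 7 interest point types, the 4 descriptors and a range of detector thresholds is scored on the dataset. The detector/descriptor identifiers and the random feature stub are placeholders; reproducing the published numbers requires the released code, stereo video and depth maps.

import itertools
import numpy as np

# Placeholder identifiers; the actual 7 detectors and 4 descriptors are
# defined in the released benchmark code.
DETECTORS = [f"interest_point_type_{i}" for i in range(1, 8)]
DESCRIPTORS = [f"descriptor_{j}" for j in range(1, 5)]
THRESHOLDS = [1e-4, 1e-3, 1e-2]  # illustrative detector threshold levels

rng = np.random.default_rng(0)

def extract_bag_of_words(clip_id, detector, descriptor, threshold, vocab_size=256):
    """Stub: returns a random bag-of-words histogram for one clip.
    A real run would detect interest points in the RGB + depth volume,
    describe them and quantise against a learned codebook."""
    return rng.random(vocab_size)

def evaluate(detector, descriptor, threshold, n_clips=650):
    """Stub scoring: the benchmark trains classifiers over the 14 action
    classes; here the score is a stand-in value so the sketch runs end to end."""
    features = np.stack([extract_bag_of_words(i, detector, descriptor, threshold)
                         for i in range(n_clips)])
    return float(features.mean())

if __name__ == "__main__":
    for det, desc, thr in itertools.product(DETECTORS, DESCRIPTORS, THRESHOLDS):
        print(f"{det} + {desc} @ {thr:g}: score = {evaluate(det, desc, thr):.3f}")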
[1] P. Beaudet. Rotationally invariant image operators. In Joint Conference on Pattern Recognition, 1978.
[2] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In ICCV, 2005.
[3] M. Brand, N. Oliver, and A. Pentland. Coupled hidden Markov models for complex action recognition. In CVPR, 1997.
[4] ChaLearn. ChaLearn Gesture Dataset (CGD2011), 2011.
[5] Z. Cheng, L. Qin, Y. Ye, Q. Huang, and Q. Tian. Human daily action analysis with multi-view and color-depth data. In ECCVW, 2012.
[6] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. In ECCV, 2006.
[7] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In Visual Surveillance and Evaluation Workshop, 2005.
[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, June 2010.
[9] A. Gilbert, J. Illingworth, and R. Bowden. Fast realistic multi-action recognition using mined dense spatio-temporal features. In ICCV, 2009.
[10] D. Han, L. Bo, and C. Sminchisescu. Selection and context for action recognition. In ICCV, 2009.
[11] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, 1988.
[12] A. Klaser, M. Marszalek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In BMVC, 2008.
[13] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, 2003.
[14] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
[15] W. Li, Z. Zhang, and Z. Liu. Action recognition based on a bag of 3D points. In CVPRW, 2010.
[16] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In CVPR, 2009.
[17] T. Moeslund, A. Hilton, and V. Kruger. Survey of advances in vision based motion capture and analysis. CVIU, 2006.
[18] O. Oshin, A. Gilbert, and R. Bowden. Machine Learning for Human Motion Analysis, chapter Learning to Recognise Spatio-Temporal Interest Points. IGI Publishing, 2010.
[19] O. Oshin, A. Gilbert, and R. Bowden. Capturing the relative distribution of features for action recognition. In Face and Gesture Workshop, 2011.
[20] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. Dodgson. Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In ECCV, 2010.
[21] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In ICPR, 2004.
[22] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In International Conference on Multimedia, 2007.
[23] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision, 2008.
[24] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
[25] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
[26] G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In ECCV, 2008.