378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition

Source: pdf

Author: Feng Shi, Emil Petriu, Robert Laganière

Abstract: Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. Although many methods claim to use dense feature sets, most are merely denser than approaches based on sparse interest point detectors. In this paper, we explore the effect of high-density sampling on action recognition. We also investigate the impact of random sampling over a dense grid on computational efficiency. We present a real-time action recognition system that integrates a fast random sampling method with local spatio-temporal features extracted from a Local Part Model. A new method based on the histogram intersection kernel is proposed to combine multiple channels of different descriptors. Our technique achieves high accuracy (93%) on the simple KTH dataset and state-of-the-art results on two very challenging real-world datasets: 83.3% on UCF50 and 47.6% on HMDB51.
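The abstract mentions combining multiple descriptor channels through a histogram intersection kernel but does not give the formula. The following is a minimal sketch of the standard intersection kernel, K(x, y) = sum_i min(x_i, y_i), applied per channel. The averaging rule across channels, the L1 normalisation, and the function and channel names (`multichannel_kernel`, `"hog"`, `"hof"`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def histogram_intersection(x, y):
    """Standard histogram intersection: sum of element-wise minima."""
    return float(np.minimum(x, y).sum())


def multichannel_kernel(channels_a, channels_b):
    """Combine per-channel intersection kernels.

    ASSUMPTION: channels are L1-normalised and then averaged with equal
    weight. The paper's actual combination rule may differ.
    """
    values = []
    for name in channels_a:
        # L1-normalise so each channel contributes a value in [0, 1].
        a = channels_a[name] / channels_a[name].sum()
        b = channels_b[name] / channels_b[name].sum()
        values.append(histogram_intersection(a, b))
    return sum(values) / len(values)


# Hypothetical bag-of-features histograms for two videos, one per channel.
video_a = {"hog": np.array([1.0, 1.0, 2.0]), "hof": np.array([2.0, 2.0])}
video_b = {"hog": np.array([2.0, 1.0, 1.0]), "hof": np.array([1.0, 3.0])}
similarity = multichannel_kernel(video_a, video_b)
```

A kernel value computed this way for every pair of videos could be passed to an SVM that accepts a precomputed kernel matrix. With equal-weight averaging, no per-channel scale parameter needs tuning, which is one reason this rule was chosen for the sketch.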

reference text

[1] A. Agarwal and B. Triggs. Hyperfeatures - multilevel local coding for visual recognition. In ECCV, 2006. 2

[2] A. Patron-Perez and I. Reid. A probabilistic framework for recognizing similar actions using spatio-temporal features. In BMVC, 2007. 2

[3] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011. 4

[4] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In Proc. 2nd Joint IEEE Int Visual Surveillance and Performance Evaluation of Tracking and Surveillance Workshop, pages 65–72, 2005. 1, 3, 6

[5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, pages 1–8, 2008. 2

[6] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In ECCV, pages 425–438, 2012. 7

[7] Y. Ke, R. Sukthankar, and M. Hebert. Efficient visual event detection using volumetric features. In ICCV, volume 1, pages 166–173, 2005. 2, 3

[8] O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf. Motion interchange patterns for action recognition in unconstrained videos. In ECCV, pages 256–269, 2012. 7

[9] A. Kläser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In BMVC, pages 995–1004, 2008. 3

[10] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. Hmdb: A large video database for human motion recognition. In ICCV, pages 2556–2563, 2011. 1, 4, 7

[11] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. In CVPR, pages 1–8. IEEE, 2008. 3

[12] I. Laptev and T. Lindeberg. Space-time interest points. In Proc. Ninth IEEE Int Computer Vision Conf, pages 432–439, 2003. 1

[13] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, pages 1–8, 2008. 3, 6

[14] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In CVPR, pages 3361–3368, 2011. 1

[15] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43:29–44, 2001. 2

[16] L. Liu, L. Shao, and P. Rockett. Human action recognition based on boosted feature selection and naive bayes nearest-neighbor classification. Signal Processing, pages 1521–1530, 2012. 2

[17] S. Maji, A. C. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In CVPR, pages 1–8. IEEE, 2008. 4

[18] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In CVPR, pages 2929–2936, 2009. 1

[19] S. Mathe and C. Sminchisescu. Dynamic eye movement datasets and learnt saliency models for visual action recognition. In ECCV, pages 842–856, 2012. 2

[20] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application (VISAPP'09), pages 331–340, 2009. 7

[21] E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. In ECCV (4), pages 490–503, 2006. 1, 2, 3, 5

[22] K. K. Reddy and M. Shah. Recognizing 50 human action categories of web videos. Machine Vision and Applications Journal, pages 1–11, September, 2012. 1, 4, 7

[23] S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In CVPR, pages 1234–1241, 2012. 1, 6, 7

[24] M. Sapienza, F. Cuzzolin, and P. H. S. Torr. Learning discriminative space-time actions from weakly labelled videos. In ECCV, 2012. 1, 7

[25] C. Schüldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR (3), pages 32–36, 2004. 1, 4, 6

[26] F. Shi, E. M. Petriu, and A. Cordeiro. Human action recognition from local part model. In Proc. IEEE Int Haptic Audio Visual Environments and Games (HAVE) Workshop, pages 35–38, 2011. 1, 2, 3, 6

[27] B. Solmaz, S. M. Assari, and M. Shah. Classifying web videos using a global video descriptor. Machine Vision and Applications, pages 1–13, 2012. 7

[28] M. J. Swain and D. H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11–32, 1991. 4

[29] E. Vig, M. Dorr, and D. Cox. Space-variant descriptor sampling for action recognition based on saliency and eye movements. In ECCV, pages 84–97, 2012. 2, 4

[30] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, pages 3169–3176, 2011. 1, 2, 3, 4

[31] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Dense trajectories and motion boundary descriptors for action recognition. Technical report, INRIA, 2012. 4, 6, 7

[32] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, pages 127–137, 2009. 1, 2, 3, 4, 6

[33] G. Willems, T. Tuytelaars, and L. J. V. Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In ECCV, pages 650–663, 2008. 1, 2, 3, 6

[34] L. Yang, N. Zheng, J. Yang, M. Chen, and H. Chen. A biased sampling strategy for object categorization. In Proc. IEEE 12th Int Computer Vision Conf, pages 1141–1148, 2009. 2

[35] L. Yeffet and L. Wolf. Local trinary patterns for human action recognition. In ICCV, pages 492–497, 2009. 2

[36] T.-H. Yu, T.-K. Kim, and R. Cipolla. Real-time action recognition by spatiotemporal semantic and structural forests. In BMVC, pages 1–12, 2010. 2

[37] J. Zhang, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: a comprehensive study. International Journal of Computer Vision, 73:2007, 2007. 4, 6