
347 cvpr-2013-Recognize Human Activities from Partially Observed Videos


Source: pdf

Author: Yu Cao, Daniel Barrett, Andrei Barbu, Siddharth Narayanaswamy, Haonan Yu, Aaron Michaux, Yuewei Lin, Sven Dickinson, Jeffrey Mark Siskind, Song Wang

Abstract: Recognizing human activities in partially observed videos is a challenging problem with many practical applications. When the unobserved subsequence is at the end of the video, the problem reduces to activity prediction from an unfinished activity stream, which has been studied by many researchers. In the general case, however, an unobserved subsequence may occur at any time, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities from partially observed videos in this general case. Specifically, we formulate the problem in a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments, 2) using spatiotemporal features of the training video samples in each segment as bases and applying sparse coding (SC) to derive the activity likelihood of the test video sample at each segment, and 3) finally combining the likelihoods across segments to obtain a global posterior for the activities. We further extend the proposed method to include more bases that correspond to a mixture of segments with different temporal lengths (MSSC), which can better represent activities with large intra-class variations. We evaluate the proposed methods (SC and MSSC) on various real videos. We also evaluate the proposed methods on two special cases: 1) activity prediction, where the unobserved subsequence is at the end of the video, and 2) human activity recognition on fully observed videos. Experimental results show that the proposed methods outperform existing state-of-the-art comparison methods.
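
The abstract outlines a per-segment sparse-coding pipeline: one dictionary of training bases per temporal segment, a sparse code for the test feature at each observed segment, class-wise reconstruction errors turned into likelihoods, and a product over segments to get a posterior. The following NumPy sketch is a rough illustration only, not the authors' implementation (which relies on the l1 solvers cited in [2] and [23]); it assumes one feature vector per segment, substitutes a plain ISTA solver, and uses a heuristic exp(-error) likelihood. All names and data below are hypothetical.

import numpy as np

def sparse_code(B, x, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - B a||_2^2 + lam*||a||_1 with ISTA (proximal gradient)."""
    a = np.zeros(B.shape[1])
    L = np.linalg.norm(B, 2) ** 2 + 1e-8  # Lipschitz constant of the smooth term's gradient
    for _ in range(n_iter):
        grad = B.T @ (B @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding step
    return a

def segment_likelihoods(B_t, labels, x_t, lam=0.1):
    """Likelihood of each activity class for the test feature x_t at one temporal
    segment, scored by class-wise reconstruction error under the sparse code."""
    a = sparse_code(B_t, x_t, lam)
    classes = np.unique(labels)
    errs = np.array([np.linalg.norm(x_t - B_t @ np.where(labels == c, a, 0.0))
                     for c in classes])
    lik = np.exp(-errs)  # heuristic mapping: smaller reconstruction error -> higher likelihood
    return classes, lik / lik.sum()

def combine_segments(per_segment_liks):
    """Fuse the per-segment likelihoods of the observed segments into a global
    posterior (uniform prior, segments treated as conditionally independent)."""
    log_post = np.sum(np.log(np.vstack(per_segment_liks) + 1e-12), axis=0)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy usage with hypothetical data: 2 classes, 3 observed segments, 64-D features.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)                      # class label of each training basis
dicts = [rng.standard_normal((64, 20)) for _ in range(3)]   # one dictionary per segment
test_feats = [rng.standard_normal(64) for _ in range(3)]    # test feature per observed segment
liks = [segment_likelihoods(B, labels, x)[1] for B, x in zip(dicts, test_feats)]
print("posterior over classes:", combine_segments(liks))

Segments falling inside the unobserved gap are simply left out of combine_segments; the MSSC extension would additionally add bases built from merged segments of different temporal lengths to the per-segment dictionaries.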


reference text

[1] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In ICCV, pages 1395–1402, 2005. 1

[2] E. Candès and J. Romberg. L-1 magic package. http://users.ece.gatech.edu/~justin/l1magic/downloads/l1magic-1.11.zip. 4

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005. 1

[4] DARPA. Video Dataset from DARPA Mind’s Eye Program. http://www.visint.org, 2011. 5

[5] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pages 65–72, 2005. 1, 4

[6] M. Hoai and F. De la Torre. Max-margin early event detectors. In CVPR, pages 2863–2870, 2012. 2, 4, 5

[7] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. In ICCV, pages 1–8, 2007. 1, 4

[8] K. Kitani, B. D. Ziebart, J. A. D. Bagnell, and M. Hebert. Activity forecasting. In ECCV, pages 201–214, 2012. 2

[9] A. Kläser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In BMVC, pages 99.1–99.10, 2008. 1

[10] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, pages 432–439, 2003. 1

[11] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, pages 1–8, 2008. 1

[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004. 1

[13] J. Niebles, H. Wang, and F.-F. Li. Unsupervised learning of human action categories using spatial-temporal words. IJCV, 79(3):299–318, 2008. 1

[14] M. S. Ryoo. Human activity prediction: Early recognition of ongoing activities from streaming videos. In ICCV, pages 1036–1043, 2011. 1, 2, 3, 4, 5

[15] M. S. Ryoo and J. K. Aggarwal. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In ICCV, pages 1593–1600, 2009. 1

[16] M. S. Ryoo and J. K. Aggarwal. UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html, 2010. 5

[17] S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In CVPR, 2012. 4

[18] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR, pages 32–36, 2004. 1

[19] P. Scovanner, S. Ali, and M. Shah. A 3-Dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th International Conference on Multimedia, pages 357–360, 2007. 1

[20] D. Waltisberg, A. Yao, J. Gall, and L. Van Gool. Variations of a Hough-voting action recognition system. In ICPR, 2010. 1

[21] G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In ECCV, pages 650–663, 2008. 1

[22] S. Wong and R. Cipolla. Extracting spatiotemporal interest points using global information. In ICCV, pages 1–8, 2007. 1

[23] A. Y. Yang, A. Ganesh, Z. Zhou, A. Wagner, V. Shia, S. Sastry, and Y. Ma. L-1 Benchmark package. http://www.eecs.berkeley.edu/~yang/software/l1benchmark/l1benchmark.zip. 4

[24] T. Yu, T. Kim, and R. Cipolla. Real-time action recognition by spatiotemporal semantic and structural forest. In BMVC, pages 52.1–52.12, 2010. 1