
86 iccv-2013-Concurrent Action Detection with Structural Prediction


Source: pdf

Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu

Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence has only one action class label and that different actions are independent. However, a single human body can perform multiple actions concurrently, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and by its relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.


reference text

[1] J. F. Allen. Towards a general theory of action and time. Artificial Intelligence, 23(2): 123–154, 1984.

[2] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.

[3] C. Boutilier and R. I. Brafman. Partial-order planning with concurrent interacting actions. Journal of Artificial Intelligence Research, 14(1): 105–136, 2001.

[4] W. Chen and S.-F. Chang. Motion trajectory matching of video objects. In SPIE Proceedings of Storage and Retrieval for Media Databases, 2000.

[5] C. Desai, D. Ramanan, and C. C. Fowlkes. Discriminative models for multi-class object layout. International Journal of Computer Vision, 95(1): 1–12, 2011.

[6] M. Hoai and F. De la Torre. Max-margin early event detectors. In CVPR, 2012.

[7] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1): 27–59, 2009.

[8] M. Müller and T. Röder. Motion templates for automatic classification and retrieval of motion capture data. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2006.

[9] M. Pei, Y. Jia, and S.-C. Zhu. Parsing video events with goal inference and intent prediction. In ICCV, 2011.

[10] C. S. Pinhanez and A. F. Bobick. Human action detection using PNF propagation of temporal constraints. In CVPR, 1998.

[11] K. Quennesson, E. Ioup, and C. L. Isbell. Wavelet statistics for human motion classification. In AAAI, 2006.

[12] K. Rohanimanesh and S. Mahadevan. Learning to take concurrent actions. In NIPS, 2002.

[13] Y. Shi, Y. Huang, D. Minnen, A. F. Bobick, and I. A. Essa. Propagation networks for recognition of partially ordered sequential action. In CVPR, 2004.

[14] J. Shotton, A. W. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In CVPR, 2011.

[15] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7: 1531–1565, 2006.

[16] K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, 2012.

[17] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6: 1453–1484, 2005.

[18] J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In CVPR, 2012.

[19] P. Wei, Y. Zhao, N. Zheng, and S.-C. Zhu. Modeling 4D human-object interactions for event and object recognition. In ICCV, 2013.

[20] J. Yuan, Z. Liu, and Y. Wu. Discriminative subvolume search for efficient action detection. In CVPR, 2009.