iccv2013-260-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to discover, in an unsupervised manner, the action (motion) patterns humans use when manipulating various objects, in scenarios such as assisted living. We are motivated by two key observations. First, motion patterns vary widely across the types of objects being manipulated, so manually defining motion primitives is infeasible. Second, some motion patterns are shared among different manipulated objects while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model discovers motion pattern groups associated with different types of manipulated objects through a shared manipulation pattern dictionary, whose size is automatically inferred. Comprehensive experiments on two assisted-living benchmarks and a cooking motion dataset demonstrate the superiority of the learned manipulation pattern dictionary for representing manipulation actions in recognition.
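The sketch below is a minimal illustration of the general idea described in the abstract, not the authors' model (which additionally uses object detection score maps inside a custom hierarchical Dirichlet process formulation). It assumes dense-trajectory descriptors are quantized into discrete motion words and that each clip, grouped by the manipulated-object type, becomes a bag of words; gensim's off-the-shelf HdpModel then infers a shared topic ("manipulation pattern") dictionary whose size is not fixed in advance. Function names, the codebook size, and the toy data are illustrative assumptions.

# Minimal sketch of HDP-style manipulation-pattern discovery (assumed pipeline,
# not the paper's exact model): quantize trajectory descriptors into motion
# words, then fit gensim's HdpModel so the number of shared patterns is inferred.
import numpy as np
from sklearn.cluster import KMeans
from gensim.corpora import Dictionary
from gensim.models import HdpModel

def quantize_trajectories(clip_descriptors, n_words=100, seed=0):
    """Map each clip's trajectory descriptors to discrete motion-word ids."""
    all_desc = np.vstack(clip_descriptors)            # stack (N_traj, D) over all clips
    codebook = KMeans(n_clusters=n_words, random_state=seed).fit(all_desc)
    return [codebook.predict(d) for d in clip_descriptors]

def learn_manipulation_dictionary(clip_descriptors):
    """Fit an HDP topic model over motion words; the topic count is inferred."""
    word_ids = quantize_trajectories(clip_descriptors)
    docs = [[f"w{w}" for w in ids] for ids in word_ids]   # one "document" per clip
    vocab = Dictionary(docs)
    corpus = [vocab.doc2bow(doc) for doc in docs]
    hdp = HdpModel(corpus, id2word=vocab)
    return hdp, vocab

if __name__ == "__main__":
    # Toy input: 10 clips, each with 50 random 30-d trajectory descriptors.
    rng = np.random.default_rng(0)
    clips = [rng.normal(size=(50, 30)) for _ in range(10)]
    hdp, vocab = learn_manipulation_dictionary(clips)
    print(hdp.print_topics(num_topics=5, num_words=5))

In this toy setting the learned topics are meaningless; with real dense-trajectory features, the dominant topics per clip would play the role of the learned manipulation patterns used for recognition.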
[1] http://www.murase.m.is.nagoya-u.ac.jp/kscgr/index.html.
[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[3] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, pages 65–72, 2005.
[4] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. Liblinear: A library for large linear classification. JMLR, 9: 1871–1874, 2008.
[5] J. Gall and V. Lempitsky. Class-specific Hough forests for object detection. In CVPR, pages 1022–1029, 2009.
[6] A. Gupta and L. Davis. Objects in action: An approach for combining action understanding and object perception. In CVPR, pages 1–8, 2007.
[7] H. Kjellstr o¨m, J. Romero, D. Mart ı´nez, and D. Kragi ´c. Simultaneous visual recognition of manipulation actions and manipulated objects. In ECCV, pages 336–349, 2008.
[8] V. Kyrki, I. Vicente, D. Kragic, and J.-O. Eklundh. Action recognition and understanding using motor primitives. In International Symposium on Robot and Human interactive Communication, pages 1113–1 118, 2007.
[9] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, 2003.
[10] R. Messing, C. Pal, and H. Kautz. Activity recognition using the velocity histories of tracked keypoints. In ICCV, pages 104–1 11, 2009.
[11] D. Moore, I. Essa, and M. Hayes. Exploiting human actions and object context for recognition tasks. In ICCV, Corfu, Greece, 1999.
[12] J. C. Niebles, H. Wang, and L. Fei-fei. Unsupervised learning of human action categories using spatial-temporal words. In BMVC, 2006.
[13] B. Packer and D. Koller. A combined pose, object, and feature model for action understanding. In CVPR, 2012.
[14] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical dirichlet processes. Journal of the American Statistical Association, 101, 2004.
[15] H. Wang, A. Kl¨ aser, C. Schmid, and L. Cheng-Lin. Action recognition by dense trajectories. In CVPR, pages 3 169–3 176, 2011.
[16] J. Wang, Z. Chen, and Y. Wu. Action recognition with multiscale spatiotemporal contexts. In CVPR, pages 3185–3 192, 2011.
[17] J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In CVPR, pages 1290–1297, 2012.
[18] X. Wang, X. Ma, and W. Grimson. Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. T-PAMI, 31(3):539–555, 2009.
[19] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. Huang. Regression from patch-kernel. In CVPR, pages 1–8, 2008.
[20] Y. Yang, C. Fermuller, and Y. Aloimonos. Detection of manipulation action consequences (MAC). In CVPR, 2013.
[21] B. Yao, A. Khosla, and L. Fei-Fei. Classifying actions and measuring action similarity by modeling the mutual context of objects and human poses. In ICML, 2011.