jmlr jmlr2012 jmlr2012-50 jmlr2012-50-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space, which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model them statistically using least squares regression. To this end, we factorize a data tensor along each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then represent each factorized element as a point on a Grassmann manifold. Furthermore, we account for the underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that the proposed method not only performs well on the standard benchmark data sets but also generalizes well to the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.
Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, Kinect data
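The pipeline outlined in the abstract (unfold the tensor along each order, take a subspace per unfolding, compare subspaces on Grassmann manifolds, combine the factors into one product-manifold distance) can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the truncation `rank`, the use of leading left singular vectors as the subspace basis, and the arc-length form of the distance (2-norm of all principal angles) are assumptions made for illustration, and the least squares regression step is not covered.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_subspaces(tensor, rank):
    """One orthonormal basis per mode, from the SVD of each unfolding.

    Assumption: the leading `rank` left singular vectors span the subspace
    that stands for this mode as a point on a Grassmann manifold.
    """
    bases = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :rank])
    return bases


def principal_angles(a, b):
    """Principal angles between the column spaces of orthonormal a and b."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    # Clip to guard against round-off pushing cosines slightly above 1.
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def product_manifold_distance(x, y, rank=3):
    """2-norm of the principal angles collected over all factor manifolds."""
    angles = [
        principal_angles(a, b)
        for a, b in zip(mode_subspaces(x, rank), mode_subspaces(y, rank))
    ]
    return float(np.linalg.norm(np.concatenate(angles)))


# Hypothetical usage: two random "videos" of size height x width x frames.
rng = np.random.default_rng(0)
video_a = rng.standard_normal((8, 8, 10))
video_b = rng.standard_normal((8, 8, 10))
print(product_manifold_distance(video_a, video_b))
```

In a nearest-neighbour setting, a query video would be assigned the label of the training video with the smallest such distance. Because only subspaces are compared, rescaling a video's intensities leaves the distance unchanged.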
References (Human Gesture Recognition on Product Manifolds):
M. F. Abdelkader, W. Abd-Almageed, A. Srivastava, and R. Chellappa. Gesture and action recognition via modeling trajectories on riemannian manifolds. Computer Vision and Image Understanding, 115(3):439–455, 2011.
P.-A. Absil, R. Mahony, and R. Sepulchre. Riemannian geometry of grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematicae, 80(2):199–220, 2004.
P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
E. Begelfor and M. Werman. Affine invariance revisited. In IEEE Conference on Computer Vision and Pattern Recognition, New York, 2006.
J.G.F. Belinfante and B. Kolman. A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods. SIAM, 1972.
P. Bilinski and F. Bremond. Evaluation of local descriptors for action recognition in videos. In ICVS, 2011.
A. Bissacco, A. Chiuso, Y. Ma, and S. Soatto. Recognition of human gaits. In IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, pages 270–277, 2001.
Å. Björck and G.H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, pages 579–594, 1973.
CHALEARN. Chalearn gesture dataset (cgd 2011), chalearn, california, 2011.
J.H. Conway, R.H. Hardin, and N.J.A. Sloane. Packing lines, planes, etc.: Packings in grassmannian spaces. Experimental Mathematics, 5(2):139–159, 1996.
A. Datta, Y. Sheikh, and T. Kanade. Modeling the product manifold of posture and motion. In Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences (in conjunction with ICCV), 2009.
L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl., 21(4):1253–1278, 2000.
P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (in conjunction with ICCV), 2005.
A. Edelman, R. Arias, and S. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2):303–353, 1998.
I. Guyon, V. Athitsos, P. Jangyodsuk, B. Hammer, and H. J. E. Balderas. Chalearn gesture challenge: Design and first results. In CVPR Workshop on Gesture Recognition, 2012.
M. T. Harandi, C. Sanderson, A. Wiliem, and B. C. Lovell. Kernel analysis over riemannian manifolds for visual recognition of actions, pedestrians and textures. In WACV, 2012.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
Z. Jiang, Z. Lin, and L. Davis. Class consistent k-means: Application to face and action recognition. Computer Vision and Image Understanding, 116(6):730–741, 2012.
H. Karcher. Riemannian center of mass and mollifier smoothing. Comm. Pure Appl. Math., 30(5):509–541, 1977.
D. Kendall. Shape manifolds, procrustean metrics and complex projective spaces. Bull. London Math. Soc., 16:81–121, 1984.
T-K. Kim and R. Cipolla. Gesture recognition under small sample size. In Asian Conference on Computer Vision, 2007.
T-K. Kim and R. Cipolla. Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(8):1415–1428, 2009.
T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3), September 2009.
B. Krausz and C. Bauckhage. Action recognition in videos using nonnegative tensor factorization. In International Conference on Pattern Recognition, 2010.
J. Lee. Introduction to Smooth Manifolds. Springer, 2003.
V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10:707–710, 1966.
R. Li and R. Chellappa. Group motion segmentation using a spatio-temporal driving force model. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
X. Li, W. Hu, Z. Zhang, X. Zhang, and G. Luo. Robust visual tracking based on incremental tensor subspace learning. In IEEE International Conference on Computer Vision, 2007.
Z. Lin, Z. Jiang, and L. Davis. Recognizing actions by shape-motion prototype trees. In IEEE International Conference on Computer Vision, 2009.
Y. M. Lui. Advances in matrix manifolds for computer vision. Image and Vision Computing, 30(6-7):380–388, 2012a.
Y. M. Lui. Tangent bundles on special manifolds for action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 22(6):930–942, 2012b.
Y. M. Lui and J. R. Beveridge. Grassmann registration manifolds for face recognition. In European Conference on Computer Vision, Marseille, France, 2008.
Y. M. Lui, J. R. Beveridge, and M. Kirby. Canonical stiefel quotient and its application to generic face recognition in illumination spaces. In IEEE International Conference on Biometrics: Theory, Applications and Systems, Washington, D.C., 2009.
Y. M. Lui, J. R. Beveridge, and M. Kirby. Action classification on product manifolds. In IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, 2010.
Y. Ma, J. Košecká, and S. Sastry. Optimal motion from image sequences: A riemannian viewpoint, 1998. Technical Report No. UCB/ERL M98/37, EECS Department, University of California, Berkeley.
S. Mitra and T. Acharya. Gesture recognition: A survey. IEEE Transactions on Systems, Man, Cybernetics - Part C: Applications and Reviews, 37:311–324, 2007.
Q. Qiu, Z. Jiang, and R. Chellappa. Sparse dictionary-based representation and recognition of action attributes. In IEEE Conference on Computer Vision and Pattern Recognition, 2011.
M. Rodriguez, J. Ahmed, and M. Shah. Action mach: A spatio-temporal maximum average correlation height filter for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
P. Saisan, G. Doretto, Y-N. Wu, and S. Soatto. Dynamic texture recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2001.
P. Turaga and R. Chellappa. Locally time-invariant models of human activities using trajectories on the grassmannian. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
P. Turaga, S. Biswas, and R. Chellappa. The role of geometry for age estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
M. A. O. Vasilescu. Human motion signatures: Analysis, synthesis, recognition. In International Conference on Pattern Recognition, Quebec City, Canada, pages 456–460, 2002.
M. A. O. Vasilescu and D. Terzopoulos. Multilinear image analysis for facial recognition. In International Conference on Pattern Recognition, Quebec City, Canada, pages 511–514, 2002.
A. Veeraraghavan, A. K. Roy-Chowdhury, and R. Chellappa. Matching shape sequences in video with applications in human movement analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, (12):1896–1909, 2005.
H. Wang, M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In British Machine Vision Conference, 2009.
D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104:249–257, 2006.
Y. Yuan, H. Zheng, Z. Li, and D. Zhang. Video action recognition with spatio-temporal graph embedding and spline modeling. In ICASSP, 2010.