
Joint Sparsity-Based Representation and Analysis of Unconstrained Activities (CVPR 2013)


Source: pdf

Author: Raghuraman Gopalan

Abstract: While the notion of joint sparsity in understanding common and innovative components of a multi-receiver signal ensemble has been well studied, we investigate the utility of such joint sparse models in representing information contained in a single video signal. By decomposing the content of a video sequence into that observed by multiple spatially and/or temporally distributed receivers, we first recover a collection of common and innovative components pertaining to individual videos. We then present modeling strategies based on subspace-driven manifold metrics to characterize patterns among these components, across other videos in the system, to perform subsequent video analysis. We demonstrate the efficacy of our approach for activity classification and clustering by reporting competitive results on standard datasets such as HMDB, UCF-50, Olympic Sports and KTH.
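The two steps in the abstract (splitting a video into several "receiver" signals and separating common from innovative components, then comparing videos through subspace metrics) can be illustrated with a toy sketch. It is not the author's implementation. It assumes a JSM-1-style signal model from distributed compressive sensing [2], x_j = z_common + z_j with sparse innovations z_j, and substitutes a coordinate-wise median for proper joint sparse recovery; the median is exact only where fewer than half of the receivers carry an innovation at a given coordinate. For the second step it assumes the projection metric on the Grassmann manifold, one standard subspace distance (see, e.g., [14], [40]); the abstract does not say which metric the paper uses. The function names `jsm1_decompose`, `orthonormal_basis` and `projection_distance` are hypothetical.

```python
import math
from statistics import median


def jsm1_decompose(signals):
    """Split J equal-length signals into a common part and per-signal innovations.

    Toy stand-in for joint sparse recovery under the model x_j = z_common + z_j.
    The coordinate-wise median equals z_common[n] wherever fewer than half of
    the signals have a non-zero innovation at coordinate n.
    """
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    common = [median(s[n] for s in signals) for n in range(length)]
    innovations = [[x - c for x, c in zip(s, common)] for s in signals]
    return common, innovations


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for span(vectors), dropping dependent ones."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def projection_distance(basis_u, basis_v):
    """Projection metric: sqrt(k - ||U^T V||_F^2), i.e. sqrt(sum of sin^2 of principal angles).

    Both arguments must be orthonormal bases with the same number k of vectors.
    """
    if len(basis_u) != len(basis_v):
        raise ValueError("subspaces must have the same dimension")
    k = len(basis_u)
    frob_sq = sum(
        sum(ui * vi for ui, vi in zip(u, v)) ** 2 for u in basis_u for v in basis_v
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(k - frob_sq, 0.0))
```

For example, `jsm1_decompose([[1, 0, 2, 5], [1, 3, 2, 0], [1, 0, 2, 0]])` returns the common part `[1, 0, 2, 0]` and innovations `[[0, 0, 0, 5], [0, 3, 0, 0], [0, 0, 0, 0]]`. Bases built from such component vectors could then be compared with `projection_distance`: two bases of the same plane give 0, and two planes sharing exactly one direction give 1.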


References

[1] UCF-50. http://server.cs.ucf.edu/ vision/data/ucf50.rar. 1, 5

[2] D. Baron, M. Duarte, M. Wakin, S. Sarvotham, and R. Baraniuk. Distributed compressive sensing. arXiv preprint arXiv:0901.3403, 2009. 1, 2, 3, 4

[3] A. Bobick and J. Davis. The recognition of human movement using temporal templates. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(3):257–267, 2001. 1

[4] W. Brendel and S. Todorovic. Learning spatiotemporal graphs of human activities. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 778–785. IEEE, 2011. 6

[5] L. Cao, Y. Mu, A. Natsev, S. Chang, G. Hua, and J. Smith. Scene aligned pooling for complex video recognition. In ECCV, 2012. 1, 2, 6

[6] Y. Chen, N. Nasrabadi, and T. Tran. Simultaneous joint sparsity model for target detection in hyperspectral imagery. Geoscience and Remote Sensing Letters, IEEE, 8(4):676–680, 2011. 2

[7] Y. Chikuse. Statistics on special manifolds. Springer Verlag, 2003. 4

[8] L. Duan, D. Xu, and S. Chang. Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1338–1345. IEEE, 2012. 2

[9] L. Duan, D. Xu, I. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(9):1667–1680, 2012. 2

[10] A. Edelman, T. Arias, and S. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998. 4

[11] Y. Fu, T. Hospedales, T. Xiang, and S. Gong. Attribute learning for understanding unstructured social activity. In European Conference on Computer Vision, 2012. 1, 2

[12] A. Gilbert, J. Illingworth, and R. Bowden. Fast realistic multi-action recognition using mined dense spatio-temporal features. In Computer Vision, 2009 IEEE 12th International Conference on, pages 925–931. IEEE, 2009. 6

[13] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(12):2247–2253, 2007. 1

[14] J. Hamm and D. Lee. Grassmann discriminant analysis: a unifying view on subspace-based learning. In Proceedings of the 25th international conference on Machine learning, pages 376–383. ACM, 2008. 4

[15] Z. Han, Z. Xu, and S. Zhu. Video primal sketch: A generic middle-level representation of video. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1283–1290. IEEE, 2011. 2

[16] M. Hoai and F. De la Torre. Max-margin early event detectors. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2863–2870. IEEE, 2012. 2

[17] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007. 6

[18] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5):509–541, 1977. 4

[19] O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf. Motion interchange patterns for action recognition in unconstrained videos. In ECCV, 2012. 2, 5, 6

[20] A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2046–2053. IEEE, 2010. 1, 2, 6

[21] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: a large video database for human motion recognition. In Proceedings of the International Conference on Computer Vision (ICCV), 2011. 1, 5, 6

[22] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008. 1, 2, 3, 5, 6

[23] R. Li and T. Zickler. Discriminative virtual views for cross-view action recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2855–2862. IEEE, 2012. 2

[24] J. Liu, B. Kuipers, and S. Savarese. Recognizing human actions by attributes. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3337–3344. IEEE, 2011. 1

[25] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos in the wild. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1996–2003. IEEE, 2009. 2

[26] J. Liu, S. McCloskey, and Y. Liu. Local expert forest of score fusion for video event classification. Computer Vision–ECCV 2012, pages 397–410, 2012. 1, 2

[27] P. Nagesh and B. Li. A compressive sensing approach for expression-invariant face recognition. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1518–1525. IEEE, 2009. 2

[28] J. Niebles, C. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification. Computer Vision–ECCV 2010, pages 392–405, 2010. 5, 6

[29] S. Oh, A. Hoogs, A. Perera, N. Cuntoor, C. Chen, J. Lee, S. Mukherjee, J. Aggarwal, H. Lee, L. Davis, et al. A large-scale benchmark dataset for event recognition in surveillance video. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3153–3160. IEEE, 2011. 1, 2

[30] H. Pirsiavash and D. Ramanan. Detecting activities of daily living in first-person camera views. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2847–2854. IEEE, 2012. 2

[31] Q. Qiu, Z. Jiang, and R. Chellappa. Sparse dictionary-based representation and recognition of action attributes. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 707–714. IEEE, 2011. 2

[32] C. Rao and M. Shah. View-invariance in action recognition. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 2, pages II–316. IEEE, 2001. 1

[33] M. Raptis, I. Kokkinos, and S. Soatto. Discovering discriminative action parts from mid-level video representations. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1242–1249. IEEE, 2012. 2

[34] M. Rohrbach, M. Regneri, M. Andriluka, S. Amin, M. Pinkal, and B. Schiele. Script data for attribute-based recognition of composite activities. Computer Vision–ECCV 2012, pages 144–157, 2012. 2

[35] S. Sadanand and J. Corso. Action bank: A high-level representation of activity in video. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1234–1241. IEEE, 2012. 2, 5, 6

[36] S. Savarese, A. Del Pozo, J. Niebles, and L. Fei-Fei. Spatial-temporal correlatons for unsupervised action classification. In Motion and Video Computing, 2008. WMVC 2008. IEEE Workshop on, pages 1–8. IEEE, 2008. 1

[37] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, pages 32–36. IEEE, 2004. 1, 5, 6

[38] A. Tamrakar, S. Ali, Q. Yu, J. Liu, O. Javed, A. Divakaran, H. Cheng, and H. Sawhney. Evaluation of low-level features and their combinations for complex event detection in open source videos. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3681–3688. IEEE, 2012. 2

[39] P. Turaga, R. Chellappa, V. Subrahmanian, and O. Udrea. Machine recognition of human activities: A survey. Circuits and Systems for Video Technology, IEEE Transactions on, 18(11):1473–1488, 2008. 1, 2

[40] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa. Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(11):2273–2286, 2011. 4

[41] S. Vitaladevuni, P. Natarajan, and R. Prasad. Efficient orthogonal matching pursuit using sparse random projections for scene and video classification. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2312–2319. IEEE, 2011. 2

[42] H. Wang, A. Klaser, C. Schmid, and C. Liu. Action recognition by dense trajectories. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3169–3176. IEEE, 2011. 1, 2

[43] H. Wang, A. Kläser, C. Schmid, C. Liu, et al. Dense trajectories and motion boundary descriptors for action recognition. Research report, INRIA, 2012. 5, 6

[44] X. Wu, D. Xu, L. Duan, and J. Luo. Action recognition using context and appearance distribution features. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 489–496. IEEE, 2011. 6

[45] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. Advances in neural information processing systems, 17:1537–1544, 2004. 5

[46] W. Xu and B. Hassibi. Compressed sensing over the Grassmann manifold: A unified analytical framework. In Communication, Control, and Computing, 2008 46th Annual Allerton Conference on, pages 562–567. IEEE, 2008. 4

[47] H. Yin and S. Li. Multimodal image fusion with joint sparsity model. Optical Engineering, 50(6):067007, 2011. 2

[48] X. Yuan and S. Yan. Visual classification with multi-task joint sparse representation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3493–3500. IEEE, 2010. 2

[49] Q. Zhang and B. Li. Joint sparsity model with matrix completion for an ensemble of face images. In Image Processing (ICIP), 2010 17th IEEE International Conference on, pages 1665–1668. IEEE, 2010. 2

[50] B. Zhou, X. Wang, and X. Tang. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2871–2878. IEEE, 2012. 2