iccv iccv2013 iccv2013-244 iccv2013-244-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jingjing Zheng, Zhuolin Jiang
Abstract: We present an approach to jointly learn a set of view-specific dictionaries and a common dictionary for cross-view action recognition. The set of view-specific dictionaries is learned for specific views while the common dictionary is shared across different views. Our approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. In this way, we can align view-specific features in the sparse feature spaces spanned by the view-specific dictionary set and transfer the view-shared features in the sparse feature space spanned by the common dictionary. Meanwhile, the incoherence between the common dictionary and the view-specific dictionary set enables us to exploit the discrimination information encoded in view-specific features and view-shared features separately. In addition, the learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view. Extensive experiments using the multi-view IXMAS dataset demonstrate that our approach outperforms many recent approaches for cross-view action recognition.
[1] A. Bergamo and L. Torresani. Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In NIPS, 2010. 5, 7
[2] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In ICCV, 2005. 1
[3] G. K. M. Cheung, S. Baker, and T. Kanade. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In CVPR, 2003. 1
[4] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS. 4
[5] A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV, 2003. 1
[6] A. Farhadi and M. K. Tabrizi. Learning to recognize activities from the wrong view point. In ECCV, 2008. 1, 2, 5
[7] A. Farhadi, M. K. Tabrizi, I. Endres, and D. A. Forsyth. A latent model of discriminative aspect. In ICCV, 2009. 1, 6
[8] C.-H. Huang, Y.-R. Yeh, and Y.-C. F. Wang. Recognizing actions across cameras by exploring the correlated subspace. In ECCV Workshops, 2012. 1, 2
[9] Z. Jiang, Z. Lin, and L. S. Davis. Learning a discriminative dictionary for sparse coding via label consistent k-svd. In CVPR, 2011. 3
[10] I. N. Junejo, E. Dexter, I. Laptev, and P. Pérez. View-independent action recognition from temporal self-similarities. IEEE Trans. Pattern Anal. Mach. Intell. 2
[11] I. N. Junejo, E. Dexter, I. Laptev, and P. Pérez. Cross-view action recognition from temporal self-similarities. In ECCV, 2008. 2, 6, 7
[12] S. Kong and D. Wang. A dictionary learning approach for classification: Separating the particularity and the commonality. In ECCV, 2012. 4
[13] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, 2003. 1
[14] B. Li, O. I. Camps, and M. Sznaier. Cross-view activity recognition using hankelets. In CVPR, 2012. 2
[15] R. Li and T. Zickler. Discriminative virtual views for cross-view action recognition. In CVPR, 2012. 2, 5, 6, 7, 8
[16] Z. Lin, Z. Jiang, and L. S. Davis. Recognizing actions by shape-motion prototype trees. In ICCV, 2009. 1, 5
[17] J. Little and J. E. Boyd. Recognizing people by their gait: The shape of motion. Videre, 1996. 1
[18] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009. 4, 5
[19] J. Liu and M. Shah. Learning human actions via information maximization. In CVPR, 2008. 1, 7
[20] J. Liu, M. Shah, B. Kuipers, and S. Savarese. Cross-view action recognition via view knowledge transfer. In CVPR, 2011. 1, 2, 4, 5, 6, 7
[21] F. Lv and R. Nevatia. Single view human action recognition using key pose matching and viterbi path searching. In CVPR, 2007. 1
[22] V. Parameswaran and R. Chellappa. Human action-recognition using mutual invariants. CVIU, 2005. 2
[23] V. Parameswaran and R. Chellappa. View invariance for human action recognition. IJCV, 2006. 2
[24] D.-S. Pham and S. Venkatesh. Joint learning and dictionary construction for pattern recognition. In CVPR, 2008. 4
[25] C. Rao, A. Yilmaz, and M. Shah. View-invariant representation and recognition of actions. IJCV, 2002. 2
[26] D. Tran and A. Sorokin. Human activity recognition with metric learning. In ECCV, 2008. 5
[27] A. ul Haq, I. Gondal, and M. Murshed. On dynamic scene geometry for view-invariant action matching. In CVPR, 2011. 2
[28] D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3d exemplars. In ICCV, 2007. 2, 4
[29] D. Weinland, M. Özuysal, and P. Fua. Making action recognition robust to occlusions and viewpoint changes. In ECCV, 2010. 2, 7
[30] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell., 2009. 3, 7
[31] P. Yan, S. M. Khan, and M. Shah. Learning 4d action feature models for arbitrary view action recognition. In CVPR, 2008. 2
[32] A. Yilmaz and M. Shah. Actions sketch: A novel action representation. In CVPR, 2005. 1
[33] J. Zheng, Z. Jiang, J. Phillips, and R. Chellappa. Cross-view action recognition via a transferable dictionary pair. In BMVC, 2012. 1, 2, 7