iccv iccv2013 iccv2013-99 iccv2013-99-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia
Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across views. In this paper, we address the problem of transferring action models learned in one view (the source view) to a different view (the target view), where action instances from the two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space that links the source and target views and transfers knowledge between them. Two projection matrices, which respectively map data from the source and target views into the common space, are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the canonical correlations of intra-class samples. Our model is restricted neither to corresponding action instances in the two views nor to the same type of feature, and it can handle the case where only a few, or even no, labeled samples are available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.
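A worked sketch may make the formulation described in the abstract concrete. The notation below (P_s, P_t, \rho, D, \lambda, w_v) is assumed for illustration and is not taken from the paper; in particular, the concrete form of the nonparametric criterion (written here as a generic distribution-mismatch term D, e.g. an MMD-style measure) is an assumption.

% Sketch of the HTDCC objective as described in the abstract (notation assumed).
% P_s, P_t: projection matrices mapping the heterogeneous source- and
% target-view feature spaces into the common space.
% \rho(A, B): sum of canonical correlations between projected sample sets.
% D(.,.): nonparametric distribution-mismatch measure in the common space.
\max_{P_s,\,P_t}\;
  \sum_{(i,j)\,\in\,\text{intra-class}} \rho\big(P_s^{\top} X_i,\; P_t^{\top} Y_j\big)
  \;-\; \sum_{(i,j)\,\in\,\text{inter-class}} \rho\big(P_s^{\top} X_i,\; P_t^{\top} Y_j\big)
  \;-\; \lambda\, D\big(P_s^{\top} X^{(s)},\; P_t^{\top} X^{(t)}\big)

% One plausible reading of the joint weight learning for multi-source fusion:
% the target-view decision combines per-source-view classifiers f_v with
% learned nonnegative weights on the simplex (constraint form assumed).
f(x) \;=\; \sum_{v=1}^{V} w_v\, f_v(x), \qquad w_v \ge 0, \quad \sum_{v=1}^{V} w_v = 1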
[1] L. Duan, D. Xu, and I. Tsang. Learning with augmented features for heterogeneous domain adaptation. In ICML, 2012.
[2] A. Farhadi and M. Tabrizi. Learning to recognize activities from the wrong view point. In ECCV, 2008.
[3] J. Liu and M. Shah. Learning human actions via information maximization. In CVPR, 2008.
[4] I. Junejo, E. Dexter, I. Laptev, and P. Perez. View-independent action recognition from temporal self-similarities. IEEE T-PAMI, 33(1): 172–185, 2011.
[5] T. Kim, J. Kittler, and R. Cipolla. Discriminative learning and recognition of image set classes using canonical correlations. IEEE T-PAMI, 29(6): 1005–1018, 2007.
[6] B. Kulis, K. Saenko, and T. Darrell. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In CVPR, 2011.
[7] M. Lewandowski, D. Makris, and J. Nebel. View and style-independent action manifolds for human activity recognition. In ECCV, 2010.
[8] R. Li and T. Zickler. Discriminative virtual views for cross-view action recognition. In CVPR, 2012.
[9] J. Liu, M. Shah, B. Kuipers, and S. Savarese. Cross-view action recognition via view knowledge transfer. In CVPR, 2011.
[10] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[11] Y. Shen and H. Foroosh. View-invariant action recognition using fundamental ratios. In CVPR, 2008.
[12] X. Shi, Q. Liu, W. Fan, P. Yu, and R. Zhu. Transfer learning on heterogeneous feature spaces via spectral transformation. In ICDM, 2010.
[13] C. Wang and S. Mahadevan. Heterogeneous domain adaptation using manifold alignment. In IJCAI, 2011.
[14] D. Weinland, E. Boyer, and R. Ronfard. Action recognition from arbitrary views using 3d exemplars. In ICCV, 2007.
[15] X. Wu and Y. Jia. View-invariant action recognition using latent kernelized structural svm. In ECCV, 2012.
[16] X. Wu, C. Liu, and Y. Jia. Transfer discriminant-analysis of canonical correlations for view-transfer action recognition. In PCM, 2012.
[17] X. Wu, D. Xu, L. Duan, and J. Luo. Action recognition using context and appearance distribution features. In CVPR, 2011.
[18] P. Yan, S. Khan, and M. Shah. Learning 4d action feature models for arbitrary view action recognition. In CVPR, 2008.
[19] A. Yilmaz and M. Shah. Recognizing human actions in videos acquired by uncalibrated moving cameras. In ICCV, 2005.
[20] J. Zheng, Z. Jiang, P. Phillips, and R. Chellappa. Cross-view action recognition via a transferable dictionary pair. In BMVC, 2012.
Table 3. Comparison of different multiple-source-view fusion methods on recognition accuracy for each target view.
Figure 1. Examples of the learned combination weights of multiple source views. For each target view, the classifier is constructed as a combination of the four transferred source views, with the weights shown on the vertical axes of the histograms.
Figure 2. Recognition performance of multiple-source-view fusion on each action class.