iccv iccv2013 iccv2013-249 iccv2013-249-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge across multiple related machine learning tasks is an effective strategy for improving generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related and share common motion patterns (e.g., diving and high jump share the jump motion). We propose a new multi-task learning method that learns latent tasks shared across categories and reconstructs a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discriminative power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, which avoids falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
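The abstract does not give the objective function, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: each category's linear classifier is a combination of shared latent-task vectors, and an L1-style shrinkage step keeps the combination coefficients sparse. The names `latent_tasks`, `codes`, `soft_threshold`, and `reconstruct_classifiers`, the dimensions, and the threshold value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def reconstruct_classifiers(latent_tasks, codes):
    """Build one linear classifier per category.

    latent_tasks: (d, k) array, one column per shared latent task (assumed form).
    codes:        (k, c) array, one column of combination weights per category.
    Returns a (d, c) array whose columns are the per-category classifiers.
    """
    return latent_tasks @ codes


# Hypothetical sizes: feature dimension, number of latent tasks, number of categories.
d, k, c = 6, 3, 4
rng = np.random.default_rng(0)

latent_tasks = rng.standard_normal((d, k))
dense_codes = rng.standard_normal((k, c))

# Shrinkage zeroes out small coefficients, so a category only draws on
# the latent tasks whose weights survive the threshold.
codes = soft_threshold(dense_codes, 0.5)
classifiers = reconstruct_classifiers(latent_tasks, codes)
```

A zero entry in `codes` means the corresponding category does not use that latent task, which is one way to read the abstract's point that categories are not all forced to share knowledge.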
[1] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sciences, 2(1):183–202, 2009.
[3] L. Cao, Z. Liu, and T. S. Huang. Cross-dataset action detection. In Proc. CVPR, 2010.
[4] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
[5] I. Endres, V. Srikumar, M.-W. Chang, and D. Hoiem. Learning shared body plans. In Proc. CVPR, 2012.
[6] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proc. KDD, 2004.
[7] P. F. Felzenszwalb, R. B. Girshick, D. A. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. PAMI, 32(9):1627–1645, 2010.
[8] Z. Harchaoui, M. Douze, M. Paulin, M. Dudík, and J. Malick. Large-scale image classification with trace-norm regularization. In Proc. CVPR, 2012.
[9] S. Kim and E. P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In Proc. ICML, 2010.
[10] O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf. Motion interchange patterns for action recognition in unconstrained videos. In Proc. ECCV, 2012.
[11] A. Kumar and H. Daumé III. Learning task grouping and overlap in multitask learning. In Proc. ICML, 2012.
[12] T. Lan, Y. Wang, W. Yang, S. N. Robinovitch, and G. Mori. Discriminative latent models for recognizing contextual group activities. IEEE Trans. PAMI, 34(8):1549–1562, 2012.
[13] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proc. CVPR, 2008.
[14] J. Liu, B. Kuipers, and S. Savarese. Recognizing human actions by attributes. In Proc. CVPR, 2011.
[15] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In Proc. CVPR, 2009.
[16] B. Ni, S. Yan, and A. A. Kassim. Recognizing human group activities with localized causalities. In Proc. CVPR, 2009.
[17] J. C. Niebles, C.-W. Chen, and F.-F. Li. Modeling temporal structure of decomposable motion segments for activity classification. In Proc. ECCV, 2010.
[18] P. Ott and M. Everingham. Shared parts for deformable part-based models. In Proc. CVPR, 2011.
[19] H. Pirsiavash and D. Ramanan. Steerable part models. In Proc. CVPR, 2012.
[20] H. Pirsiavash, D. Ramanan, and C. Fowlkes. Bilinear classifiers for visual recognition. In Proc. NIPS, 2009.
[21] K. K. Reddy and M. Shah. Recognizing 50 human action categories of web videos. Machine Vision and Applications, 2012.
[22] S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In Proc. CVPR, 2012.
[23] R. Salakhutdinov, A. Torralba, and J. B. Tenenbaum. Learning to share visual appearance for multiclass object detection. In Proc. CVPR, 2011.
[24] H. O. Song, S. Zickler, T. Althoff, R. Girshick, C. Geyer, M. Fritz, P. Felzenszwalb, and T. Darrell. Sparselet models for efficient multiclass object detection. In Proc. ECCV, 2012.
[25] K. Tang, F.-F. Li, and D. Koller. Learning latent temporal structure for complex event detection. In Proc. CVPR, 2012.
[26] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Trans. PAMI, 29(5):854–869, 2007.
[27] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.
[28] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In Proc. CVPR, pages 3169–3176, 2011.
[29] J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
[30] L. Wang, Y. Qiao, and X. Tang. Motionlets: Mid-level 3d parts for human motion recognition. In Proc. CVPR, pages 2674–2681, 2013.
[31] Y. Wang and G. Mori. Hidden part models for human action recognition: Probabilistic versus max margin. IEEE Trans. PAMI, 33(7):1310–1323, 2011.
[32] B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. J. Guibas, and F.-F. Li. Human action recognition by learning bases of action attributes and parts. In Proc. ICCV, 2011.
[33] J. Zhou, J. Chen, and J. Ye. Clustered multi-task learning via alternating structure optimization. In Proc. NIPS, 2011.