
85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes


Source: pdf

Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so properly leveraging relevant attributes could be helpful for event detection. Many works have exploited image-level attributes for various applications. However, image-level attributes may be insufficient for complex event detection in videos because of their limited capability to characterize the dynamic properties of video data. Hence, we propose to leverage attributes at the video level (termed video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple content such as objects, scenes and actions, which are the basic elements of complex events. Specifically, building upon a correlation vector that correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.


reference text

[1] http://en.wikipedia.org/wiki/Sylvester_equation.

[2] http://www.nist.gov/itl/iad/mig/upload/med11-evalplanv03-20110801a.pdf.

[3] http://www.nist.gov/itl/iad/mig/upload/med12-evalplanv01.pdf.

[4] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.

[5] S. Dhar, V. Ordonez, and T. L. Berg. High level describable attributes for predicting aesthetics and interestingness. In CVPR, pages 1657–1664, 2011.

[6] D. Ding, F. Metze, S. Rawat, P. F. Schulam, S. Burger, E. Younessian, L. Bao, M. G. Christel, and A. G. Hauptmann. Beyond audio and video retrieval: towards multimedia summarization. In ICMR, 2012.

[7] K. Duan, D. Parikh, D. J. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, pages 3474–3481, 2012.

[8] A. Farhadi, I. Endres, D. Hoiem, and D. A. Forsyth. Describing objects by their attributes. In CVPR, pages 1778–1785, 2009.

[9] S. J. Hwang, F. Sha, and K. Grauman. Sharing features between objects and their attributes. In CVPR, pages 1761–1768, 2011.

[10] H. Izadinia and M. Shah. Recognizing complex events using large margin joint low-level event model. In ECCV (4), pages 430–444, 2012.

[11] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, pages 951–958, 2009.

[12] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, pages 432–439, 2003.

[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[14] J. Luo, T. Tommasi, and B. Caputo. Multiclass transfer learning from unconstrained priors. In ICCV, pages 1863–1870, 2011.

[15] Z. Ma, Y. Yang, Y. Cai, N. Sebe, and A. G. Hauptmann. Knowledge adaptation for ad hoc multimedia event detection with few exemplars. In ACM Multimedia, pages 469–478, 2012.

[16] Z. Ma, Y. Yang, N. Sebe, K. Zheng, and A. G. Hauptmann. Multimedia event detection using a classifier-specific intermediate representation. IEEE Transactions on Multimedia, 2013.

[17] K. K. Reddy and M. Shah. Recognizing 50 human action categories of web videos. Machine Vision and Applications Journal, 2012.

[18] B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

[19] L. Torresani, M. Szummer, and A. W. Fitzgibbon. Efficient object category recognition using classemes. In ECCV (1), pages 776–789, 2010.

[20] G. Wang and D. A. Forsyth. Joint learning of visual attributes, object classes and visual saliency. In ICCV, pages 537–544, 2009.

[21] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. In ECCV (5), pages 155–168, 2010.

[22] Y. Yang, J. Song, Z. Huang, Z. Ma, N. Sebe, and A. G. Hauptmann. Multi-feature fusion via hierarchical regression for multimedia analysis. IEEE Transactions on Multimedia, 2013.

[23] S.-I. Yu, Z. Xu, D. Ding, W. Sze, F. Vicente, Z. Lan, Y. Cai, S. Rawat, P. Schulam, N. Markandaiah, S. Bahmani, A. Juarez, W. Tong, Y. Yang, S. Burger, F. Metze, R. Singh, B. Raj, R. Stern, T. Mitamura, E. Nyberg, and A. Hauptmann. Informedia e-lamp @ TRECVID2012: Multimedia event detection and recounting med and mer. In NIST TRECVID Workshop, 2012.