289 nips-2012-Recognizing Activities by Attribute Dynamics


Source: pdf

Author: Weixin Li, Nuno Vasconcelos

Abstract: In this work, we consider the problem of modeling the dynamic structure of human activities in the attribute space. A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. The BDS is a non-linear dynamic system, which extends both binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. An algorithm for learning BDS parameters, inspired by a popular LDS learning method from dynamic textures, is proposed. A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition.
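
To make the generative process described above concrete, here is a minimal NumPy sketch of sampling from a BDS-style model: a hidden Gauss-Markov state evolves linearly, and each binary attribute is drawn from a Bernoulli whose probability is a logistic function of the state, as in binary PCA. This is only an illustration inferred from the abstract; the parameter names (A, C, b, Q, x0) and the logistic link are assumptions, not the paper's exact notation or learning procedure.

import numpy as np

def sample_bds(A, C, b, Q, x0, T, rng=None):
    """Sample a binary attribute sequence from a BDS-style generative model.

    Hedged sketch based on the abstract: a hidden Gauss-Markov state drives
    Bernoulli attribute observations through a logistic link. Parameter
    names are illustrative, not the paper's notation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_state = A.shape[0]
    x = np.asarray(x0, dtype=float)
    states, attrs = [], []
    for _ in range(T):
        # Binary observations: attribute k is active with probability sigmoid((C x + b)_k)
        p = 1.0 / (1.0 + np.exp(-(C @ x + b)))
        y = (rng.random(p.shape) < p).astype(int)
        states.append(x.copy())
        attrs.append(y)
        # Gauss-Markov state transition: x_{t+1} = A x_t + v_t, with v_t ~ N(0, Q)
        x = A @ x + rng.multivariate_normal(np.zeros(n_state), Q)
    return np.array(states), np.array(attrs)

# Toy usage: 2-D hidden state, 5 binary attributes, 30 time steps.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[0.9, -0.2], [0.2, 0.9]])   # stable state transition
    C = rng.standard_normal((5, 2))           # attribute loading matrix
    b = np.zeros(5)                           # attribute biases
    Q = 0.05 * np.eye(2)                      # state noise covariance
    _, y = sample_bds(A, C, b, Q, x0=np.ones(2), T=30, rng=rng)
    print(y.shape)  # (30, 5) binary attribute matrix
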


References

[1] J. K. Aggarwal and M. S. Ryoo, “Human activity analysis: A review,” ACM Computing Surveys, vol. 43, no. 16, pp. 1–16, 2011.

[2] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse spatio-temporal features,” ICCV VS-PETS, 2005.

[3] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, “Learning realistic human actions from movies,” CVPR, 2008.

[4] J. C. Niebles, C.-W. Chen, and L. Fei-Fei, “Modeling temporal structure of decomposable motion segments for activity classification,” ECCV, 2010.

[5] B. Laxton, J. Lim, and D. Kriegman, “Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video,” CVPR, 2007.

[6] R. Chaudhry, A. Ravichandran, G. Hager, and R. Vidal, “Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions,” CVPR, 2009.

[7] A. Gaidon, Z. Harchaoui, and C. Schmid, “Actom sequence models for efficient action detection,” CVPR, 2011.

[8] B. Li, M. Ayazoglu, T. Mao, O. Camps, and M. Sznaier, “Activity recognition using dynamic subspace angles,” CVPR, 2011.

[9] C. H. Lampert, H. Nickisch, and S. Harmeling, “Learning to detect unseen object classes by between-class attribute transfer,” CVPR, 2009.

[10] N. Rasiwasia and N. Vasconcelos, “Holistic context models for visual recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 902–917, 2012.

[11] J. Liu, B. Kuipers, and S. Savarese, “Recognizing human actions by attributes,” CVPR, 2011.

[12] A. Fathi and G. Mori, “Action recognition by learning mid-level motion features,” CVPR, 2008.

[13] N. Rasiwasia and N. Vasconcelos, “Holistic context modeling using semantic co-occurrences,” CVPR, 2009.

[14] A. I. Schein, L. K. Saul, and L. H. Ungar, “A generalized linear model for principal component analysis of binary data,” AISTATS, 2003.

[15] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto, “Dynamic textures,” Int’l J. Computer Vision, vol. 51, no. 2, pp. 91–109, 2003.

[16] S. V. N. Vishwanathan, A. J. Smola, and R. Vidal, “Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes,” Int’l J. Computer Vision, vol. 73, no. 1, pp. 95–119, 2006.

[17] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2247–2253, 2007.

[18] N. İkizler and D. A. Forsyth, “Searching for complex human activities with no visual examples,” Int’l J. Computer Vision, vol. 80, no. 3, pp. 337–357, 2008.

[19] V. Kellokumpu, G. Zhao, and M. Pietikäinen, “Human activity recognition using a dynamic texture based method,” BMVC, 2008.

[20] N. Rasiwasia and N. Vasconcelos, “Scene classification with low-dimensional semantic spaces and weak supervision,” CVPR, 2008.

[21] A. Quattoni, M. Collins, and T. Darrell, “Learning visual representations using images with captions,” CVPR, 2007.

[22] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: Query by semantic example,” IEEE Trans. Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[23] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” ICML, 2006.

[24] X. Wang and A. McCallum, “Topics over time: a non-Markov continuous-time model of topical trends,” ACM SIGKDD, 2006.

[25] M. Collins, S. Dasgupta, and R. E. Schapire, “A generalization of principal component analysis to the exponential family,” NIPS, 2002.

[26] R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.

[27] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, pp. 1299–1319, 1998.

[28] A. B. Chan and N. Vasconcelos, “Classifying video with kernel dynamic textures,” CVPR, 2007.

[29] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, 2011.