40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities


Source: pdf

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Unlike most previous latent structured models, which assume a predefined structure for the hidden layer (e.g., a tree structure), we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance.


reference text

[1] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems, 2003.

[2] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In IEEE International Conference on Computer Vision, 2005.

[3] W. Choi, K. Shahid, and S. Savarese. What are they doing?: Collective activity classification using spatio-temporal relationship among people. In 9th International Workshop on Visual Surveillance, 2009.

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.

[5] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In IEEE International Conference on Computer Vision, 2009.

[6] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for static human-object interactions. In Workshop on Structured Models in Computer Vision, 2010.

[7] T.-M.-T. Do and T. Artieres. Large margin training for hidden Markov models with partially observed states. In International Conference on Machine Learning, 2009.

[8] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.

[9] A. Gupta, A. Kembhavi, and L. S. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(10):1775–1789, 2009.

[10] D. Han, L. Bo, and C. Sminchisescu. Selection and context for action recognition. In IEEE International Conference on Computer Vision, 2009.

[11] G. Heitz and D. Koller. Learning spatial context: Using stuff to find things. In European Conference on Computer Vision, 2008.

[12] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.

[13] K. P. Murphy, A. Torralba, and W. T. Freeman. Using the forest to see the trees: A graphical model relating features, objects, and scenes. In Advances in Neural Information Processing Systems, volume 16. MIT Press, 2004.

[14] J. C. Niebles, C.-W. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification. In European Conference on Computer Vision, 2010.

[15] A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell. Hidden conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1848–1852, June 2007.

[16] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In IEEE International Conference on Computer Vision, 2007.

[17] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In 17th International Conference on Pattern Recognition, 2004.

[18] A. Vedaldi and A. Zisserman. Structured output regression for detection with partial truncation. In Advances in Neural Information Processing Systems. MIT Press, 2009.

[19] Y. Wang and G. Mori. Max-margin hidden conditional random fields for human action recognition. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.

[20] Y. Wang and G. Mori. A discriminative latent model of image region and object tag correspondence. In Advances in Neural Information Processing Systems, 2010.

[21] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. In European Conference on Computer Vision, 2010.

[22] W. Yang, Y. Wang, and G. Mori. Recognizing human actions from still images with latent poses. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.

[23] B. Yao and L. Fei-Fei. Grouplet: a structured image representation for recognizing human and object interactions. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 2010.

[24] B. Yao and L. Fei-Fei. Modeling mutual context of object and human pose in human-object interaction activities. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 2010.

[25] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In International Conference on Machine Learning, 2009.