Language-Motivated Approaches to Action Recognition (JMLR 2013, entry 58)

Source: pdf

Authors: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach.

Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models
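The filler-model analogy in the abstract can be illustrated with a minimal sketch in the style of the HMM keyword-spotting work cited in the reference list (e.g., Rohlicek et al., 1989; Rose and Paul, 1990). It assumes discrete (vector-quantized) observation symbols and a single target gesture: a one-state filler model that absorbs background frames is looped with a left-to-right gesture model, and Viterbi decoding labels every frame as either filler or gesture, which both detects and localizes the gesture. The function names (`viterbi`, `build_spotting_hmm`, `spot`), the hand-set transition probabilities `p_enter` and `p_stay`, and the toy emission tables are hypothetical illustrations, not the authors' framework or parameters.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state path of a discrete-emission HMM, computed in log space."""
    T, N = len(obs), log_A.shape[0]
    delta = np.empty((T, N))              # best log-score of any path ending in state j at time t
    back = np.zeros((T, N), dtype=int)    # best predecessor of state j at time t
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: best path into j via i
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):         # backtrack through stored predecessors
        path[t - 1] = back[t, path[t]]
    return path


def build_spotting_hmm(keyword_B, filler_b, p_enter=0.05, p_stay=0.5):
    """Composite HMM: state 0 is the filler, states 1..K form a left-to-right gesture model.

    Hypothetical sketch: p_enter and p_stay are hand-set, not learned from data.
    """
    keyword_B = np.asarray(keyword_B, dtype=float)
    filler_b = np.asarray(filler_b, dtype=float)
    K = keyword_B.shape[0]
    N = K + 1
    A = np.zeros((N, N))
    A[0, 0], A[0, 1] = 1.0 - p_enter, p_enter        # filler loops or enters the gesture
    for k in range(1, N):
        A[k, k] = p_stay                             # self-loop absorbs duration variation
        A[k, k + 1 if k < K else 0] = 1.0 - p_stay   # advance; last state returns to filler
    B = np.vstack([filler_b, keyword_B])
    pi = np.zeros(N)
    pi[0] = 1.0                                      # every sequence starts in the filler
    with np.errstate(divide="ignore"):               # log(0) -> -inf marks forbidden moves
        return np.log(pi), np.log(A), np.log(B)


def spot(obs, keyword_B, filler_b, **hmm_kwargs):
    """Return [start, end) frame intervals that Viterbi assigns to the gesture model."""
    path = viterbi(*build_spotting_hmm(keyword_B, filler_b, **hmm_kwargs), obs)
    segments, start = [], None
    for t, state in enumerate(path):
        if state > 0 and start is None:
            start = t
        elif state == 0 and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(obs)))
    return segments


if __name__ == "__main__":
    # Toy vocabulary: symbol 0 = background motion, symbols 1 and 2 = gesture phases.
    keyword_B = [[0.05, 0.90, 0.05],   # gesture state 1 mostly emits symbol 1
                 [0.05, 0.05, 0.90]]   # gesture state 2 mostly emits symbol 2
    filler_b = [0.80, 0.10, 0.10]      # filler mostly emits symbol 0
    print(spot([0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0], keyword_B, filler_b))  # [(4, 8)]
```

The design choice mirrors speech keyword spotting: rather than thresholding a likelihood score over every candidate window, the filler model competes with the gesture model frame by frame inside one decoding pass, so segment boundaries fall out of the Viterbi path directly.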

Reference text

University of Central Florida, Computer Vision Lab, 2010. URL http://server.cs.ucf.edu/~vision/data/UCF50.rar.
ChaLearn Gesture Dataset (CGD2011), ChaLearn, California, 2011. URL http://gesture.chalearn.org/2011-one-shot-learning.
J. Aggarwal and M. Ryoo. Human activity analysis: A review. ACM Computing Surveys, 43:16:1–16:43, April 2011.
Y. Benabbas, A. Lablack, N. Ihaddadene, and C. Djeraba. Action recognition using direction models of motion. In Proceedings of the 2010 International Conference on Pattern Recognition, pages 4295–4298, 2010.
H. Bilen, V. P. Namboodiri, and L. Van Gool. Action recognition: A region based approach. In Proceedings of the 2011 IEEE Workshop on the Applications of Computer Vision, pages 294–300, 2011.
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, March 2003.
M. Bregonzio, S. Gong, and T. Xiang. Recognising action as clouds of space-time interest points. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 1948–1955, 2009.
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, pages 886–893, 2005.
K. G. Derpanis, M. Sizintsev, K. Cannons, and R. P. Wildes. Efficient action spotting based on a spacetime oriented structure representation. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition, pages 1990–1997, 2010.
P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In Proceedings of the 2005 IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pages 65–72, 2005.
A. Gilbert, J. Illingworth, and R. Bowden. Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):883–897, 2011.
S. Gong and T. Xiang. Recognition of group activities using dynamic probabilistic networks. In Proceedings of the 2003 IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 742–749, 2003.
G. Heinrich. Parameter estimation for text analysis. Technical report, University of Leipzig, 2008.
T. Hospedales, S. Gong, and T. Xiang. A Markov clustering topic model for mining behaviour in video. In Proceedings of the 2009 International Conference on Computer Vision, pages 1165–1172, 2009.
A. Kläser, M. Marszalek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In Proceedings of the 2008 British Machine Vision Conference, 2008.
A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition, pages 2046–2053, 2010.
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, 2011.
I. Laptev. On space-time interest points. International Journal of Computer Vision, 64:107–123, September 2005.
I. Laptev and T. Lindeberg. Space-time interest points. In Proceedings of the 2003 International Conference on Computer Vision, pages 432–439, 2003.
I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
P. Matikainen, M. Hebert, and R. Sukthankar. Trajectons: Action recognition through the motion analysis of tracked features. In Proceedings of the 2009 IEEE Workshop on Video-Oriented Object and Event Classification, September 2009.
P. Matikainen, M. Hebert, and R. Sukthankar. Representing pairwise spatial and temporal relations for action recognition. In Proceedings of the 2010 European Conference on Computer Vision, September 2010.
R. Messing, C. Pal, and H. Kautz. Activity recognition using the velocity histories of tracked keypoints. In Proceedings of the 2009 International Conference on Computer Vision, 2009.
P. Natarajan and R. Nevatia. Coupled hidden semi Markov models for activity recognition. In Proceedings of the IEEE Workshop on Motion and Video Computing, 2007.
N. T. Nguyen, D. Q. Phung, and S. Venkatesh. Learning and detecting activities from movement trajectories using the hierarchical hidden Markov models. In Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, pages 955–960, 2005.
E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. In Proceedings of the 2006 European Conference on Computer Vision, pages 490–503, 2006.
N. Oliver, E. Horvitz, and A. Garg. Layered representations for human activity recognition. In Proceedings of the 2002 IEEE International Conference on Multimodal Interfaces, pages 3–8, 2002.
N. M. Oliver, B. Rosario, and A. P. Pentland. A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):831–843, 2000.
J. R. Rohlicek, W. Russell, S. Roukos, and H. Gish. Continuous hidden Markov modeling for speaker-independent word spotting. In Proceedings of the 1989 International Conference on Acoustics, Speech, and Signal Processing, pages 627–630, 1989.
R. Rose and D. Paul. A hidden Markov model based keyword recognition system. In Proceedings of the 1990 International Conference on Acoustics, Speech, and Signal Processing, 1990.
C. Schüldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In Proceedings of the 2004 International Conference on Pattern Recognition, pages 32–36, 2004.
P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the ACM International Conference on Multimedia, pages 357–360, 2007.
H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In Proceedings of the 2009 British Machine Vision Conference, September 2009.
H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pages 3169–3176, June 2011.
G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In Proceedings of the 2008 European Conference on Computer Vision, pages 650–663, 2008.
J. Yamato, J. Ohya, and K. Ishii. Recognizing human action in time-sequential images using hidden Markov model. In Proceedings of the 1992 IEEE Conference on Computer Vision and Pattern Recognition, pages 379–385, 1992.
L. Yeffet and L. Wolf. Local trinary patterns for human action recognition. In Proceedings of the 2009 International Conference on Computer Vision, 2009.
J. Yuan, Z. Liu, and Y. Wu. Discriminative subvolume search for efficient action detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.