
Inferring "Dark Matter" and "Dark Energy" from Videos (ICCV 2013, paper 216)



Authors: Dan Xie, Sinisa Todorovic, Song-Chun Zhu

Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about the semantic object classes that may appear in the scene. Functional objects do not have a discriminative appearance or shape, but they affect the behavior of people in the scene. For example, they “attract” people who approach them to satisfy certain needs (e.g., vending machines can quench thirst), or “repel” people who avoid them (e.g., grass lawns). Functional objects can therefore be viewed as “dark matter” emanating “dark energy” that affects people’s trajectories in the video. To detect the “dark matter” and infer the “dark energy” field it emanates, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, and their motions are subject to the composite “dark energy” field of all functional objects in the scene. We assume that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework probabilistically models people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. Inference is performed with a data-driven Markov Chain Monte Carlo (MCMC) process. Our evaluation on videos of public squares and courtyards demonstrates the effectiveness of the approach in localizing functional objects and in predicting people’s trajectories in unobserved parts of the video footage.
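The abstract's central modeling assumption (agents follow globally optimal paths toward an intended goal while avoiding obstacles, so their intent can be read off a partial trajectory) can be illustrated with a minimal sketch. This is not the authors' Lagrangian formulation or their data-driven MCMC inference. It substitutes a standard grid-based cost-to-go field, in the spirit of numerical potential-field planning [5], and a Boltzmann-style intent posterior, in the spirit of inverse planning [4]. The grid, the goal names `A` and `B`, the 4-connected unit-cost moves, the uniform prior over goals, and the rationality parameter `beta` are all hypothetical choices made for illustration.

```python
import heapq
import math

# Hypothetical 4-connected moves with unit cost (an assumption of this sketch).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def cost_to_go(grid, goal):
    """Dijkstra from `goal` over free cells (0 = free, 1 = obstacle).

    Returns {cell: shortest-path length to goal}; only reachable cells appear.
    The field acts as an attracting potential, loosely analogous to the
    "dark energy" of a single functional object (illustrative analogy only).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry, a shorter distance was already found
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1.0
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def optimal_path(field, start):
    """Follow steepest descent of an exact cost-to-go field from `start`.

    With exact unit-cost shortest-path distances, each step lowers the cost
    by one, so the result is a globally optimal, obstacle-avoiding path.
    Assumes `start` is reachable (i.e. present in `field`).
    """
    path = [start]
    cur = start
    while field[cur] > 0:
        r, c = cur
        nbrs = [(r + dr, c + dc) for dr, dc in MOVES if (r + dr, c + dc) in field]
        cur = min(nbrs, key=field.__getitem__)
        path.append(cur)
    return path


def intent_posterior(fields, observed, beta=1.0):
    """Posterior over goals given a partial trajectory, with a uniform prior.

    Score = -beta * detour, where detour = (steps taken so far + remaining
    cost from the current cell) - optimal cost from the start cell.
    A detour of zero means the observed steps are optimal for that goal.
    """
    start, cur = observed[0], observed[-1]
    steps = len(observed) - 1
    scores = {}
    for name, field in fields.items():
        if start in field and cur in field:
            scores[name] = -beta * (steps + field[cur] - field[start])
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {name: math.exp(s - top) / z for name, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical constraint map: a wall separates the top and bottom corridors.
    grid = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    goals = {"A": (0, 4), "B": (2, 4)}  # hypothetical functional-object cells
    fields = {name: cost_to_go(grid, cell) for name, cell in goals.items()}
    print(optimal_path(fields["A"], (1, 0)))
    print(intent_posterior(fields, [(1, 0), (0, 0), (0, 1)]))
```

Running the script prints the optimal route from (1, 0) to goal A along the top corridor, and a posterior of roughly 0.88 for A versus 0.12 for B: the three observed cells lie on an optimal route to A but would imply a two-step detour if the agent were heading to B. Predicting the unobserved remainder of a trajectory then amounts to following the descent path of the most probable goal's field.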


References

[1] S. Ali and M. Shah. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In CVPR, 2007.

[2] S. Ali and M. Shah. Floor fields for tracking in high density crowd scenes. In ECCV, 2008.

[3] M. R. Amer, D. Xie, M. Zhao, S. Todorovic, and S.-C. Zhu. Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In ECCV, 2012.

[4] C. L. Baker, R. Saxe, and J. B. Tenenbaum. Action understanding as inverse planning. Cognition, 2009.

[5] J. Barraquand, B. Langlois, and J.-C. Latombe. Numerical potential field techniques for robot path planning. TSMC, 1992.

[6] J. Gall, A. Fossati, and L. Van Gool. Functional categorization of objects using real-time markerless motion capture. In CVPR, 2011.

[7] H. Gong, J. Sim, M. Likhachev, and J. Shi. Multi-hypothesis motion planning for visual object tracking. In ICCV, 2011.

[8] H. Grabner, J. Gall, and L. Van Gool. What makes a chair a chair? In CVPR, 2011.

[9] A. Gupta, A. Kembhavi, and L. S. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. TPAMI, 2009.

[10] M. Hoai and F. De la Torre. Max-margin early event detectors. In CVPR, 2012.

[11] K. Kim, M. Grundmann, A. Shamir, I. Matthews, J. Hodgins, and I. Essa. Motion fields to predict play evolution in dynamic sport scenes. In CVPR, 2010.

[12] K. M. Kitani, B. D. Ziebart, J. A. Bagnell, and M. Hebert. Activity forecasting. In ECCV, 2012.

[13] J. Kwon and K. M. Lee. Wang-Landau Monte Carlo-based tracking methods for abrupt motions. TPAMI, 2013.

[14] K. H. Lee, M. G. Choi, Q. Hong, and J. Lee. Group behavior from video: A data-driven approach to crowd simulation. In SCA, 2007.

[15] A. Lerner, Y. Chrysanthou, and D. Lischinski. Crowds by example. In Eurographics, 2007.

[16] S. Oh et al. A large-scale benchmark dataset for event recognition in surveillance video. In CVPR, 2011.

[17] M. Pei, Y. Jia, and S.-C. Zhu. Parsing video events with goal inference and intent prediction. In ICCV, 2011.

[18] S. Pellegrini, J. Gall, L. Sigal, and L. Van Gool. Destination flow for crowd simulation. In ECCV, 2012.

[19] H. Pirsiavash, D. Ramanan, and C. C. Fowlkes. Globally-optimal greedy algorithms for tracking a variable number of objects. In CVPR, 2011.

[20] M. S. Ryoo. Human activity prediction: Early recognition of ongoing activities from streaming videos. In ICCV, 2011.

[21] W. Shao and D. Terzopoulos. Autonomous pedestrians. In SCA, 2005.

[22] B. Solmaz, B. E. Moore, and M. Shah. Identifying behaviors in crowd scenes using stability analysis for dynamical systems. TPAMI, 2012.

[23] Z. Tu and S.-C. Zhu. Image segmentation by data-driven Markov chain Monte Carlo. TPAMI, 2002.

[24] M. W. Turek, A. Hoogs, and R. Collins. Unsupervised learning of functional categories in video scenes. In ECCV, 2010.

[25] C. Vondrick, D. Patterson, and D. Ramanan. Efficiently scaling up crowdsourced video annotation. IJCV, 2013.

[26] Y. Zhao and S.-C. Zhu. Image parsing via stochastic scene grammar. In NIPS, 2011.

[27] B. Zhou, X. Wang, and X. Tang. Random field topic model for semantic region analysis in crowded scenes from tracklets. In CVPR, 2011.

[28] B. Zhou, X. Wang, and X. Tang. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In CVPR, 2012.