CVPR 2013, paper cvpr2013-158: reference list (knowledge graph by maker-knowledge-mining)
Source: pdf
Authors: Dennis Park, C. Lawrence Zitnick, Deva Ramanan, Piotr Dollár
Abstract: We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow and fine-scale temporal difference features. Our approach performs weak motion stabilization by factoring out camera motion and coarse object motion while preserving nonrigid motions that serve as useful cues for recognition. We show results for pedestrian detection and human pose estimation in video sequences, achieving state-of-the-art results in both. In particular, given a fixed detection rate our method achieves a five-fold reduction in false positives over prior art on the Caltech Pedestrian benchmark. Finally, we perform extensive diagnostic experiments to reveal what aspects of our system are crucial for good performance. Proper stabilization, long time-scale features, and proper normalization are all critical.
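The combination of coarse-scale flow and fine-scale temporal differences described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. Several choices are hypothetical and were made only for illustration:

- the per-cell, single-step Lucas-Kanade estimate (in the spirit of [22]);
- the nearest-neighbour warp;
- the absolute difference;
- the names `coarse_flow`, `warp_back` and `stabilized_difference`, and the `cell` parameter.

The sketch estimates one displacement per coarse cell and warps the later frame back by it, so that camera motion and coarse object motion cancel. What remains in the difference image is the finer residual motion that the abstract treats as a recognition cue.

```python
import numpy as np

# Hypothetical sketch of weak stabilization; not the paper's exact feature pipeline.

def coarse_flow(prev, curr, cell=16):
    """One Lucas-Kanade step per cell x cell block.

    Returns a (rows, cols, 2) array of (dx, dy) displacements that move
    content from `prev` to `curr`.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Iy, Ix = np.gradient(prev)   # spatial gradients: axis 0 is y, axis 1 is x
    It = curr - prev             # temporal derivative between the two frames
    gh, gw = prev.shape[0] // cell, prev.shape[1] // cell
    flow = np.zeros((gh, gw, 2))
    for i in range(gh):
        for j in range(gw):
            win = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
            # Least-squares normal equations for ix*dx + iy*dy + it = 0.
            A = np.array([[ix @ ix, ix @ iy], [ix @ iy, iy @ iy]])
            b = -np.array([ix @ it, iy @ it])
            if np.linalg.det(A) > 1e-6:  # textureless cells keep zero motion
                flow[i, j] = np.linalg.solve(A, b)
    return flow

def warp_back(img, flow_full):
    """Sample img at x + flow(x), nearest neighbour, clamped at the borders."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xq = np.clip(np.rint(xs + flow_full[..., 0]), 0, W - 1).astype(int)
    yq = np.clip(np.rint(ys + flow_full[..., 1]), 0, H - 1).astype(int)
    return img[yq, xq]

def stabilized_difference(prev, curr, cell=16):
    """Absolute temporal difference after factoring out per-cell (coarse) motion."""
    flow = coarse_flow(prev, curr, cell)
    H, W = flow.shape[0] * cell, flow.shape[1] * cell  # crop to whole cells
    # Upsample the coarse flow to pixel resolution by repeating each cell's vector.
    flow_full = np.repeat(np.repeat(flow, cell, axis=0), cell, axis=1)
    prev_c = prev[:H, :W].astype(np.float64)
    curr_c = curr[:H, :W].astype(np.float64)
    return np.abs(warp_back(curr_c, flow_full) - prev_c)
```

Here is a simple usage case. When a smooth frame is shifted by one pixel, the raw difference `np.abs(curr - prev)` responds everywhere. After stabilization, the difference is close to zero away from the image border. The `cell` size sets how coarse the factored-out motion is: larger cells remove only camera-like motion, while smaller cells also absorb more of the object's own movement. Calling the same routine on frames further apart (t and t-k) is one assumed way to probe the longer time scales that the abstract reports as important.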
[1] Mind's Eye dataset. http://www.visint.org/index.html. 2, 4, 6
[2] C. Anderson, P. Burt, and G. Van Der Wal. Change detection and tracking using pyramid transform techniques. In Proc. SPIE Conference on Intelligent Robots and Computer Vision, pages 300–305, 1985. 2
[3] M. Andriluka, S. Roth, and B. Schiele. People-tracking-by-detection and people-detection-by-tracking. In CVPR, pages 1–8, 2008. 2
[4] T. Brox and J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE TPAMI, 33(3):500–513, 2011. 2
[5] R. Collins et al. A system for video surveillance and monitoring, volume 102. Carnegie Mellon University, the Robotics Institute, 2000. 2
[6] N. Dalal. Finding people in images and videos. PhD thesis, Institut National Polytechnique de Grenoble-INPG, 2006. 2
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR. IEEE, 2005. 1, 2, 3, 4, 5, 6
[8] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. ECCV, pages 428–441, 2006. 1, 2
[9] P. Dollár, S. Belongie, and P. Perona. The fastest pedestrian detector in the west. BMVC, 2010. 2
[10] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In ICCV VS-PETS, 2005. 2
[11] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. BMVC, 2009. 2, 4, 5
[12] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: An evaluation of the state of the art. IEEE TPAMI, 34(4):743–761, 2012. 2, 4, 5, 6
[13] A. Efros, A. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV, pages 726–733, 2003. 1, 2
[14] A. Ess, B. Leibe, K. Schindler, and L. Van Gool. A mobile vision system for robust multi-person tracking. In CVPR, 2008. 2
[15] A. Fathi and G. Mori. Human pose estimation using motion exemplars. In ICCV, pages 1–8, 2007. 2
[16] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE TPAMI, 2010. 2, 4, 5
[17] D. Geronimo, A. M. Lopez, A. D. Sappa, and T. Graf. Survey of pedestrian detection for advanced driver assistance systems. IEEE TPAMI, 32(7):1239–1258, 2010. 2
[18] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge Univ. Press, 2000. 1
[19] M. Irani, B. Rousso, and S. Peleg. Recovery of ego-motion using region alignment. IEEE TPAMI, 1997. 1
[20] M. Jones and D. Snow. Pedestrian detection using boosted features over many frames. In ICPR, 2008. 2
[21] I. Laptev and T. Lindeberg. Local descriptors for spatio-temporal recognition. Spatial Coherence for Visual Motion Analysis, pages 91–103, 2006. 2
[22] B. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In IJCAI, 1981. 3
[23] D. Park, D. Ramanan, and C. Fowlkes. Multiresolution models for object detection. ECCV, 2010. 6
[24] M. Piccardi. Background subtraction techniques: a review. In Systems, Man and Cybernetics, volume 4, pages 3099–3104. IEEE, 2004. 1, 2
[25] E. Shechtman and M. Irani. Space-time behavior based correlation. In IEEE TPAMI, 2007. 2
[26] P. Viola and M. Jones. Robust real-time face detection. IJCV, 57(2):137–154, 2004. 1
[27] P. Viola, M. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. IJCV, 2005. 1
[28] S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection. In CVPR, 2010. 2, 6
[29] H. Wang, M. Ullah, A. Klaser, I. Laptev, C. Schmid, et al. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009. 1, 2
[30] C. Wojek, S. Walk, S. Roth, K. Schindler, and B. Schiele. Monocular visual scene understanding: Understanding multi-object traffic scenes. IEEE TPAMI, 2012. 2
[31] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392. IEEE, 2011. 2, 6, 7, 8
[32] L. Zelnik-Manor and M. Irani. Event-based analysis of video. In CVPR, volume 2, pages II–123. IEEE, 2001. 2