
Pixel-Level Hand Detection in Ego-centric Videos (CVPR 2013, paper 332)


Source: pdf

Author: Cheng Li, Kris M. Kitani

Abstract: We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illumination, significant camera motion, and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, with hand images captured under various illumination conditions. Using both our dataset and a publicly available ego-centric indoor dataset, we provide an extensive analysis of detection performance across a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.
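To make the pixel-level detection setting concrete, the sketch below trains a per-pixel hand classifier from labeled frames and produces a hand-probability map for a new frame. It is a minimal sketch only, not the authors' implementation: it uses simple per-pixel RGB/CIELAB color features with a random forest (in the spirit of the statistical color models of [8] and the random forests of [2]), whereas the paper's full model draws on a wider range of local appearance features and accounts for global illumination. All function names and parameter values here are illustrative assumptions.

```python
# Minimal per-pixel hand detector sketch (illustrative, not the paper's method).
# Assumptions: frames are HxWx3 uint8 RGB arrays, masks are HxW arrays with 1 = hand pixel.
import numpy as np
from skimage.color import rgb2lab
from sklearn.ensemble import RandomForestClassifier


def pixel_features(rgb_image):
    """Stack per-pixel RGB and CIELAB values into an (H*W, 6) feature matrix."""
    lab = rgb2lab(rgb_image)
    feats = np.concatenate([rgb_image.astype(np.float64), lab], axis=2)
    return feats.reshape(-1, feats.shape[2])


def train_hand_detector(images, masks, n_trees=10, max_pixels=200_000, seed=0):
    """Fit a random forest on labeled pixels; subsample so training stays tractable."""
    X = np.vstack([pixel_features(im) for im in images])
    y = np.concatenate([m.reshape(-1) for m in masks])
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(max_pixels, len(y)), replace=False)
    clf = RandomForestClassifier(n_estimators=n_trees, max_depth=10, n_jobs=-1)
    clf.fit(X[idx], y[idx])
    return clf


def hand_probability_map(clf, rgb_image):
    """Return a per-pixel hand probability map; column 1 is the 'hand' class for {0, 1} labels."""
    proba = clf.predict_proba(pixel_features(rgb_image))[:, 1]
    return proba.reshape(rgb_image.shape[:2])
```

A single classifier of this kind is essentially a baseline. Given the abstract's emphasis on modeling global illumination, one plausible extension is to train several such classifiers under different illumination conditions and select or blend them per frame based on the frame's global appearance.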


reference text

[1] A. Argyros and M. Lourakis. Real-time tracking of multiple skin-colored objects with a possibly moving camera. ECCV, 2004. 2

[2] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001. 5

[3] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In ECCV, 2010. 3

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005. 3

[5] A. Fathi, Y. Li, and J. Rehg. Learning to recognize daily actions using gaze. In ECCV, 2012. 1

[6] A. Fathi, X. Ren, and J. Rehg. Learning to recognize objects in egocentric activities. In CVPR, 2011. 2, 4, 6, 7, 8

[7] E. Hayman and J.-O. Eklundh. Statistical background subtraction for a mobile observer. In ICCV, 2003. 2, 6, 8

[8] M. Jones and J. Rehg. Statistical color models with application to skin detection. In CVPR, 1999. 2, 5, 6, 8

[9] P. Kakumanu, S. Makrogiannis, and N. Bourbakis. A survey of skin-color modeling and detection methods. Pattern Recognition, 40(3):1106–1122, 2007. 2

[10] M. Kölsch and M. Turk. Robust hand detection. In FG, 2004. 2

[11] M. Kölsch and M. Turk. Hand tracking with flocks of features. In CVPR, 2005. 2

[12] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004. 3

[13] B. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, 1981. 6

[14] M. Maire, P. Arbeláez, C. Fowlkes, and J. Malik. Using contours to detect and localize junctions in natural images. In CVPR, 2008. 6

[15] I. Oikonomidis, N. Kyriazis, and A. Argyros. Markerless and efficient 26-DOF hand pose recovery. ACCV, 2011. 2

[16] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-based probabilistic tracking. ECCV, 2002. 2

[17] H. Pirsiavash and D. Ramanan. Detecting activities of daily living in first-person camera views. In CVPR, 2012. 1

[18] J. Rehg and T. Kanade. Visual tracking of high DOF articulated structures: an application to human hand tracking. ECCV, 1994. 2

[19] X. Ren and J. Malik. Tracking as repeated figure/ground segmentation. In CVPR, 2007. 3

[20] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: an efficient alternative to SIFT or SURF. In ICCV, 2011. 3

[21] P. Sand and S. Teller. Particle video: Long-range motion estimation using point trajectories. IJCV, 80(1):72–91, 2008. 6

[22] Y. Sheikh, O. Javed, and T. Kanade. Background subtraction for freely moving cameras. In ICCV, 2009. 2, 6, 7, 8

[23] L. Sigal, S. Sclaroff, and V. Athitsos. Skin color-based video segmentation under time-varying illumination. PAMI, 26(7):862–877, 2004. 2

[24] B. Stenger, P. Mendonça, and R. Cipolla. Model-based 3D tracking of an articulated hand. In CVPR, 2001. 2

[25] E. Sudderth, M. Mandel, W. Freeman, and A. Willsky. Visual hand tracking using nonparametric belief propagation. In Workshop on Generative Model Based Vision, 2004. 2

[26] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical report, School of Computer Science, Carnegie Mellon University, 1991. 6

[27] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. JMLR, 9:2579–2605, 2008. 3

[28] S. Wang, H. Lu, F. Yang, and M. Yang. Superpixel tracking. In ICCV, 2011. 3