
Synergistic Face Detection and Pose Estimation with Energy-Based Models (NIPS 2004, paper 182)



Authors: Margarita Osadchy, Matthew L. Miller, Yann LeCun

Abstract: We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets – one for frontal pose, one for rotated faces, and one for profiles – and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system’s accuracy on both face detection and pose estimation is improved by training for the two tasks together.
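The inference step summarized in the abstract can be illustrated with a minimal sketch. It assumes a single pose variable (yaw) and a hypothetical three-dimensional output space in which the face manifold is traced by three phase-shifted cosines. The parametrization, the grid search, the threshold value, and all function names below are illustrative assumptions, not details stated in the text above. The convolutional network itself is not implemented: `output` stands in for the vector the network would produce for an image. The energy is taken to be the distance from that vector to the manifold point for a candidate pose, the pose estimate is the minimizer of the energy, and an image is labeled a face when the minimum energy falls below the threshold.

```python
import math

# Assumed phase offsets defining a hypothetical 1-D pose manifold in 3-D space.
ALPHAS = (-math.pi / 3, 0.0, math.pi / 3)


def manifold_point(theta):
    """Map a yaw angle (radians) to its point on the assumed face manifold."""
    return [math.cos(theta - alpha) for alpha in ALPHAS]


def energy(output, theta):
    """Euclidean distance between a network output and the manifold point for theta."""
    point = manifold_point(theta)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(output, point)))


def detect_and_estimate(output, threshold, num_steps=181):
    """Grid-search yaw in [-pi/2, pi/2] for the lowest energy, then threshold it.

    Returns (is_face, estimated_yaw, minimum_energy).
    """
    candidates = [
        -math.pi / 2 + math.pi * i / (num_steps - 1) for i in range(num_steps)
    ]
    best_theta = min(candidates, key=lambda t: energy(output, t))
    best_energy = energy(output, best_theta)
    return best_energy < threshold, best_theta, best_energy


# Stand-in outputs: one lying on the manifold, one far from it.
face_like = manifold_point(0.5)
non_face_like = [5.0, -5.0, 5.0]
print(detect_and_estimate(face_like, threshold=0.5))
print(detect_and_estimate(non_face_like, threshold=0.5))
```

In this sketch a single energy minimization yields both the detection decision and the pose estimate, which is one way to read the abstract's claim that the two tasks are handled simultaneously.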


References

[1] L. Bottou and Y. LeCun. The Lush Manual. http://lush.sf.net, 2002.

[2] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.

[3] C. Garcia and M. Delakis. A neural architecture for fast and robust face detection. IEEE-IAPR Int. Conference on Pattern Recognition, pages 40–43, 2002.

[4] F. J. Huang and Y. LeCun. Loss functions for discriminative training of energy-based graphical models. Technical report, Courant Institute of Mathematical Sciences, NYU, June 2004.

[5] M. Jones and P. Viola. Fast multi-view face detection. Technical Report TR2003-96, Mitsubishi Electric Research Laboratories, 2003.

[6] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.

[7] S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum. Statistical learning of multi-view face detection. In Proceedings of the 7th European Conference on Computer Vision-Part IV, 2002.

[8] Y. Li, S. Gong, and H. Liddell. Support vector regression and classification based multi-view face detection and recognition. In Face and Gesture, 2000.

[9] H. Moon and M. L. Miller. Estimating facial pose from sparse representation. In International Conference on Image Processing, Singapore, 2004.

[10] A. Pentland, B. Moghaddam, and T. Starner. View-based and modular eigenspaces for face recognition. In CVPR, 1994.

[11] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. PAMI, 20:22–38, 1998.

[12] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, 1998.

[13] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In Computer Vision and Pattern Recognition, 2000.

[14] K. Sung and T. Poggio. Example-based learning of view-based human face detection. PAMI, 20:39–51, 1998.

[15] R. Vaillant, C. Monrocq, and Y. LeCun. Original approach for the localisation of objects in images. IEE Proc on Vision, Image, and Signal Processing, 141(4):245–250, August 1994.

[16] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, pages 511–518, 2001.