
56 nips-2007-Configuration Estimates Improve Pedestrian Finding


Author: Duan Tran, David A. Forsyth

Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.
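
As a rough illustration of the "local histogram of oriented gradient" feature mentioned in the abstract, the sketch below computes a magnitude-weighted histogram of unsigned gradient orientations over a single grayscale patch. It is a generic, simplified version and not the authors' feature: the function name `orientation_histogram`, the 9-bin default, the central-difference gradients, and the L2 normalization are all assumptions made for illustration. The abstract does not specify these details, and this sketch ignores the pose estimate that the paper uses to place its features.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `patch` is a list of equal-length rows of grayscale intensities.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    # Central differences on interior pixels only (borders are skipped).
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the angle into [0, pi) so opposite directions share a bin.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    # L2-normalize so the descriptor is insensitive to overall contrast.
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist
```

In a pipeline like the one the abstract describes, descriptors of this kind would be computed at locations chosen from the estimated configuration and concatenated before being passed to the SVM; how the paper does this exactly is not stated here.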

reference text

[1] D.M. Gavrila. Sensor-based pedestrian protection. Intelligent Transportation Systems, pages 77–81, 2001.

[2] C. Papageorgiou and T. Poggio. A trainable system for object detection. Int. J. Computer Vision, 38(1):15–33, June 2000.

[3] C.P. Papageorgiou and T. Poggio. A pattern classification approach to dynamical object detection. In Int. Conf. on Computer Vision, pages 1223–1228, 1999.

[4] L. Zhao and C.E. Thorpe. Stereo- and neural network-based pedestrian detection. Intelligent Transportation Systems, 1(3):148–154, September 2000.

[5] D. Gavrila. Pedestrian detection from a moving vehicle. In European Conference on Computer Vision, pages II: 37–49, 2000.

[6] Y. Wu, T. Yu, and G. Hua. A statistical field model for pedestrian detection. In IEEE Conf. on Computer Vision and Pattern Recognition, pages I: 1023–1030, 2005.

[7] P. Viola, M.J. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. Int. J. Computer Vision, 63(2):153–161, July 2005.

[8] M. Dimitrijevic, V. Lepetit, and P. Fua. Human body pose recognition using spatio-temporal templates. In ICCV workshop on Modeling People and Human Interaction, 2005.

[9] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conf. on Computer Vision and Pattern Recognition, pages I: 886–893, 2005.

[10] D.G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Computer Vision, 60(2):91–110, November 2004.

[11] A. Mohan, C.P. Papageorgiou, and T. Poggio. Example-based object detection in images by components. IEEE T. Pattern Analysis and Machine Intelligence, 23(4):349–361, April 2001.

[12] Y. Ke and R. Sukthankar. PCA-SIFT: a more distinctive representation for local image descriptors. In IEEE Conf. on Computer Vision and Pattern Recognition, pages II: 506–513, 2004.

[13] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE T. Pattern Analysis and Machine Intelligence, 2004. accepted.

[14] Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE T. Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002.

[15] P. Sabzmeydani and G. Mori. Detecting pedestrians by learning shapelet features. In CVPR, 2007.

[16] D.A. Forsyth, O. Arikan, L. Ikemoto, J. O’Brien, and D. Ramanan. Computational studies in human motion 1: Tracking and animation. Foundations and Trends in Computer Vision, 2006. In press.

[17] P.F. Felzenszwalb and D.P. Huttenlocher. Pictorial structures for object recognition. Int. J. Computer Vision, 61(1):55–79, January 2005.

[18] M. P. Kumar, P. H. S. Torr, and A. Zisserman. Extending pictorial structures for object recognition. In Proceedings of the British Machine Vision Conference, 2004.

[19] Deva Ramanan, D.A. Forsyth, and A. Zisserman. Strike a pose: Tracking people by finding stylized poses. In IEEE Conf. on Computer Vision and Pattern Recognition, 2005.

[20] D. Ramanan and D.A. Forsyth. Using temporal coherence to build models of animals. In Proc. ICCV, 2003.

[21] D. Ramanan. Learning to parse images of articulated objects. In Proc. NIPS, 2006.

[22] R. Ronfard, C. Schmid, and B. Triggs. Learning to parse pictures of people. In European Conference on Computer Vision, page IV: 700 ff., 2002.

[23] K. Mikolajczyk, C. Schmid, and A. Zisserman. Human detection based on a probabilistic assembly of robust part detectors. In European Conference on Computer Vision, pages Vol I: 69–82, 2004.

[24] A. Micilotta, E. Ong, and R. Bowden. Detection and tracking of humans by probabilistic body part assembly. In British Machine Vision Conference, volume 1, pages 429–438, 2005.

[25] B. Leibe, E. Seemann, and B. Schiele. Pedestrian detection in crowded scenes. In IEEE Conf. on Computer Vision and Pattern Recognition, pages I: 878–885, 2005.

[26] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient matching of pictorial structures. In IEEE Conf. on Computer Vision and Pattern Recognition, 2000.

[27] B. Taskar. Learning Structured Prediction Models: A Large Margin Approach. PhD thesis, Stanford University, 2004.

[28] B. Taskar, S. Lacoste-Julien, and M. Jordan. Structured prediction via the extragradient method. In Neural Information Processing Systems Conference, 2005.

[29] N. Ratliff, J. A. Bagnell, and M. Zinkevich. Subgradient methods for maximum margin structured learning. In ICML 2006 Workshop on Learning in Structured Output Spaces, 2006.

[30] N.Z. Shor. Minimization Methods for Non-Differentiable Functions and Applications. 1985.

[31] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001.