220 iccv-2013-Joint Deep Learning for Pedestrian Detection

Source: pdf

Author: Wanli Ouyang, Xiaogang Wang

Abstract: Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well explored. This paper proposes that they should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture. By establishing automatic, mutual interaction among components, the deep model achieves a 9% reduction in the average miss rate compared with the current best-performing pedestrian detection approaches on the largest Caltech benchmark dataset.

reference text

[1] A. Bar-Hillel, D. Levi, E. Krupka, and C. Goldberg. Part-based feature synthesis for human detection. In ECCV, 2010. 2, 5

[2] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object instances using Hough transforms. In CVPR, 2010. 1, 2

[3] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009. 2

[4] L. Bourdev and J. Malik. Poselets: body part detectors trained using 3D human pose annotations. In ICCV, 2009. 2

[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 1, 2, 5, 7

[6] C. Desai and D. Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV, 2012. 2

[7] M. Dikmen, D. Hoiem, and T. S. Huang. A data-driven method for feature transformation. In CVPR, 2012. 2

[8] Y. Ding and J. Xiao. Contextual boost for pedestrian detection. In CVPR, 2012. 1, 6

[9] P. Dollár, R. Appel, and W. Kienzle. Crosstalk cascades for frame-rate pedestrian detection. In ECCV, 2012. 1, 2, 5

[10] P. Dollár, S. Belongie, and P. Perona. The fastest pedestrian detector in the west. In BMVC, 2010. 5

[11] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC, 2009. 1, 2, 5

[12] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: an evaluation of the state of the art. IEEE Trans. PAMI, 34(4):743–761, 2012. 1, 5

[13] M. Enzweiler, A. Eigenstetter, B. Schiele, and D. M. Gavrila. Multi-cue pedestrian classification with partial occlusion handling. In CVPR, 2010. 1, 2

[14] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of deep networks. Technical report, University of Montreal, 2009. 3, 4

[15] A. Ess, B. Leibe, and L. V. Gool. Depth and appearance for mobile scene analysis. In ICCV, 2007. 2, 5

[16] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. IEEE Trans. PAMI, 30:1915–1929, 2013. 2

[17] P. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Trans. PAMI, 32:1627–1645, 2010. 1, 2, 4, 5, 6, 7

[18] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. IJCV, 61:55–79, 2005. 2

[19] T. Gao, B. Packer, and D. Koller. A segmentation-aware object detection model with occlusion handling. In CVPR, 2011. 1

[20] G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006. 2

[21] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006. 2

[22] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. In CVPR, 2006. 2

[23] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In CVPR, 2009. 2

[24] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012. 2, 3, 5, 7

[25] Q. V. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In ICML, 2012. 2, 3

[26] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. 2, 3

[27] B. Leibe, E. Seemann, and B. Schiele. Pedestrian detection in crowded scenes. In CVPR, 2005. 2

[28] Z. Lin and L. Davis. A pose-invariant descriptor for human detection and segmentation. In ECCV, 2008. 5

[29] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004. 1

[30] P. Luo, X. Wang, and X. Tang. Hierarchical face parsing via deep learning. In CVPR, 2012. 2

[31] S. Maji, A. C. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In CVPR, 2008. 1, 2, 5

[32] K. Mikolajczyk, B. Leibe, and B. Schiele. Multiple object class detection with a generative model. In CVPR, 2006. 2

[33] M. Norouzi, M. Ranjbar, and G. Mori. Stacks of convolutional restricted boltzmann machines for shift-invariant feature learning. In CVPR, 2009. 2

[34] W. Ouyang and X. Wang. A discriminative deep model for pedestrian detection with occlusion handling. In CVPR, 2012. 1, 2, 5, 6, 7

[35] W. Ouyang and X. Wang. Single-pedestrian detection aided by multi-pedestrian detection. In CVPR, 2013. 2

[36] W. Ouyang, X. Zeng, and X. Wang. Modeling mutual visibility relationship in pedestrian detection. In CVPR, 2013. 1

[37] D. Park, D. Ramanan, and C. Fowlkes. Multiresolution models for object detection. In ECCV, 2010. 2, 5, 6

[38] H. Poon and P. Domingos. Sum-product networks: A new deep architecture. In UAI, 2011. 2

[39] D. Ramanan. Learning to parse images of articulated bodies. In NIPS, 2007. 4

[40] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In CVPR, 2007. 2

[41] M. Ranzato, F.-J. Huang, Y.-L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In CVPR, 2007. 7

[42] P. Sabzmeydani and G. Mori. Detecting pedestrians by learning shapelet features. In CVPR, 2007. 5

[43] W. Schwartz, A. Kembhavi, D. Harwood, and L. Davis. Human detection using partial least squares analysis. In ICCV, 2009. 2, 5

[44] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun. Pedestrian detection with unsupervised and multi-stage feature learning. In CVPR, 2013. 2, 5, 6, 7

[45] V. D. Shet, J. Neumann, V. Ramesh, and L. S. Davis. Bilattice-based logical reasoning for human detection. In CVPR, 2007. 2

[46] Y. Sun, X. Wang, and X. Tang. Hybrid deep learning for computing face similarities. In ICCV, 2013. 2

[47] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on riemannian manifolds. IEEE Trans. PAMI, 30(10):1713–1727, Oct. 2008. 1, 2

[48] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009. 2

[49] P. Viola, M. J. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. IJCV, 63(2):153–161, 2005. 1, 2, 5

[50] S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection. In CVPR, 2010. 2, 5

[51] X. Wang, X. Han, and S. Yan. An hog-lbp human detector with partial occlusion handling. In CVPR, 2009. 1, 2, 5

[52] C. Wojek and B. Schiele. A performance evaluation of single and multi-feature people detection. In DAGM, 2008. 5

[53] B. Wu and R. Nevatia. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In ICCV, 2005. 2

[54] T. Wu and S. Zhu. A numeric study of the bottom-up and top-down inference processes in and-or graphs. IJCV, 93(2):226–252, Jun. 2011. 2

[55] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011. 2

[56] M. D. Zeiler, G. W. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. In ICCV, 2011. 2

[57] X. Zeng, W. Ouyang, and X. Wang. Multi-stage contextual deep learning for pedestrian detection. In ICCV, 2013. 2

[58] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010. 1, 2