Structured output regression for detection with partial truncation (NIPS 2009)

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set: training includes the truncated examples, and in testing, object instances are detected at multiple scales, alignments, and with significant truncations.
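The abstract names large margin learning with latent variables and slack rescaling but does not state the objective. As a rough illustration only, the sketch below computes the standard slack-rescaled structured hinge loss in the sense of [9] for one training example, with the latent variable (e.g. alignment or aspect) maximised out of each candidate's score. The function names, toy scores and loss values are all hypothetical and are not taken from the paper; the bias term and truncation mask are not modelled here.

```python
def latent_score(scores_per_latent):
    """Score of one candidate output with the latent variable maximised out.

    scores_per_latent holds one (hypothetical) model score per latent setting,
    e.g. per alignment or aspect.
    """
    return max(scores_per_latent)


def slack_rescaled_hinge(scores, losses, true_index):
    """Slack-rescaled structured hinge loss for a single example.

    scores[k] : score of candidate output k
    losses[k] : task loss Delta(y_true, y_k); losses[true_index] should be 0
    Returns max(0, max_{k != true} losses[k] * (1 + scores[k] - scores[true])).
    """
    true_score = scores[true_index]
    worst = 0.0
    for k, (score, delta) in enumerate(zip(scores, losses)):
        if k == true_index:
            continue
        # The margin violation is scaled by the task loss of the wrong output.
        worst = max(worst, delta * (1.0 + score - true_score))
    return worst


# Toy data (assumed values): three candidate boxes, two latent settings each.
per_latent = [[2.0, 1.5], [1.2, 1.8], [-0.5, 0.0]]
scores = [latent_score(c) for c in per_latent]
losses = [0.0, 0.5, 1.0]  # e.g. 1 - overlap with the ground-truth box
loss = slack_rescaled_hinge(scores, losses, true_index=0)
print(loss)
```

With slack rescaling, a candidate that overlaps the ground truth heavily (small task loss) contributes little even when its score is close to the true one, whereas margin rescaling would add the task loss to the required margin instead.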

reference text

[1] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In Proc. ECCV, 2008.

[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. CVPR, 2005.

[3] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results. http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html, 2008.

[4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2009.

[5] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 264–271, June 2003.

[6] K. Hotta. Robust face detection under partial occlusion. In Proceedings of the IEEE International Conference on Image Processing, 2004.

[7] Y. Y. Lin, T. L. Liu, and C. S. Fuh. Fast object detection with occlusions. In Proceedings of the European Conference on Computer Vision, pages 402–413. Springer-Verlag, May 2004.

[8] P. Schnitzspan, M. Fritz, S. Roth, and B. Schiele. Discriminative structure learning of hierarchical representations for object detection. In Proc. CVPR, 2009.

[9] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proc. ICML, 2004.

[10] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proc. ICCV, 2009.

[11] O. Williams, A. Blake, and R. Cipolla. The variational Ising classifier (VIC) algorithm for coherently contaminated data. In Proc. NIPS, 2005.

[12] J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. In Proc. CVPR, 2006.

[13] C.-N. J. Yu and T. Joachims. Learning structural SVMs with latent variables. In Proc. ICML, 2009.