nips nips2011 nips2011-193 nips2011-193-reference knowledge-graph by maker-knowledge-mining

193 nips-2011-Object Detection with Grammar Models


Source: pdf

Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester

Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1


reference text

[1] M. Blaschko and C. Lampert. Learning to localize objects with structured output regression. In ECCV, 2008.

[2] M. Blaschko, A. Vedaldi, and A. Zisserman. Simultaneous object detection and ranking with weak supervision. In NIPS, 2010.

[3] L. Bourdev, S. Maji, T. Brox, and J. Malik. Detecting people using mutually consistent poselet activations. In ECCV, 2010.

[4] R. Collobert, F. Sinz, J. Weston, and L. Bottou. Trading convexity for scalability. In ICML, 2006.

[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[6] C. Do, Q. Le, C. Teo, O. Chapelle, and A. Smola. Tighter bounds for structured estimation. In NIPS, 2008.

[7] M. Enzweiler, A. Eigenstetter, B. Schiele, and D. M. Gavrila. Multi-cue pedestrian classification with partial occlusion handling. In CVPR, 2010.

[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascalnetwork.org/challenges/VOC/voc2007/workshop/index.html.

[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. http://www.pascalnetwork.org/challenges/VOC/voc2010/workshop/index.html.

[10] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2009.

[11] P. Felzenszwalb and D. McAllester. Object detection grammars. Univerity of Chicago, CS Dept., Tech. Rep. 2010-02.

[12] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://people.cs.uchicago.edu/˜pff/latent-release4/.

[13] Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model. In CVPR, 2006.

[14] D. McAllester and J. Keshet. Generalization bounds and consistency for latent structural probit and ramp loss. In NIPS, 2011.

[15] Y. Ohta, T. Kanade, and T. Sakai. An analysis system for scenes containing objects with substructures. In ICPR, 1978.

[16] B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. In NIPS, 2003.

[17] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 2006.

[18] X. Wang, T. Han, and S. Yan. An hog-lbp human detector with partial occlusion handling. In ICCV, 2009.

[19] C.-N. J. Yu and T. Joachims. Learning structural svms with latent variables. In ICML, 2009.

[20] A. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 2003.

[21] L. Zhu, Y. Chen, A. Torralba, W. Freeman, and A. Yuille. Part and appearance sharing: Recursive compositional models for multi-view multi-object detection. In CVPR, 2010.

[22] L. Zhu, Y. Chen, and A. Yuille. Unsupervised learning of probabilistic grammar-markov models for object categories. PAMI, 2009.

[23] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010.

[24] S. Zhu and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2006. 9