Dynamical And-Or Graph Learning for Object Shape Modeling and Detection (NIPS 2012, paper 106)

Source: pdf

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model that represents and recognizes object shapes with an "And-Or graph". The model consists of three layers: the leaf-nodes, with collaborative edges, for localizing local parts; the or-nodes, which specify the switch of leaf-nodes; and the root-node, which encodes the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is determined automatically while the multi-layer parameters are optimized during the iterations. The advantages of our method are two-fold. (i) The And-Or graph model handles large intra-class variance and background clutter in object shape detection from images. (ii) The proposed learning algorithm obtains the And-Or graph representation without requiring elaborate supervision or initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), where it outperforms state-of-the-art approaches.
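As background for the CCCP [23] mentioned in the abstract, the following is a minimal sketch of the generic concave-convex procedure on a toy one-dimensional objective. It is not the paper's learning algorithm: the objective f(x) = x^4 - 2x^2, the function names, and the iteration count are all assumptions chosen for illustration. The objective is split into a convex part u(x) = x^4 and a concave part -v(x) with v(x) = 2x^2; each step replaces v by its tangent at the current point and minimizes the resulting convex surrogate, which here has the closed-form solution x = cbrt(x_t).

```python
import math

def objective(x):
    # Toy objective (an assumption for illustration): f(x) = u(x) - v(x),
    # with convex u(x) = x**4 and convex v(x) = 2 * x**2.
    return x ** 4 - 2.0 * x ** 2

def cccp_step(x_t):
    # Linearize v at x_t: v(x) ~ v(x_t) + v'(x_t) * (x - x_t), v'(x_t) = 4 * x_t.
    # Minimizing the convex surrogate x**4 - 4 * x_t * x gives 4 * x**3 = 4 * x_t,
    # i.e. x = cbrt(x_t). copysign keeps the sign for negative inputs.
    return math.copysign(abs(x_t) ** (1.0 / 3.0), x_t)

def cccp_toy(x0, iters=50):
    # Run the procedure and record the objective value after every step.
    x = x0
    history = [objective(x)]
    for _ in range(iters):
        x = cccp_step(x)
        history.append(objective(x))
    return x, history

if __name__ == "__main__":
    x_star, history = cccp_toy(0.1)
    print(x_star, history[0], history[-1])
```

Each step can only lower (or keep) the objective, so the recorded values never increase. Starting from a positive point the iterates approach the local minimum at x = 1, and from a negative point they approach x = -1, which illustrates that the procedure finds a local rather than a global solution in general.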

reference text

[1] Y. Altun, I. Tsochantaridis, and T. Hofmann, Hidden Markov support vector machines, In ICML, 2003. 7

[2] M. Andriluka, S. Roth, and B. Schiele, Pictorial structures revisited: People detection and articulated pose estimation, In CVPR, 2009. 7, 8

[3] S. Belongie, J. Malik, and J. Puzicha, Shape Matching and Object Recognition using Shape Contexts, IEEE TPAMI, 24(4): 509-522, 2002. 3

[4] L. Bourdev, S. Maji, T. Brox, and J. Malik, Detecting people using mutually consistent poselet activations, In ECCV, 2010. 7, 8

[5] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-based Models, IEEE TPAMI, 2010. 1, 2, 7, 8

[6] V. Ferrari, F. Jurie, and C. Schmid, From Images to Shape Models for Object Detection, Int’l J. of Computer Vision, 2009. 2, 8

[7] V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid, Groups of Adjacent Contour Segments for Object Detection, IEEE TPAMI, 30(1): 36-51, 2008. 7, 8

[8] N. Kambhatla and T. K. Leen, Dimension Reduction by Local Principal Component Analysis, Neural Computation, 9: 1493-1516, 1997. 5

[9] C. Lu, L. J. Latecki, N. Adluru, X. Yang, and H. Ling, Shape Guided Contour Grouping with Particle Filters, In ICCV, 2009. 2, 8

[10] T. Ma and L. J. Latecki, From Partial Shape Matching through Local Deformation to Robust Global Shape Similarity for Object Detection, In CVPR, 2011. 2, 8

[11] S. Maji and J. Malik, Object Detection using a Max-Margin Hough Transform, In CVPR, 2009. 2, 8

[12] D. R. Martin, C. C. Fowlkes, and J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE TPAMI, 26(5): 530-549, 2004. 7

[13] J. C. Platt, Using analytic QP and sparseness to speed training of support vector machines, In Advances in Neural Information Processing Systems, pages 557-563, 1998. 7

[14] P. Schnitzspan, M. Fritz, S. Roth, and B. Schiele, Discriminative structure learning of hierarchical representations for object detection, In CVPR, 2009. 2

[15] Z. Song, Q. Chen, Z. Huang, Y. Hua, and S. Yan, Contextualizing Object Detection and Classification, In CVPR, 2010. 2

[16] P. Srinivasan, Q. Zhu, and J. Shi, Many-to-one Contour Matching for Describing and Discriminating Object Shape, In CVPR, 2010. 2, 8

[17] D. Tran and D. Forsyth, Improved human parsing with a full relational model, In ECCV, 2010. 7

[18] Y. Wang, D. Tran, and Z. Liao, Learning Hierarchical Poselets for Human Parsing, In CVPR, 2011. 2, 7, 8

[19] X. Yang and L. J. Latecki, Weakly Supervised Shape Based Object Detection with Particle Filter, In ECCV, 2010. 2

[20] B. Yao, A. Khosla, and L. Fei-Fei, Classifying Actions and Measuring Action Similarity by Modeling the Mutual Context of Objects and Human Poses, In ICML, 2011. 2

[21] P. Yarlagadda, A. Monroy and B. Ommer, Voting by Grouping Dependent Parts, In ECCV, 2010. 8

[22] C.-N. J. Yu and T. Joachims, Learning structural SVMs with latent variables, In ICML, 2009. 2, 4, 5

[23] A. Yuille and A. Rangarajan, The concave-convex procedure (CCCP), In NIPS, pages 1033-1040, 2001. 1, 2, 5

[24] Y.B. Zhao and S.C. Zhu, Image Parsing via Stochastic Scene Grammar, In NIPS, 2011. 2

[25] L. Zhu, Y. Chen, A. Yuille, and W. Freeman, Latent Hierarchical Structural Learning for Object Detection, In CVPR, 2010. 1, 2, 5

[26] L. Zhu, Y. Chen, Y. Lu, C. Lin, and A. Yuille, Max Margin AND/OR Graph Learning for Parsing the Human Body, In CVPR, 2008. 1, 2

[27] S.C. Zhu and D. Mumford, A stochastic grammar of images, Foundations and Trends in Computer Graphics and Vision, 2(4): 259-362, 2006. 1, 2