191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

Source: pdf

Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille

Abstract: Language and image understanding are two major goals of artificial intelligence, and both can be conceptually formulated as parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomial-time parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers, and the form of the hierarchical representation is also unclear. Attempts to adapt representations and algorithms from natural language have been only partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing that outputs both image segmentation and object recognition. The HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation that is capable of capturing long-range dependencies and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which parses the image in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that the HIM outperforms other state-of-the-art methods on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena.

reference text

[1] F. Jelinek and J. D. Lafferty, “Computation of the probability of initial substring generation by stochastic context-free grammars,” Computational Linguistics, vol. 17, no. 3, pp. 315–323, 1991.

[2] M. Collins, “Head-driven statistical models for natural language parsing,” Ph.D. Thesis, University of Pennsylvania, 1999.

[3] K. Lari and S. J. Young, “The estimation of stochastic context-free grammars using the inside-outside algorithm,” Computer Speech and Language, 1990.

[4] M. Shilman, P. Liang, and P. A. Viola, “Learning non-generative grammatical models for document analysis,” in Proceedings of IEEE International Conference on Computer Vision, 2005, pp. 962–969.

[5] Z. Tu and S. C. Zhu, “Image segmentation by data-driven Markov chain Monte Carlo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657–673, 2002.

[6] Z. Tu, X. Chen, A. L. Yuille, and S. C. Zhu, “Image parsing: Unifying segmentation, detection, and recognition,” in Proceedings of IEEE International Conference on Computer Vision, 2003, pp. 18–25.

[7] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Proceedings of International Conference on Machine Learning, 2001, pp. 282–289.

[8] J. Shotton, J. M. Winn, C. Rother, and A. Criminisi, “TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in Proceedings of European Conference on Computer Vision, 2006, pp. 1–15.

[9] M. Collins, “Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms,” in Proceedings of Annual Meeting on Association for Computational Linguistics conference on Empirical methods in natural language processing, 2002, pp. 1–8.

[10] X. He, R. S. Zemel, and M. A. Carreira-Perpiñán, “Multiscale conditional random fields for image labeling,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, pp. 695–702.

[11] S. Kumar and M. Hebert, “A hierarchical field framework for unified context-based classification,” in Proceedings of IEEE International Conference on Computer Vision, 2005, pp. 1284–1291.

[12] E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing multiclass to binary: A unifying approach for margin classifiers,” Journal of Machine Learning Research, vol. 1, pp. 113–141, 2000.

[13] Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images,” in Proceedings of IEEE International Conference on Computer Vision, 2001, pp. 105–112.

[14] A. Oliva and A. Torralba, “Building the gist of a scene: the role of global image features in recognition,” Progress in Brain Research, vol. 155, pp. 23–36, 2006.

[15] A. Levin and Y. Weiss, “Learning to combine bottom-up and top-down segmentation,” in Proceedings of European Conference on Computer Vision, 2006, pp. 581–594.

[16] E. B. Sudderth, A. B. Torralba, W. T. Freeman, and A. S. Willsky, “Learning hierarchical models of scenes, objects, and parts,” in Proceedings of IEEE International Conference on Computer Vision, 2005, pp. 1331–1338.

[17] Y. Chen, L. Zhu, C. Lin, A. L. Yuille, and H. Zhang, “Rapid inference on a novel and/or graph for object detection, segmentation and parsing,” in Advances in Neural Information Processing Systems, 2007.

[18] B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning, “Max-margin parsing,” in Proceedings of Annual Meeting on Association for Computational Linguistics conference on Empirical methods in natural language processing, 2004.

[19] L. Zhu, Y. Chen, X. Ye, and A. L. Yuille, “Structure-perceptron learning of a hierarchical log-linear model,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.

[20] J. Verbeek and B. Triggs, “Region classification with Markov field aspect models,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007.

[21] Z. Tu, “Auto-context and its application to high-level vision tasks,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.

[22] J. Verbeek and B. Triggs, “Scene segmentation with CRFs learned from partially labeled images,” in Advances in Neural Information Processing Systems, vol. 20, 2008.