
133 nips-2009-Learning models of object structure

Source: pdf

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies are linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.
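The abstract's within-topology moves (alternating Metropolis-Hastings with stochastic dynamics) can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D Gaussian target stands in for the posterior over one instance's parameters, and the function names, step sizes, and leapfrog count are hypothetical. "Stochastic dynamics" is rendered here as Hamiltonian (hybrid) Monte Carlo in the sense of Neal's report [18], which is an assumption; the trans-dimensional topology moves [7, 8] are not shown.

```python
import math
import random

# Hypothetical toy target (assumption): an unnormalized log-density over a
# 2-D parameter vector, standing in for one instance's parameter posterior.
# Independent Gaussians with standard deviations 1.0 and 0.5.
def log_target(x):
    return -0.5 * (x[0] ** 2 + (x[1] / 0.5) ** 2)

def grad_log_target(x):
    return [-x[0], -x[1] / 0.25]

def accept(rng, log_alpha):
    # Metropolis acceptance test; min(0, .) avoids overflow in exp.
    return rng.random() < math.exp(min(0.0, log_alpha))

def mh_step(x, rng, step=0.5):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    proposal = [xi + rng.gauss(0.0, step) for xi in x]
    if accept(rng, log_target(proposal) - log_target(x)):
        return proposal, True
    return x, False

def hmc_step(x, rng, eps=0.1, n_leapfrog=20):
    """Stochastic (Hamiltonian) dynamics: fresh momentum, leapfrog, MH correction."""
    p = [rng.gauss(0.0, 1.0) for _ in x]
    current_h = -log_target(x) + 0.5 * sum(pi * pi for pi in p)
    q = list(x)
    g = grad_log_target(q)
    p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]  # initial half step
    for i in range(n_leapfrog):
        q = [qi + eps * pi for qi, pi in zip(q, p)]    # full position step
        g = grad_log_target(q)
        if i != n_leapfrog - 1:
            p = [pi + eps * gi for pi, gi in zip(p, g)]  # full momentum step
    p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]  # final half step
    proposed_h = -log_target(q) + 0.5 * sum(pi * pi for pi in p)
    if accept(rng, current_h - proposed_h):
        return q, True
    return x, False

def alternating_sampler(n_iter, seed=0):
    """Alternate one MH move and one dynamics move per iteration."""
    rng = random.Random(seed)
    x = [2.0, 2.0]  # deliberately poor starting point
    samples = []
    for _ in range(n_iter):
        x, _ = mh_step(x, rng)
        x, _ = hmc_step(x, rng)
        samples.append(x)
    return samples

if __name__ == "__main__":
    draws = alternating_sampler(5000)[500:]  # discard burn-in
    n = len(draws)
    mean0 = sum(d[0] for d in draws) / n
    var1 = sum(d[1] ** 2 for d in draws) / n
    print(f"mean of x0 ~ {mean0:.3f} (target 0), E[x1^2] ~ {var1:.3f} (target 0.25)")
```

The design choice mirrors the motivation suggested by the abstract: cheap random-walk moves make small local adjustments, while gradient-guided dynamics moves travel farther along the density in a single accepted step.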

reference text

[1] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5–43, 2003.

[2] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115–147, April 1987.

[3] M. B. Clowes. On seeing things. Artificial Intelligence, 2(1):79–116, 1971.

[4] D. Crandall and D. Huttenlocher. Weakly-supervised learning of part-based spatial models for visual object recognition. In 9th European Conference on Computer Vision, 2006.

[5] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Workshop on Generative-Model Based Vision, 2004.

[6] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2003.

[7] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.

[8] P. J. Green. Trans-dimensional Markov chain Monte Carlo. In Highly Structured Stochastic Systems, 2003.

[9] D. Hoiem, C. Rother, and J. Winn. 3D LayoutCRF for multi-view object class recognition and segmentation. In CVPR, 2007.

[10] D. Huttenlocher and S. Ullman. Recognizing solid objects by alignment with an image. IJCV, 5(2):195–212, 1990.

[11] C. Kemp and J. B. Tenenbaum. The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31):10687–10692, 2008.

[12] A. Kushal, C. Schmid, and J. Ponce. Flexible object models for category-level 3D object recognition. In CVPR, 2007.

[13] M. Leordeanu, M. Hebert, and R. Sukthankar. Beyond local appearance: Category recognition from pairwise interactions of simple features. In CVPR, 2007.

[14] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2001.

[15] D. G. Lowe. Fitting parameterized three-dimensional models to images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(5):441–450, 1991.

[16] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[17] G. Mori and J. Malik. Recovering 3D human body configurations using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.

[18] R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993.

[19] S. Savarese and L. Fei-Fei. 3D generic object categorization, localization and pose estimation. In IEEE International Conference on Computer Vision (ICCV), 2007.

[20] S. Savarese and L. Fei-Fei. View synthesis for recognizing unseen poses of object classes. In European Conference on Computer Vision (ECCV), 2008.

[21] J. Schlecht and K. Barnard. Learning models of object structure. Technical report, University of Arizona, 2009.

[22] C. Sminchisescu. Kinematic jump processes for monocular 3D human tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.

[23] C. Sminchisescu and B. Triggs. Estimating articulated human motion with covariance scaled sampling. International Journal of Robotics Research, 22(6):371–393, 2003.

[24] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Learning hierarchical models of scenes, objects, and parts. In ICCV, 2005.

[25] K. Sugihara. A necessary and sufficient condition for a picture to represent a polyhedral scene. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(5):578–586, September 1984.

[26] J. B. Tenenbaum, T. L. Griffiths, and C. Kemp. Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7):309–318, 2006.

[27] Z. Tu and S.-C. Zhu. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):657–673, 2002.

[28] P. H. Winston. Learning structural descriptions from examples. In P. H. Winston, editor, The psychology of computer vision, pages 157–209. McGraw-Hill, 1975.

[29] L. Zhu, Y. Chen, and A. Yuille. Unsupervised learning of a probabilistic grammar for object detection and parsing. In NIPS, 2006.

[30] S. Zhu and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4):259–362, 2006.