
110 nips-2005-Learning Depth from Single Monocular Images


Source: pdf

Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng

Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both the depths at individual points and the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.
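
The abstract's model, a discriminatively trained MRF over patch depths that combines local image features with pairwise terms relating neighboring depths, can be illustrated with a toy sketch. The following Python code is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single scale, a plain linear predictor from patch features to depth, and one uniform smoothness weight, whereas the paper uses multiscale features and learned variance terms. The names estimate_depthmap, features, theta, and smoothness are introduced here for illustration only.

import numpy as np
from scipy.sparse import identity, lil_matrix
from scipy.sparse.linalg import spsolve

def estimate_depthmap(features, theta, rows, cols, smoothness=1.0):
    """MAP depths for a toy Gaussian MRF on a rows x cols patch grid.

    features   : (rows*cols, k) per-patch image features (assumed given)
    theta      : (k,) learned linear regression weights (assumed given)
    smoothness : weight of the pairwise (d_i - d_j)^2 terms

    Minimizes sum_i (d_i - x_i . theta)^2 + smoothness * sum_(i,j) (d_i - d_j)^2,
    which reduces to the sparse linear system (I + smoothness * L) d = X theta,
    where L is the graph Laplacian of the 4-connected patch grid.
    """
    n = rows * cols
    local = features @ theta                      # per-patch local depth predictions

    A = lil_matrix((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (0, 1)):       # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    A[i, i] += smoothness
                    A[j, j] += smoothness
                    A[i, j] -= smoothness
                    A[j, i] -= smoothness
    A = (A + identity(n)).tocsc()

    return spsolve(A, local).reshape(rows, cols)  # MAP estimate of all patch depths

if __name__ == "__main__":
    # Purely illustrative run on random stand-in data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((15 * 20, 34))        # 15 x 20 patch grid, 34 features each
    w = rng.standard_normal(34)
    depth = estimate_depthmap(X, w, rows=15, cols=20, smoothness=5.0)
    print(depth.shape)                            # (15, 20)

In the actual paper the pairwise terms also appear at multiple spatial scales and their variances are themselves estimated from image features, which the sketch above deliberately omits.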


reference text

[1] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int’l Journal of Computer Vision, 47:7–42, 2002.

[2] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2003.

[3] S. Das and N. Ahuja. Performance analysis of stereo, vergence, and focus as depth cues for active vision. IEEE Trans Pattern Analysis & Machine Intelligence, 17:1213–1219, 1995.

[4] J. Michels, A. Saxena, and A.Y. Ng. High speed obstacle avoidance using monocular vision and reinforcement learning. In ICML, 2005.

[5] T. Nagai, T. Naruse, M. Ikehara, and A. Kurematsu. HMM-based surface reconstruction from single images. In Proc IEEE Int’l Conf Image Processing, volume 2, 2002.

[6] G. Gini and A. Marchi. Indoor robot navigation with single camera vision. In PRIS, 2002.

[7] M. Shao, T. Simchony, and R. Chellappa. New algorithms for reconstruction of a 3-d depth map from one or more images. In Proc IEEE CVPR, 1988.

[8] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[9] K. Murphy, A. Torralba, and W.T. Freeman. Using the forest to see the trees: A graphical model relating features, objects, and scenes. In NIPS 16, 2003.

[10] Xuming He, Richard S. Zemel, and Miguel A. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In Proc CVPR, 2004.

[11] S. Kumar and M. Hebert. Discriminative fields for modeling spatial dependencies in natural images. In NIPS 16, 2003.

[12] J.M. Loomis. Looking down is looking up. Nature News and Views, 414:155–156, 2001.

[13] B. Wu, T.L. Ooi, and Z.J. He. Perceiving distance accurately by a directional process of integrating ground information. Letters to Nature, 428:73–77, 2004.

[14] I. Bülthoff, H. Bülthoff, and P. Sinha. Top-down influences on stereoscopic depth-perception. Nature Neuroscience, 1:254–257, 1998.

[15] E.R. Davies. Laws’ texture energy in TEXTURE. In Machine Vision: Theory, Algorithms, Practicalities, 2nd Edition. Academic Press, San Diego, 1997.

[16] A.S. Willsky. Multiresolution Markov models for signal and image processing. Proceedings of the IEEE, 90:1396–1458, 2002.