
Manhattan Scene Understanding via XSlit Imaging (CVPR 2013, paper 279)



Authors: Jinwei Ye, Yu Ji, Jingyi Yu

Abstract: A Manhattan World (MW) [3] is composed of planar surfaces and parallel lines aligned with three mutually orthogonal principal axes. Traditional MW understanding algorithms rely on geometric priors such as vanishing points and reference (ground) planes to group coplanar structures. In this paper, we present a novel single-image MW reconstruction algorithm from the perspective of non-pinhole cameras. We show that by acquiring the MW with an XSlit camera, we can instantly resolve coplanarity ambiguities. Specifically, we prove that parallel 3D lines map to 2D curves in an XSlit image and that these curves converge at an XSlit Vanishing Point (XVP). In addition, if the lines are coplanar, their curved images intersect at a second common pixel, which we call the Coplanar Common Point (CCP). The CCP is an image feature unique to XSlit cameras; it does not exist in pinhole cameras. We present a comprehensive theory of XVPs and CCPs in an MW scene and study how to recover 3D geometry in a complex MW scene from them. Finally, we build a prototype XSlit camera using two layers of cylindrical lenses. Experimental results on both synthetic and real data show that our XSlit-camera-based approach provides an effective and reliable solution for MW understanding.


References

[1] O. Barinova, V. Konushin, A. Yakubenko, K. Lee, H. Lim, and A. Konushin. Fast automatic single-view 3D reconstruction of urban scenes. In ECCV, 2008.

[2] V. Caglioti and S. Gasparini. On the localization of straight lines in 3D space from single 2D images. In CVPR, 2005.

[3] J. M. Coughlan and A. L. Yuille. Manhattan world: Compass direction from a single image by Bayesian inference. In ICCV, 1999.

[4] A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. IJCV, 40(2):123–148, Nov. 2000.

[5] E. Delage, H. Lee, and A. Y. Ng. Automatic single-image 3D reconstructions of indoor Manhattan world scenes. In ISRR, 2005.

[6] E. Delage, H. Lee, and A. Y. Ng. A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In CVPR, 2006.

[7] Y. Ding, J. Yu, and P. Sturm. Recovering specular surfaces using curved line images. In CVPR, 2009.

[8] D. Feldman, T. Pajdla, and D. Weinshall. On the epipolar geometry of the crossed-slits projection. In ICCV, 2003.

[9] A. Flint, C. Mei, D. Murray, and I. Reid. A dynamic programming approach to reconstructing building interiors. In ECCV, 2010.

[10] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski. Manhattan-world stereo. In CVPR, 2009.

[11] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski. Reconstructing building interiors from images. In ICCV, 2009.

[12] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, NY, USA, second edition, 2003.

[13] D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, 2005.

[14] D. Hoiem, A. A. Efros, and M. Hebert. Automatic photo pop-up. In ACM SIGGRAPH, 2005.

[15] J. Kosecka and W. Zhang. Video compass. In ECCV, 2002.

[16] D. C. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In CVPR, 2009.

[17] M. Levoy and P. Hanrahan. Light field rendering. In ACM SIGGRAPH, pages 31–42, 1996.

[18] M. Meingast, C. Geyer, and S. Sastry. Geometric models of rolling-shutter cameras. CoRR, 2005. http://arxiv.org/abs/cs/0503076.

[19] H. Nagahara, C. Zhou, T. Watanabe, H. Ishiguro, and S. K. Nayar. Programmable aperture camera using LCoS. In ECCV, 2010.

[20] T. Pajdla. Epipolar geometry of some non-classical cameras. In Proc. of Computer Vision Winter Workshop, Slovenian Pattern Recognition Society, pages 223–233, 2001.

[21] J. Ponce. What is a camera? In CVPR, 2009.

[22] A. Saxena, S. H. Chung, and A. Y. Ng. Learning depth from single monocular images. In NIPS, 2005.

[23] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D scene structure from a single still image. IEEE TPAMI, 31(5):824–840, May 2009.

[24] G. Schindler and F. Dellaert. Atlanta world: an expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments. In CVPR, 2004.

[25] S. M. Seitz and J. Kim. The space of all stereo images. IJCV, 48(1):21–38, June 2002.

[26] R. Swaminathan, A. Wu, and H. Dong. Depth from distortions. In OMNIVIS, 2008.

[27] J. Yu and L. McMillan. General linear cameras. In ECCV, 2004.

[28] J. Yu, L. McMillan, and P. Sturm. Multi-perspective modelling, rendering and imaging. Computer Graphics Forum, 29(1):227–246, 2010.

[29] W. Zhang and J. Kosecka. Extraction, matching and pose recovery based on dominant rectangular structures. In IEEE International Workshop on Higher-Level Knowledge in 3D Modeling and Motion Analysis, Oct. 2003.

[30] A. Zomet, D. Feldman, S. Peleg, and D. Weinshall. Mosaicing new views: The crossed-slits projection. IEEE TPAMI, 25(6):741–754, June 2003.