ICCV 2013, Paper 139: Reference Knowledge Graph
Source: pdf
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
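For orientation only, the sketch below caricatures the pipeline the abstract describes: group consecutive range frames into locally smooth fragments, then align the fragments to one another. All identifiers (make_fragments, rigid_fit, align_fragments, K_FRAMES_PER_FRAGMENT) are hypothetical, and the paper's volumetric non-rigid registration is stubbed here by a rigid Kabsch fit with index-based correspondences; this is a minimal sketch, not the authors' implementation.

import numpy as np

K_FRAMES_PER_FRAGMENT = 50  # assumed fragment length, not taken from the paper

def make_fragments(depth_frames):
    # Group consecutive frames into fragments; here a fragment is just the
    # stacked 3D points of its frames (a stand-in for per-fragment fusion).
    fragments = []
    for i in range(0, len(depth_frames), K_FRAMES_PER_FRAGMENT):
        chunk = depth_frames[i:i + K_FRAMES_PER_FRAGMENT]
        fragments.append(np.concatenate(chunk, axis=0))
    return fragments

def rigid_fit(src, dst):
    # Least-squares rigid transform src -> dst (Kabsch); stands in for the
    # paper's non-rigid volumetric alignment.
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    if np.linalg.det(Vt.T @ U.T) < 0:   # avoid a reflection
        Vt[-1] *= -1
    R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def align_fragments(fragments):
    # Chain-align each fragment to its predecessor; a crude stand-in for the
    # joint optimization that lets all fragments deform toward each other.
    aligned = [fragments[0]]
    for frag in fragments[1:]:
        n = min(len(frag), len(aligned[-1]))
        R, t = rigid_fit(frag[:n], aligned[-1][:n])
        aligned.append(frag @ R.T + t)
    return aligned

In the paper itself, the per-fragment reconstruction and the joint alignment are far more involved; the sketch is only meant to show where fragments and their mutual alignment sit in the pipeline.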
[1] B. J. Brown and S. Rusinkiewicz. Global non-rigid alignment of 3-D scans. ACM Trans. Graph., 26(3), 2007. 3
[2] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In SIGGRAPH, 1996. 1, 3
[3] A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In ICCV, 2003. 1
[4] A. J. Davison, I. D. Reid, N. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. PAMI, 29(6), 2007. 1
[5] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard. An evaluation of the RGB-D SLAM system. In ICRA, 2012. 3
[6] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski. Towards internet-scale multi-view stereo. In CVPR, 2010. 1
[7] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. International Journal of Robotics Research, 31(5), 2012. 1, 2
Figure 5. Evaluation on a benchmark scene [29]: (a) Extended KinectFusion [22], (b) Zhou and Koltun [35], (c) volumetric integration along the motion-captured camera trajectory, and (d) our approach. Our approach is the only one that preserves high-frequency features such as the chair leg (red closeup) without introducing noisy artifacts on the flat panel (blue closeup).
[8] D. Herrera C., J. Kannala, and J. Heikkilä. Joint depth and color camera calibration with distortion correction. PAMI, 34(10), 2012. 1
[9] X. Huang, N. Paragios, and D. N. Metaxas. Shape registration in implicit spaces using information theory and free form deformations. PAMI, 28(8), 2006. 2
[10] B. Jian and B. C. Vemuri. Robust point set registration using Gaussian mixture models. PAMI, 33(8), 2011. 2
[11] K. Khoshelham and S. O. Elberink. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12(2), 2012. 1
[12] G. Klein and D. W. Murray. Parallel tracking and mapping for small AR workspaces. In ISMAR, 2007. 1
[13] K. Konolige and P. Mihelich. Technical description of Kinect calibration. 2012. http://wiki.ros.org/kinect_calibration/technical. 5
[14] A. Myronenko and X. B. Song. Point set registration: Coherent point drift. PAMI, 32(12), 2010. 2
[15] R. A. Newcombe and A. J. Davison. Live dense reconstruction with a single moving camera. In CVPR, 2010. 1
[16] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011. 1, 2, 3
[17] R. A. Newcombe, S. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In ICCV, 2011. 1
[18] C. V. Nguyen, S. Izadi, and D. Lovell. Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In 3DIMPVT, 2012. 5
[19] N. Paragios, M. Rousson, and V. Ramesh. Non-rigid registration using distance functions. CVIU, 89(2-3), 2003. 2
[20] M. Pollefeys, L. J. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. IJCV, 59(3), 2004. 1
[21] M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S. J. Kim, P. Merrell, C. Salmi, S. N. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewénius, R. Yang, G. Welch, and H. Towles. Detailed real-time urban 3D reconstruction from video. IJCV, 78(2-3), 2008. 1
[22] H. Roth and M. Vona. Moving volume KinectFusion. In BMVC, 2012. 2, 6, 7
[23] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. ACM Trans. Graph., 21(3), 2002. 1
[24] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In 3DIM, 2001. 4
[25] T. W. Sederberg and S. R. Parry. Free-form deformation of solid geometric models. In SIGGRAPH, 1986. 4
[26] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, 2006. 1
[27] J. Smisek, M. Jancosek, and T. Pajdla. 3D with Kinect. In ICCV Workshops, 2011. 1
[28] O. Sorkine and M. Alexa. As-rigid-as-possible surface modeling. In Symposium on Geometry Processing, 2007. 4
[29] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IROS, 2012. 5, 7
[30] R. W. Sumner, J. Schmid, and M. Pauly. Embedded deformation for shape manipulation. ACM Trans. Graph., 26(3), 2007. 4
[31] A. Teichman, S. Miller, and S. Thrun. Unsupervised intrinsic calibration of depth sensors via SLAM. In RSS, 2013. 2, 5
[32] D. Terzopoulos, J. C. Platt, A. H. Barr, and K. W. Fleischer. Elastically deformable models. In SIGGRAPH, 1987. 4
[33] F. Wang, B. C. Vemuri, A. Rangarajan, and S. J. Eisenschenk. Simultaneous nonrigid registration of multiple point sets and atlas construction. PAMI, 30(11), 2008. 2
[34] T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald. Robust real-time visual odometry for dense RGB-D mapping. In ICRA, 2013. 1, 2
[35] Q.-Y. Zhou and V. Koltun. Dense scene reconstruction with points of interest. ACM Trans. Graph., 32(4), 2013. 1, 2, 6, 7
Figure 6. Evaluation with synthetic data: (a) Extended KinectFusion, (b) Zhou and Koltun, (c) volumetric integration along the ground-truth camera trajectory, and (d) our approach. The plots on the right show distributions of point-to-plane error between the reconstructed shapes and the true shape. (I) and (II) use an idealized error model with no low-frequency distortion; (III) and (IV) use the full error model with low-frequency distortion estimated on a real PrimeSense sensor.
[...] 24 GB of RAM, and an NVIDIA GeForce GTX 690 graphics card.
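The point-to-plane error summarized in Figure 6 can be sanity-checked with a few lines of NumPy. This is a hedged sketch under our own naming (point_to_plane_errors, recon_pts, gt_pts, gt_normals), not the authors' evaluation code; nearest neighbors are found by brute force for brevity.

import numpy as np

def point_to_plane_errors(recon_pts, gt_pts, gt_normals):
    # For each reconstructed point p, find the nearest ground-truth point q
    # and report |n_q . (p - q)|, the distance along q's unit normal.
    errors = np.empty(len(recon_pts))
    for i, p in enumerate(recon_pts):
        j = np.argmin(np.linalg.norm(gt_pts - p, axis=1))
        errors[i] = abs(np.dot(gt_normals[j], p - gt_pts[j]))
    return errors

if __name__ == "__main__":
    # Toy example: a noisy copy of a planar point set, then a histogram like
    # the error distributions plotted in Figure 6.
    rng = np.random.default_rng(0)
    gt = np.column_stack([rng.uniform(-1, 1, (1000, 2)), np.zeros(1000)])
    normals = np.tile([0.0, 0.0, 1.0], (1000, 1))
    recon = gt + 0.01 * rng.standard_normal(gt.shape)
    hist, edges = np.histogram(point_to_plane_errors(recon, gt, normals), bins=20)
    print(hist)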