Author: Hee Seok Lee, Kyoung Mu Lee
Abstract: In this paper, we propose a convex optimization framework for simultaneous estimation of a super-resolved depth map and images from a single moving camera. The pixel measurement error in 3D reconstruction is directly related to the resolution of the images at hand. In turn, even a small measurement error can cause significant errors in reconstructing 3D scene structure or camera pose. Therefore, enhancing image resolution can be an effective way to secure both the accuracy and the resolution of 3D reconstruction. In the proposed method, depth map estimation and image super-resolution are formulated in a single energy minimization framework with a convex function and solved efficiently by a first-order primal-dual algorithm. Explicit inter-frame pixel correspondences are not required for our super-resolution procedure; thus we avoid a large computational cost and obtain a depth map with improved accuracy and resolution, as well as high-resolution images, in reasonable time. The superiority of our algorithm is demonstrated by presenting the improved depth map accuracy, image super-resolution results, and camera pose estimation.
1 Simultaneous Super-Resolution of Depth and Images using a Single Camera Hee Seok Lee Kyoung Mu Lee Department of ECE, ASRI, Seoul National University, 151-742, Seoul, Korea ultra21@snu. [sent-1, score-0.091]
2 Abstract In this paper, we propose a convex optimization framework for simultaneous estimation of super-resolved depth map and images from a single moving camera. [sent-4, score-0.816]
3 The pixel measurement error in 3D reconstruction is directly related to the resolution of the images at hand. [sent-5, score-0.352]
4 In turn, even a small measurement error can cause significant errors in reconstructing 3D scene structure or camera pose. [sent-6, score-0.197]
5 Therefore, enhancing image resolution can be an effective solution for securing the accuracy as well as the resolution of 3D reconstruction. [sent-7, score-0.366]
6 In the proposed method, depth map estimation and image super-resolution are formulated in a single energy minimization framework with a convex function and solved efficiently by a first-order primal-dual algorithm. [sent-8, score-0.877]
7 Explicit inter-frame pixel correspondences are not required for our super-resolution procedure; thus we avoid a large computational cost and obtain a depth map with improved accuracy and resolution, as well as high-resolution images, in reasonable time. [sent-9, score-0.872]
8 The superiority of our algorithm is demonstrated by presenting the improved depth map accuracy, image super-resolution results, and camera pose estimation. [sent-10, score-0.594]
9 Introduction In 3D reconstruction with a single camera, the accuracy of camera pose and scene structure estimation is highly affected by the conditions of input images such as noise, contrast, blur, and resolution. [sent-12, score-0.331]
10 In particular, image resolution is an important factor for achieving sufficient accuracy in various geometry-related computer vision algorithms, including 3D reconstruction, since it influences feature detection, localization, and matching. [sent-13, score-0.145]
11 Note that a small measurement error does not cause large errors in object position and camera pose when an object is close to the camera, whereas it does so significantly when the object is far from the camera. [sent-15, score-0.303]
12 Therefore, it is necessary to enhance the image resolution to reduce the sensitivity to the image measurement error and achieve reliable and accurate 3D reconstruction. [sent-21, score-0.208]
13 Image super-resolution, the method for enhancing image resolution, has two different approaches: the reconstruction-based approach and the learning-based approach. [sent-22, score-0.101]
14 Therefore, finding accurate pixel-wise correspondences is the key for the success of the reconstruction-based super-resolution. [sent-25, score-0.138]
15 For general scenes, these correspondences can be obtained up to sub-pixel accuracy using optical flow algorithms. [sent-26, score-0.18]
16 Some iterative methods [4, 7] alternately estimate a high-resolution image and pixel correspondences, and show better results. [sent-28, score-0.139]
17 Note that if we employ the information about the 3D scene geometry, the super-resolution problem can be solved more efficiently since we can directly use it for enhancing the accuracy of the correspondences. [sent-30, score-0.101]
18 That is, with estimated camera poses, the problem of finding pairwise pixel correspondences through an image sequence can be converted into estimating the depth value of corresponding pixels. [sent-31, score-0.798]
19 Although this converted problem has an error source related to the camera pose error, it is cast in a much lower-dimensional solution space than the original pairwise correspondence problem, so it can be solved much more easily and quickly. [sent-32, score-0.305]
20 Therefore, depth reconstruction and super-resolution problems are interrelated and boost each other’s accuracy. [sent-33, score-0.555]
21 So, in this work, we combine the depth estimation and the high-resolution image estimation in a unified framework, and propose a simultaneous solution to both problems. [sent-34, score-0.75]
22 In the proposed method, the depth estimation and image super-resolution are formulated with a single convex energy function, which consists of a data term and a regularization term. [sent-35, score-0.848]
23 The solution is estimated by convex optimization of the energy function. [sent-36, score-0.399]
24 Although both the pixel correspondences (re-parameterized by depth) and the high-resolution image are estimated, the computational cost is not much higher than that of conventional high-resolution image estimation alone, because we do not use alternating methods like EM. [sent-37, score-0.489]
25 Additionally, due to the simultaneous estimation of depth and high-resolution image, the results of the two problems are greatly enhanced. [sent-38, score-0.591]
26 Related works In this section, we review some works that are similar to ours in combining 3D reconstruction and super-resolution. [sent-40, score-0.126]
27 Then, we discuss the works on the primal-dual algorithm for 3D reconstruction or super-resolution. [sent-41, score-0.126]
28 3D reconstruction and image super resolution In [1, 9, 14, 5], the close relationship between super-resolution and 3D scene structure is pointed out and their cooperative solution is studied. [sent-44, score-0.406]
29 Occlusions are effectively handled in their super-resolution method using depth information, but super-resolution does not contribute to depth map estimation in this method. [sent-46, score-1.017]
30 In [14], a method for increasing the accuracy of 3D video reconstruction using multiple static cameras is presented. [sent-47, score-0.126]
31 The 3D video is composed of texture images and 3D shapes, and increasing their accuracy is achieved by simultaneous super-resolution using MRF formulation and graph-cuts. [sent-48, score-0.216]
32 High-quality texture and 3D reconstruction is presented in [5], where the texture and shape of a 3D model are alternately estimated with a joint energy functional. [sent-49, score-0.468]
33 Compared to [5] our work has more challenging settings in which neither accurate camera motions nor initial pixel correspondences are available. [sent-50, score-0.392]
34 The authors formulate a full-frame super-resolution problem combined with a depth map estimation problem, and attempt to enhance the results of both problems. [sent-52, score-0.75]
35 However, their solution is not fully simultaneous but follows an EM-style alternating method instead. [sent-53, score-0.254]
36 They fix the current high-resolution image for the estimation of the depth map, and vice versa. [sent-54, score-0.46]
37 Graph-cut and iterated conditional modes (ICM) are used for the depth and high-resolution image estimation, respectively, in each iteration, which results in an inevitably large computation cost. [sent-55, score-0.491]
38 In contrast, we search the globally optimum solution directly with a single convex energy function and achieve very fast optimization speed for dense real-time 3D reconstruction. [sent-56, score-0.421]
39 Primal-dual algorithm for 3D reconstruction and super-resolution The formulation of our algorithm is based on the variational approach, especially the primal-dual algorithm [2, 3, 6]. [sent-59, score-0.167]
40 The first-order primal-dual algorithm is a very effective tool for convex variational problems due to its parallelizable characteristics. [sent-60, score-0.164]
41 The first-order primal-dual algorithm has been applied recently for the 3D reconstruction and super-resolution problems. [sent-62, score-0.126]
42 In [10] and [13], a dense 3D reconstruction is studied and its real-time implementations are demonstrated. [sent-63, score-0.126]
43 They used conventional energy functions consisting of a photometric consistency-based data term and an L1 or Huber norm-based smoothness term, but achieved breakthrough performance in computation time using the primal-dual algorithm combined with a GPGPU implementation. [sent-64, score-0.43]
44 The reconstruction-based super-resolution is formulated by image downsampling, blurring, and warping, and then the latent high-resolution image is estimated with the Huber norm regularization. [sent-66, score-0.179]
45 This method achieves fast computation of high-quality super-resolution comparable to other methods, but has certain limitations: a highly accurate initial image warping is required, and no updating procedure is involved in estimating the super-resolution. [sent-67, score-0.15]
46 Our novel combined 3D reconstruction and super-resolution algorithm is also formulated in the first-order primal-dual framework. [sent-68, score-0.344]
47 However, unlike [10] and [13], the proposed super-resolution combined framework enables more accurate depth map estimation with respect to its resolution. [sent-69, score-0.599]
48 Our image super-resolution is also accelerated by finding pixel correspondences in a depth domain instead of optical flows between images with the help of camera geometry obtained from the 3D reconstruction. [sent-70, score-0.746]
49 Model In this work, we propose a new energy function for a simultaneous estimation of depth map and high-resolution image. [sent-72, score-0.819]
50 The inputs are a sequence of low-resolution images $I_j \in \mathbb{R}^{MN}$ of size $M \times N$ and their corresponding camera poses $P_j \in SE(3)$ with j ∈ {0, . [sent-73, score-0.125]
51 Let $g \in \mathbb{R}^{s^2 MN}$ be the latent gray-scale super-resolution image and $d \in \mathbb{R}^{s^2 MN}$ be the latent inverse depth map, where $s$ is the predefined upscale factor. [sent-77, score-0.58]
52 The solution of g and d is estimated with respect to the reference view P0. [sent-78, score-0.205]
53 The energy function to solve this problem is composed of the data cost Edata based on the photometric constancy and the regularization cost Ereg for smoothing undesirable artifacts. [sent-79, score-0.621]
54 [Figure caption fragment: ... sequence $I_j$ and the super-resolution image $g$, induced by the depth map $d$: the photometric consistency should hold for $I_j$ and the simulated low-resolution image $D * B * g$.] [sent-54, score-0.608]
55 With the parameter $\lambda$, which controls the degree of regularization, the energy function has the form $E(g, d) = E_{reg} + \lambda E_{data}$. [sent-55, score-0.139]
56 Data cost We start with the relationship between the highresolution image g for the reference image I0 and the lowresolution image Ij from an adjacent view. [sent-85, score-0.401]
57 With the camera internal parameter $K$, which includes the focal length and the principal point, the reprojected 3D position $X$ of pixel $(x, y)$ in $I_0$ with the inverse depth $d(x, y)$ by the reference camera $P_0$ is given by $X = \frac{1}{d(x,y)} K^{-1} (x, y, 1)^\top$, [sent-86, score-0.875]
58 and its projection to the adjacent view with $P_j$ is calculated as $h\left(K P_{j,0} \frac{1}{d(x,y)} K^{-1} (x, y, 1)^\top\right)$. [sent-87, score-0.089]
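The back-projection and projection steps described above can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: the function names, the split of the relative pose $P_{j,0}$ into a rotation `R` and a translation `t`, and the example intrinsics are all assumptions made for the sketch.

```python
import numpy as np

def dehomogenize(v):
    """h(.): map homogeneous image coordinates (u, v, w) to (u/w, v/w)."""
    return v[:2] / v[2]

def project_to_adjacent_view(x, y, inv_depth, K, R, t):
    """Back-project pixel (x, y) of the reference view using its inverse
    depth, then project the resulting 3D point into an adjacent view.

    R and t are the (assumed) rotation and translation of the relative
    pose P_{j,0}, mapping reference-frame coordinates to frame j.
    """
    pixel_h = np.array([x, y, 1.0])
    # X = (1 / d) * K^{-1} * (x, y, 1)^T, a 3D point in the reference frame.
    X = (1.0 / inv_depth) * (np.linalg.inv(K) @ pixel_h)
    # The same point expressed in the coordinates of frame j.
    X_j = R @ X + t
    return dehomogenize(K @ X_j)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# A sideways translation of 0.1 shifts a point at depth 2 by 500 * 0.1 / 2 = 25 px.
shifted = project_to_adjacent_view(320.0, 240.0, 0.5, K, np.eye(3),
                                   np.array([0.1, 0.0, 0.0]))
```

With an identity pose the pixel maps to itself. The induced shift is proportional to the inverse depth, which is one reason the inverse depth is a convenient parameter for the correspondence search.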
59 For notational simplicity, the non-bold characters g and d are used for the pixel-wise values g(x, y) and d(x, y), respectively, and their corresponding dual variables later. [sent-92, score-0.122]
60 We define the image warping $W(I_j, d)$, which transforms the image $I_j$ to the reference image $I_0$, using the pixel projection and reprojection discussed above, $W(I_j, d)(x, y) = I_j\left(h\left(K P_{j,0} \frac{1}{d} K^{-1} (x, y, 1)^\top\right)\right)$. (1) [sent-93, score-0.224]
61 Then, by the photometric consistency between the reference image and the adjacent image, the equation $I_0(x, y) = I_j\left(h\left(K P_{j,0} \frac{1}{d} K^{-1} (x, y, 1)^\top\right)\right)$ holds. [sent-95, score-0.227]
62 By incorporating the image resolution degradation model, the equation $(D * B * g)(x, y) = I_0(x, y) = W(I_j, d)(x, y)$ (3) also holds for all j ∈ {0, . [sent-100, score-0.156]
63 Here, $D$ and $B$ are the downsampling and blurring operators, respectively. [sent-104, score-0.18]
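To make the degradation model $D * B * g$ concrete, the following sketch simulates a low-resolution image from a high-resolution one. It assumes a separable Gaussian blur with edge replication for $B$ and plain decimation (keeping every $s$-th pixel) for $D$; the paper's exact operators may differ, and all function and parameter names here are illustrative.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def blur(image, sigma, radius):
    """B: separable Gaussian blur; borders are handled by edge replication."""
    kernel = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def downsample(image, s):
    """D: keep every s-th pixel in both directions (assumed decimation)."""
    return image[::s, ::s]

def simulate_low_resolution(g, s, sigma, radius):
    """D * B * g: the low-resolution image predicted from g."""
    return downsample(blur(g, sigma, radius), s)

# A constant 8 x 12 high-resolution image becomes a constant 4 x 6 image.
g = np.full((8, 12), 3.0)
low = simulate_low_resolution(g, s=2, sigma=1.0, radius=2)
```

Because the kernel is normalized, a constant image is left unchanged by the blur, which gives a quick sanity check for any implementation of $B$.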
64 To find the optimized value of $d$ through an iterative update, we apply the first-order Taylor expansion to $W(I_j, d)$ to approximate the change in the image $W(I_j, d)$ with respect to a small change of depth around the initial value $d_0$, $W(I_j, d) \approx \ldots$ [sent-110, score-0.494]
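For reference, a first-order Taylor expansion of the warped image around $d_0$ has the generic form below. This is the standard textbook expansion written in the symbols defined above; the exact notation of the paper's equation is not recoverable from this text, so it should be read as a sketch.

```latex
W(I_j, d) \;\approx\; W(I_j, d_0)
  + \left. \frac{\partial W(I_j, d)}{\partial d} \right|_{d = d_0} (d - d_0)
```

The approximation is only valid for small $d - d_0$, which is why a good initial value $d_0$ matters.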
65 The blur kernel $B$ is predefined with a simple Gaussian blur model, with the standard deviation s and the kernel size of (s 1)1/2. [sent-120, score-0.176]
66 Figure 2 shows an example of the convexity of the data cost $\rho(g, d)$ for different image points. [sent-129, score-0.116]
67 The shape of the cost function is obviously convex, but the shape of the function varies from image point to point according to the image gradient. [sent-130, score-0.116]
68 In a low-texture region, the data cost is dominated by the high-resolution intensity $g$ rather than the depth $d$. [sent-131, score-0.576]
69 Therefore, regularization is required to get a more plausible solution for depth d. [sent-132, score-0.567]
70 Regularization For image intensity $g$ and inverse depth $d$, we use a Huber-norm-based regularization to get a smooth, discontinuity-preserving result. [sent-141, score-0.665]
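The Huber norm mentioned above is quadratic near zero and linear for large arguments, which is what makes it smoothing and discontinuity-preserving at once. A minimal sketch of the common definition used in the primal-dual literature follows; the threshold name `eps` is an assumption, and the paper's own parameter names are not recoverable from this text.

```python
import numpy as np

def huber(x, eps):
    """Huber norm: x^2 / (2 * eps) for |x| <= eps, |x| - eps / 2 otherwise."""
    a = np.abs(x)
    return np.where(a <= eps, a ** 2 / (2.0 * eps), a - eps / 2.0)

# Small gradients are penalized quadratically, large ones only linearly.
values = huber(np.array([0.0, 0.1, 1.0, -1.0]), eps=0.1)
```

The two branches agree at $|x| = \epsilon$, so the penalty is continuous, and the linear tail keeps large jumps (depth or intensity edges) from being over-penalized.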
71 By combining the data cost (8) and the regularization (9), we get our objective energy function $E(g, d) = \ldots$ (10) [sent-150, score-0.363]
72 In the next section, we describe the solution of this energy function. [sent-156, score-0.218]
73 Initial depth estimation In the data cost (8), the first-order Taylor expansion is applied, which can only handle small updates of $g$ and $d$. [sent-160, score-0.576]
74 The initial value of g can be easily obtained by upscaling the input image at reference view using simple bicubic interpolation. [sent-162, score-0.319]
75 However, the initial value of d should be estimated using the low-resolution input sequence. [sent-163, score-0.113]
76 The cost function for initial depth estimation is easily obtained from Eq. [sent-164, score-0.644]
77 (8) and (10) by replacing B ∗ g and Iˆj with the low-resolution images I0 and Ij , respectively, and removing the regularization on g. [sent-165, score-0.108]
78 The resulting energy function for the low-resolution depth map $\check{d}$ is $E(\check{d}) = \ldots$ [sent-166, score-0.608]
79 Equation (11) is in fact a conventional formulation for depth map estimation. [sent-173, score-0.519]
80 The optimization of this energy function is very similar to the optimization of Eq. [sent-174, score-0.249]
81 (10), which will be explained below, so the optimization of (11) is skipped here. [sent-175, score-0.101]
82 Thus, a coarse-to-fine approach is used to approach the global optimum of d gradually by starting from an arbitrary initial solution, e. [sent-178, score-0.135]
83 The depth result obtained at the finest level is upscaled using bicubic interpolation and is fed to the optimization of (10) as an initial value. [sent-182, score-0.723]
84 High-resolution image and depth estimation Now we will describe a solution of Eq. [sent-185, score-0.539]
85 By interpreting our objective function (10) in the primal-dual formulation, we can rewrite it as a generic saddle point problem with the dual variables $p$ and $q$, which correspond to $g$ and $d$, respectively: $\min_{g,d} \max_{p,q} \ldots$ [sent-187, score-0.122]
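For orientation, the generic saddle-point problem and the first-order primal-dual iterations of [2] can be written as below. Here $x$ plays the role of the primal variables $(g, d)$, $y$ the role of the dual variables $(p, q)$, and $K$ is a linear operator built from $\nabla$; this mapping, and the symbols $G$, $F^*$, $\sigma$, $\tau$, $\theta$, follow the generic algorithm and are assumptions with respect to this paper's exact formulation.

```latex
\min_{x} \max_{y} \; \langle K x, y \rangle + G(x) - F^{*}(y)

\begin{aligned}
y^{n+1} &= (I + \sigma \partial F^{*})^{-1} \left( y^{n} + \sigma K \bar{x}^{n} \right) \\
x^{n+1} &= (I + \tau \partial G)^{-1} \left( x^{n} - \tau K^{*} y^{n+1} \right) \\
\bar{x}^{n+1} &= x^{n+1} + \theta \left( x^{n+1} - x^{n} \right)
\end{aligned}
```

The operators $(I + \sigma \partial F^{*})^{-1}$ and $(I + \tau \partial G)^{-1}$ are the resolvent operators, and $\bar{x}$ is the extrapolated intermediate variable.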
86 where the operator $\nabla^*$, the adjoint of $\nabla$ given by $\nabla^* = -\mathrm{div}$, computes the negative divergence [2], and $\bar{g}$ and $\bar{d}$ are the intermediate variables for the convergence of the algorithm. [sent-214, score-0.155]
87 The operators $R_{p,q}$ and $R_{g,d}$ are the resolvent operators that search for lower energy values using subgradients. [sent-217, score-0.529]
88 The resolvent operators are discussed in more detail below. [sent-219, score-0.313]
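The relation $\nabla^* = -\mathrm{div}$ can be checked numerically. The sketch below uses the common discretization with forward differences for the gradient and matching backward differences for the divergence; the paper's own discretization is not stated in this text, so this particular scheme is an assumption.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient; the difference is zero at the far border."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Backward-difference divergence, built so that -div is the adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

# Adjoint check on random data: <grad u, p> should equal -<u, div p>.
rng = np.random.default_rng(0)
u = rng.standard_normal((5, 7))
px = rng.standard_normal((5, 7))
py = rng.standard_normal((5, 7))
gx, gy = grad(u)
lhs = np.sum(gx * px + gy * py)
rhs = -np.sum(u * div(px, py))
```

If the two inner products disagree, the primal and dual updates no longer belong to the same saddle-point problem, so this check is worth keeping in any implementation.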
89 Our regularization term (10) is a typical form used in [2]. [sent-220, score-0.108]
90 Thus, the resolvent operator of the dual variables is a pixel-wise projection $R_{p,q}(p, q) = \ldots$ (14) [sent-221, score-0.469]
91 On the other hand, the data cost differs from the standard form used in previous applications of the primal-dual algorithm. [sent-224, score-0.116]
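The pixel-wise projection used as the resolvent of the dual variables can be illustrated as follows. For a Huber-regularized term, the form given in [2] rescales the dual vector by $1 + \sigma\epsilon$ and then projects it onto the unit ball. Whether the paper's Eq. (14) matches this exactly cannot be confirmed from this text, so the function and its parameter names are a hedged sketch.

```python
import numpy as np

def project_dual(px, py, sigma, eps):
    """Resolvent of a Huber-type dual variable: rescale, then project each
    per-pixel vector (px, py) onto the unit ball."""
    scale = 1.0 + sigma * eps
    px, py = px / scale, py / scale
    # Vectors with magnitude above 1 are normalized; the others are kept.
    magnitude = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
    return px / magnitude, py / magnitude

# With eps = 0 this reduces to the plain total-variation projection.
qx, qy = project_dual(np.array([3.0, 0.3]), np.array([4.0, 0.4]),
                      sigma=0.5, eps=0.0)
```

Each pixel is processed independently, which is the property that makes this step easy to parallelize on a GPU.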
92 This difference comes from the summation of absolute values over the image sequence in the data cost. [sent-225, score-0.116]
93 Since we use an L1 norm for the difference between two images, there are some critical (non-differentiable) points in their summation. [sent-226, score-0.177]
94 Therefore, these non-differentiable points must be handled in the optimization procedure. [sent-227, score-0.101]
95 The minimization of a similar cost function is introduced in [13], but the solution space of [13] contains the depth map only, so the minimization can be achieved efficiently by evaluating and sorting all critical points. [sent-228, score-0.922]
96 On the other hand, the solution space of our problem is composed of the depth map and the image intensity, so there are $J^2$ critical points. [sent-229, score-0.696]
97 Searching them is not straightforward, and thus optimization by evaluating and sorting critical points is inefficient. [sent-230, score-0.217]
98 Instead, the general gradient descent and critical point searching are combined to accelerate the minimization procedure. [sent-231, score-0.201]
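For the depth-only case, the idea of evaluating and sorting critical points can be sketched on a one-dimensional analogue: minimize $(d - d_0)^2 / (2\tau) + \sum_j |a_j d + b_j|$ over a scalar $d$. The objective, the names `a`, `b`, `d0`, `tau`, and the candidate-enumeration strategy are illustrative assumptions, not the formulation of [13] or of this paper.

```python
import numpy as np

def objective(d, d0, tau, a, b):
    """Quadratic proximity term plus a sum of absolute linear residuals."""
    return (d - d0) ** 2 / (2.0 * tau) + np.sum(np.abs(a * d + b))

def minimize_by_critical_points(d0, tau, a, b):
    """Exact 1D minimizer; all entries of a are assumed to be nonzero."""
    # Critical points are the kinks where one residual a_j * d + b_j is zero.
    critical = np.sort(-b / a)
    # One probe inside each segment between kinks fixes the residual signs.
    edges = np.concatenate(([critical[0] - 1.0], critical, [critical[-1] + 1.0]))
    probes = 0.5 * (edges[:-1] + edges[1:])
    candidates = list(critical)
    for probe in probes:
        signs = np.sign(a * probe + b)
        # Stationary point of the smooth quadratic valid on this segment.
        candidates.append(d0 - tau * np.sum(signs * a))
    # The objective is convex, so the best candidate is the global minimizer.
    values = [objective(c, d0, tau, a, b) for c in candidates]
    return candidates[int(np.argmin(values))]

a = np.array([1.0, -2.0, 0.5])
b = np.array([-1.0, 3.0, 0.2])
d_star = minimize_by_critical_points(0.3, 0.7, a, b)
```

In one dimension the number of candidates grows only linearly with the number of residuals. With two unknowns per pixel (depth and intensity) the kinks become lines whose intersections must be considered, which is the difficulty the text describes.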
99 We divide the domain of the resolvent operator based on the cost $\rho$ and the squared magnitude of its gradient, $\|\nabla \rho\|_2^2$, [sent-248, score-0.463]
100 and apply the gradient descent search and critical point search, where $R_{g,d}(\cdot)$ is defined piecewise on these regions. [sent-250, score-0.103]
