cvpr cvpr2013 cvpr2013-111 knowledge-graph by maker-knowledge-mining

111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors


Source: pdf

Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid

Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. [sent-6, score-0.443]

2 Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. [sent-7, score-0.242]

3 Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. [sent-8, score-0.129]

4 In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. [sent-9, score-0.356]

5 At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. [sent-10, score-0.694]

6 In this work we link dense SLAM to 3D object pose and shape recovery. [sent-11, score-0.476]

7 More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. [sent-12, score-0.92]

8 This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. [sent-13, score-0.153]

9 The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone. [sent-14, score-0.282]

10 Introduction The reconstruction of scene geometry from a single monocular image sequence is a key problem in computer vision. [sent-16, score-0.129]

11 When the camera trajectory is unknown, the joint on-line estimation problem for scene structure and camera pose has become known as visual Simultaneous Localisation and Mapping. [sent-17, score-0.466]

12 Early methods for visual SLAM [7, 11] concentrated on accurate camera pose estimation using only sparse reconstructions. [sent-18, score-0.324]

13 With the advent of powerful parallel computation devices, real-time, dense SLAM has become a technical possibility [10, 15]. [sent-23, score-0.141]

14 While this has the effect of increasing SLAM robustness and accuracy, both approaches are limited by their use of a sparse map and by the fact that they consider the objects to be of fixed and perfectly known shape. [sent-27, score-0.1]

15 A more generic semantic reconstruction is proposed in [8], where shape and layout priors of buildings are learned offline. [sent-28, score-0.224]

16 While shape priors have seen limited use in the SLAM literature, they have been extensively used in segmentation and tracking, as a solution to the problem of imperfect raw image information. [sent-31, score-0.202]

17 One of the most effective and popular approaches to represent shape knowledge is to use dimensionality reduction to capture the shape variance as low dimensional latent shape spaces. [sent-32, score-0.434]

18 Initial works, such as [24], focused on (implicitly or explicitly defined) 2D shapes, and used linear dimensionality reduction in the form of principal component analysis (PCA). [sent-33, score-0.118]

19 More recent works use nonlinear dimensionality reduction such as Kernel PCA in [6] and Gaussian Process Latent Variable Models (GP-LVM) in [17]. [sent-34, score-0.117]

20 This led to 3D shape priors being first introduced in [23]. [sent-35, score-0.162]

21 Most recently, [19] learn GP-LVM latent spaces of 3D shapes and use them in monocular simultaneous 2D segmentation, 3D reconstruction and 3D pose recovery. [sent-36, score-0.461]

22 Our objective in this paper is to address these limitations of existing systems by proposing an efficient dense SLAM approach that integrates a shape-prior-based estimator as-and-when possible. [sent-37, score-0.141]

23 Here, in a manner similar to [19], we represent the shape-prior using GP-LVM and optimise an energy over the pose and a low-dimensional latent shape space. [sent-39, score-0.444]

24 An implicit volumetric representation of the dense reconstruction (similar to that used in [16]) allows for a very efficient fusion of the dense reconstruction with the reconstructed object shape. [sent-40, score-0.566]

25 Next, in Section 3 the semantic part of the system is described, including the recognition of an object together with the estimation of its refined pose and shape. [sent-44, score-0.327]

26 In Section 4 we present the way the information provided by the shape-prior-based estimator can be integrated into the dense SLAM system. [sent-45, score-0.24]

27 Dense SLAM Our dense SLAM system is structured as follows: firstly, assuming known camera pose from the PTAM system [11], dense depth maps are built using a brightness constancy assumption. [sent-48, score-1.033]

28 Each depth map is subsequently fused into a global volumetric representation of the scene. [sent-49, score-0.509]

29 Local Depth Map Estimation Closely mirroring the approach of [15], we formulate the initial depth map estimation problem as one of finding the depth of each point that is seen in one reference image. [sent-52, score-0.717]

30 More formally, let u denote the coordinates of a pixel in the reference image Ir. [sent-56, score-0.219]

31 π−1 is the back-projection function of the pixel coordinates and depth that brings a pixel to its homogeneous coordinates in the camera frame (computed using the intrinsic parameters of the camera). [sent-66, score-0.682]

32 mMr is the SE(3) matrix that maps the coordinates of a point in the reference camera frame into its coordinates in the camera frame m; this matrix is available from PTAM, which provides the world-to-camera transformation mMw, via mMr = mMw (rMw)−1. [sent-67, score-0.659]

33 π is the function that projects the homogeneous coordinates of a point in the camera frame into its pixel coordinates in the image plane. [sent-68, score-0.422]

34 Searching for the actual surface along one ray is then equivalent to searching for the depth d that leads to the minimum photo-consistency error. [sent-70, score-0.444]

35 The points along each ray are evenly sampled along the inverse depth so that the corresponding epipolar lines are evenly sampled. [sent-71, score-0.417]
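As a concrete illustration of this per-pixel search, the sketch below (Python/NumPy; names such as search_depth, K and M_mr are hypothetical, and a single-pixel brightness difference stands in for the full photo-consistency cost) samples candidate depths evenly in inverse depth along the ray of one reference pixel and keeps the best-scoring hypothesis:

    import numpy as np

    def search_depth(u, I_r, I_m, K, M_mr, d_min=0.5, d_max=5.0, n_samples=64):
        # Brute-force photo-consistency search along the ray of reference
        # pixel u. K: 3x3 intrinsics; M_mr: 4x4 SE(3) reference-to-frame-m
        # transform (from PTAM); I_r, I_m: greyscale images.
        ref_val = float(I_r[int(u[1]), int(u[0])])
        ray = np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])  # z-component is 1
        best_cost, best_depth = np.inf, None
        for inv_d in np.linspace(1.0 / d_max, 1.0 / d_min, n_samples):
            X_r = ray / inv_d                          # 3D point at depth 1/inv_d
            X_m = (M_mr @ np.append(X_r, 1.0))[:3]     # same point in frame m
            if X_m[2] <= 0:
                continue                               # behind camera m
            px = (K @ X_m)[:2] / X_m[2]                # projection into image m
            xi, yi = int(round(px[0])), int(round(px[1]))
            if 0 <= yi < I_m.shape[0] and 0 <= xi < I_m.shape[1]:
                cost = abs(float(I_m[yi, xi]) - ref_val)   # brightness constancy
                if cost < best_cost:
                    best_cost, best_depth = cost, 1.0 / inv_d
        return best_depth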

36 The depth estimate resulting from this process is however noisy, since (i) for many pixels the brightness constancy is not respected; (ii) the pixels themselves are noisy; (iii) the evaluation space is discretised; and (iv) uniform regions lack colour information. [sent-72, score-0.606]

37 To improve the depth map, the standard approach is to regularise with a weak prior that favours continuous depth in uniform regions. [sent-73, score-0.591]

38 This yields the following energy minimisation over the depth map d(u): E(d(u)) = ∫ g(u) ‖∇d(u)‖ + γ C(u, d(u)) du [sent-74, score-0.551]

39 where ∇d(u) is the depth map gradient, γ is a scalar weighting factor, and g(u) is a per-pixel weight that lowers the regularisation at image edges so that smoothing acts mainly in uniform regions. [sent-81, score-0.326]
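A minimal sketch of this regularisation idea, under simplifying assumptions: the actual system [15] solves the energy with a robust data term and a primal-dual scheme, whereas this gradient-descent version uses quadratic stand-ins (all names and parameter values are illustrative) and only shows how the edge-aware weight g interacts with the data term:

    import numpy as np

    def regularise_depth(d0, I_r, gamma=0.1, beta=10.0, iters=200, step=0.2):
        # Edge-aware smoothing of a raw depth map d0: the weight g is
        # small at image edges, so the prior mostly acts in uniform
        # regions, while the data term pulls d back towards d0.
        gy, gx = np.gradient(I_r.astype(np.float64))
        g = np.exp(-beta * np.sqrt(gx ** 2 + gy ** 2))     # low at strong edges
        d = d0.astype(np.float64).copy()
        for _ in range(iters):
            lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
                   np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4.0 * d)
            d += step * (g * lap - gamma * (d - d0))       # smoothness + fidelity
        return d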

40 Robust Map Representation The process above can be repeated for several reference frames, and the resulting depth maps merged into a single global map. [sent-91, score-0.389]

41 To address this limitation we fuse the local depth maps into a dense volumetric parametrisation of the 3D world, akin to that used in [26, 10, 16]. [sent-95, score-0.602]

42 The surface is recovered from this representation as the TSDF zero level set. [sent-97, score-0.099]

43 Each time a new depth map is generated, the values in the TSDF are updated to take the new information into account following a similar process to the one in [16]. [sent-98, score-0.363]

44 Fk+1 = (Wk Fk + W′ F′) / (Wk + W′), with F′ the distance as estimated in the new depth map; the weight accumulates as Wk+1 = Wk + W′. [sent-100, score-0.265]

45 W′ ∈ [0, maxW] denotes the confidence in the new information and is a function of the angle between the surface normal and the optic ray, with greater confidence associated with frontal surfaces (and near zero confidence for surfaces tangential to an optic ray). [sent-106, score-0.338]

46 Thus the new (approximate) distance to the surface is a weighted average of all previous measurements, helping to smooth out errors. [sent-107, score-0.099]
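This running weighted average can be written compactly; the sketch below (a hypothetical fuse_tsdf helper over NumPy volumes) applies the update Fk+1 = (Wk Fk + W′F′)/(Wk + W′) with a capped accumulated weight, in the style of [16]:

    import numpy as np

    def fuse_tsdf(F, W, F_new, W_new, max_w=100.0):
        # Weighted running average of truncated signed distances: F and W
        # are the volume's distance and weight arrays; F_new, W_new come
        # from the latest depth map.
        W_sum = W + W_new
        valid = W_sum > 0
        F[valid] = (W[valid] * F[valid] + W_new[valid] * F_new[valid]) / W_sum[valid]
        W[...] = np.minimum(W_sum, max_w)                  # cap accumulated weight
        return F, W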

47 Incorporating object knowledge Our objective in this paper is to show how the ability to detect objects and incorporate them into a SLAM map is beneficial, as a step towards a more object-based, more semantically meaningful map. [sent-110, score-0.115]

48 Image-based object detection and localisation While the dense SLAM system continually acquires new depth meshes at key-frames and fuses these into a global volumetric representation, in parallel we run a part-based object-class detector based on the effective procedure described in [9]. [sent-115, score-0.772]

49 To perform the optimisation using shape priors, it is first necessary to have a coarse estimation of the pose of the detected object. [sent-123, score-0.391]

50 To estimate this pose, we use a combination of the data available from the detector and from the dense SLAM map as follows. [sent-124, score-0.25]

51 We thus require a detection in at least two key-frames before proceeding to estimate the pose [12]. [sent-126, score-0.182]

52 Next, we estimate the vertical axis of the object by finding its supporting plane using the dense SLAM map. [sent-128, score-0.252]

53 To do so, we make the assumptions that (i) there is indeed a supporting planar surface; and (ii) the supporting plane is unoccluded in the immediate area around the object. [sent-129, score-0.114]

54 In particular, we sample depth values from pixels located immediately below the object in the key-frames and apply RANSAC to the resulting point cloud (see Figure 1(b)). [sent-130, score-0.319]
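A standard RANSAC plane fit of the kind described is sketched below (Python/NumPy; the iteration count and inlier tolerance are illustrative assumptions, not the paper's settings):

    import numpy as np

    def ransac_plane(points, iters=500, tol=0.01, seed=0):
        # Fit a plane n.X + c = 0 to an Nx3 point cloud sampled below the
        # detected object; tol is the inlier distance threshold in the
        # map's metric units.
        rng = np.random.default_rng(seed)
        best_inliers, best_model = 0, None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:
                continue                                   # degenerate sample
            n /= norm
            c = -n @ sample[0]
            inliers = np.sum(np.abs(points @ n + c) < tol)
            if inliers > best_inliers:
                best_inliers, best_model = inliers, (n, c)
        return best_model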

55 To estimate the second (and hence third and final) principal direction we consider the projection of the part configuration from the Felzenszwalb detector [9]. [sent-131, score-0.17]

56 Finally, the size of the object is estimated using the size of the projection of the detected object in the first image and the depth of the object available from an initial triangulation. [sent-135, score-0.538]

57 This, together with the intrinsic camera calibration, is sufficient to yield an estimate of the size of the object. [sent-136, score-0.143]

58 We leverage this information to build foreground and background colour models for the detected object as a whole. [sent-140, score-0.342]

59 Colour histograms of both foreground P(y|Cf) and background P(y|Cb) are then easily generated (here y represents the image pixel colour at location u). [sent-144, score-0.29]

60 These give the per-pixel foreground and background posterior probabilities: Pf(u) = P(Cf|y) = ηf P(y|Cf) / (ηf P(y|Cf) + ηb P(y|Cb)), Pb(u) = P(Cb|y) = ηb P(y|Cb) / (ηf P(y|Cf) + ηb P(y|Cb)) (4) with ηf and ηb being the number of pixels in the foreground and background regions respectively. [sent-147, score-0.162]
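Equation (4) translates directly into a per-pixel histogram lookup; the following sketch assumes precomputed, normalised colour histograms hist_f and hist_b and an integer bin index per pixel (all names hypothetical):

    import numpy as np

    def pixelwise_posteriors(image_bins, hist_f, hist_b, eta_f, eta_b):
        # Per-pixel posteriors of eq. (4): image_bins holds each pixel's
        # colour-histogram bin index; hist_f and hist_b are the normalised
        # histograms P(y|Cf) and P(y|Cb); eta_f, eta_b are pixel counts.
        p_y_f = hist_f[image_bins]                         # P(y|Cf) per pixel
        p_y_b = hist_b[image_bins]                         # P(y|Cb) per pixel
        denom = eta_f * p_y_f + eta_b * p_y_b + 1e-12
        return eta_f * p_y_f / denom, eta_b * p_y_b / denom   # Pf(u), Pb(u)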

61 1(a) where each part of the detected object is represented with its relative segmentation (black region represents the background, white is foreground and gray is unknown) and the foreground per-pixel posterior probability shown in Fig. [sent-149, score-0.253]

62 Prior-based shape and pose estimation To segment the object in 3D (and subsequently fuse this information back into the volumetric model), we use a method similar to [19]. [sent-153, score-0.573]

63 3D shapes are represented volumetrically as Signed Distance Functions (SDFs), with the object surface implicitly defined by the zero-level set, making this a natural candidate for use with the volumetric models produced using the methods in Section 2. [sent-154, score-0.294]

64 Within-class shape variation is represented via a low-dimensional embedding of the otherwise very high-dimensional 3D shapespace. [sent-155, score-0.099]

65 Unlike [19] however, in our current context, we have camera pose and depth information available from the SLAM system, which we aim to use to improve the object pose and shape recovery results. [sent-159, score-0.92]

66 This requires, on the one hand, a new energy function (i.e. one that also takes depth into consideration) and, on the other hand, matching the scale between the SLAM system and the learned object coordinate system. [sent-162, score-0.432]

67 Our aim therefore becomes, for objects detected within the scene, the simultaneous recovery of 3D shape (parametrised by the latent space), 6D pose and scale. [sent-163, score-0.468]

68 We do this by defining an image and depth based energy function, finding its derivatives w.r.t. [sent-164, score-0.458]

69 pose, scale and shape and using standard nonlinear minimisation techniques. [sent-167, score-0.275]

70 1 Energy Function Our dense SLAM system provides pose and depth information over multiple frames coming from a single monocular source. [sent-171, score-0.741]

71 We use the nv key-frames from this data stream as multiple views in our joint 3D shape / 3D pose optimisation. [sent-172, score-0.281]

72 Given the image-depth data x (i.e. image coordinates and depth) from a key frame v, and the corresponding points X0 in the object coordinate frame, we write our energy function as: E(Φ) = (1/nv) Σv (Eiv(Φ) + α Edv(Φ)) [sent-175, score-0.353]

73 This energy function combines an image based error Eiv (Φ) and a depth based one Edv (Φ), with α representing the balance between the two. [sent-184, score-0.361]

74 Note that there is a principled, probabilistic explanation behind this coupling, as each part of the energy function can be written as the log of a per pixel joint probability. [sent-185, score-0.135]

75 Furthermore, since the two parts of the energy function are sums of per-pixel values, we can perform the multi-view information fusion by simply averaging the per-view energy function values. [sent-186, score-0.242]
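This averaging is the entire multi-view fusion step; a minimal sketch, assuming each view object exposes hypothetical callables E_image and E_depth that return the per-pixel sums Eiv and Edv for that key-frame:

    def total_energy(phi, views, alpha):
        # Average the per-view image and depth energies over the nv
        # key-frames; alpha balances the two terms.
        return sum(v.E_image(phi) + alpha * v.E_depth(phi)
                   for v in views) / len(views)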

76 Eiv (Φ) measures the discrimination between statistically defined foreground and background regions, as a function of the projected 3D SDF Φ, using the functions Pf and Pb from eq (4) in each reference view v. [sent-187, score-0.218]

77 πv (Φ) projects Φ to a 2D occupancy map, with value 1 inside the projection outline and 0 outside. [sent-190, score-0.161]

78 It does this by evaluating, for each pixel in the image-depth domain, the probability of it being a projection of a voxel “inside” the 3D SDF Φ. [sent-191, score-0.167]

79 Edv(Φ) measures how well the image-depth points fit the zero level of the SDF (i.e. the surface of the 3D object model), using the pose corresponding to view v and assuming pixelwise independence. [sent-201, score-0.385]

80 The probability that an image-depth point lies on the object surface is equal to the probability that the back projected 3D point lies on the zero-level of the SDF. [sent-202, score-0.153]

81 This approach was also used in [20], in which depth data coming from a Microsoft Kinect unit was used for simultaneous model based 3D tracking and calibration. [sent-204, score-0.382]

82 Unlike [20] however, here we (i) also make use of the RGB image data, (ii) adapt the shape of the object and (iii) use the dense SLAM system to provide depth data. [sent-205, score-0.611]

83 To minimise this energy, we compute its derivatives with respect to pose, shape and scale and use them in a Levenberg-Marquardt style nonlinear minimisation. [sent-206, score-0.243]
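A single damped update of this kind looks as follows (a generic Levenberg-Marquardt step, assuming stacked residuals r and their Jacobian J with respect to the pose/shape/scale parameters have already been computed; all names are illustrative):

    import numpy as np

    def lm_step(r, J, params, lam):
        # One Levenberg-Marquardt update: solve (J^T J + lam I) delta = -J^T r
        # for the stacked per-pixel residuals r and Jacobian J w.r.t. the
        # pose, shape and scale parameters; lam is the damping factor.
        H = J.T @ J + lam * np.eye(J.shape[1])
        delta = np.linalg.solve(H, -J.T @ r)
        return params + delta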

84 2 Pose/Scale Derivatives Each image-depth point x in a view v is the projection of a point X in the camera coordinate frame, which itself has a corresponding point X0 in the object coordinate frame. [sent-209, score-0.403]

85 The transformation from X to x is parametrised by the camera intrinsic parameters corresponding to view v. [sent-210, score-0.295]

86 The transformation from X0 to X is X = vMoX0, where vMo = vMwwMo with vMw being the SE(3) transformation from the world to the reference camera v coordinates defined in Section 2. [sent-211, score-0.411]

87 wMo is the transformation from object to world coordinates, i.e. the pose of the object in the world frame. [sent-212, score-0.136]

88 The λp, p ∈ {1, . . . , 7}, represent the unknown 6 DoF pose parameters (three for rotation, three for translation) plus the scale. [sent-219, score-0.222]

89 ∂πv/∂λp = ζ e^(ζΦ(X0)) / (e^(ζΦ(X0)) + 1)^2 · (∂Φv/∂Xnv)(∂Xnv/∂λp) (12), where ∂Φ/∂λp = −(∂Φ/∂X0) (vMo)−1 vMw (∂wMo/∂λp) (vMo)−1 X (13). As in [19], in order to make the computation of ∂πv/∂λp easier, we use OpenGL-style normalised device coordinates for Φ and X. [sent-229, score-0.177]

90 In this coordinate system the 3D SDF Φ is transformed into Φv, using the pose, scale and intrinsics corresponding to view v. [sent-230, score-0.163]

91 Also, the 3D point that projects to x under the known camera calibration for view v is now denoted by Xnv. [sent-231, score-0.237]

92 Therefore, using the chain rule, we can write: ∂Xnv/∂λp = (∂Xnv/∂X)(∂X/∂λp) (15), where ∂Xnv/∂X are the derivatives of the standard normalised device coordinate conversion (i.e. [sent-232, score-0.242]

93 projection and normalisation of the Z coordinate) and ∂X/∂λp follows in a straightforward manner as the derivative of X = vMw wMo X0 w.r.t. the pose parameters. [sent-234, score-0.171]

94 Finally, the derivatives ∂wMo/∂λp are computed analogously. [sent-236, score-0.14]

95 We do this, in a manner similar to [19], by using a dimensionality reduction technique called Gaussian Process Latent Variable Models to learn nonlinear and probabilistic latent shape spaces. [sent-240, score-0.283]
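For intuition only, the sketch below replaces the nonlinear GP-LVM decoder with a linear (PCA-style) stand-in: a shape SDF is generated from a low-dimensional latent vector, and the derivative of the SDF with respect to the latent coordinates, needed for the shape gradients below, is then just the basis matrix. The paper's actual decoder is a GP-LVM followed by an inverse DCT, which this simplification does not capture:

    import numpy as np

    class LinearShapeSpace:
        # Linear stand-in for the GP-LVM shape space: an SDF is generated
        # from a latent vector q as mean + basis @ q, with V voxels and L
        # latent dimensions. dPhi/dq is then simply the basis matrix.
        def __init__(self, mean_sdf, basis):
            self.mean = mean_sdf        # flattened mean SDF, shape (V,)
            self.basis = basis          # shape (V, L)

        def decode(self, q):
            return self.mean + self.basis @ q   # SDF for latent point q

        def d_sdf_d_latent(self):
            return self.basis           # constant for a linear decoder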

96 The shape derivatives follow analogously to those for pose and scale, by replacing (∂Φv/∂Xnv)(∂Xnv/∂λp) with ∂Φv/∂lq and ∂Φ/∂λp with ∂Φ/∂lq. [sent-251, score-0.182]

97 These final two derivatives are the ones of the standard GP-LVM generative process [13], on which the inverse DCT transform has been applied. [sent-252, score-0.134]

98 Map update Once the shape and pose estimation of the object has converged (as measured using the standard LevenbergMarquardt test), we fuse the shape SDF Φ with the global map. [sent-254, score-0.527]

99 The distance F′ is defined by the object SDF while the confidence W′ [sent-257, score-0.103]

100 (or weight) in this distance is defined so that only the voxels close to the object surface are modified. [sent-258, score-0.191]
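This fusion step can reuse the same weighted-average machinery as Section 2; the sketch below writes the converged object SDF into the global volume with a confidence that decays to zero away from the object surface (the band width and weight values are illustrative assumptions, not the paper's):

    import numpy as np

    def fuse_object_sdf(F, W, phi_obj, w_near=50.0, band=0.05):
        # Write the converged object SDF into the global TSDF (F, W): the
        # confidence w_obj falls to zero outside a narrow band around the
        # object surface, so only nearby voxels are modified.
        w_obj = w_near * np.clip(1.0 - np.abs(phi_obj) / band, 0.0, 1.0)
        W_sum = W + w_obj
        valid = W_sum > 0
        F[valid] = (W[valid] * F[valid] + w_obj[valid] * phi_obj[valid]) / W_sum[valid]
        W[...] = W_sum
        return F, W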


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('slam', 0.499), ('depth', 0.265), ('pose', 0.182), ('sdf', 0.176), ('colour', 0.17), ('vmo', 0.152), ('dense', 0.141), ('eiv', 0.134), ('minimisation', 0.129), ('mmr', 0.114), ('volumetric', 0.106), ('camera', 0.103), ('sdfs', 0.101), ('edv', 0.101), ('tsdf', 0.101), ('surface', 0.099), ('shape', 0.099), ('derivatives', 0.097), ('energy', 0.096), ('coordinates', 0.093), ('reference', 0.087), ('foreground', 0.081), ('ray', 0.08), ('discretised', 0.076), ('mpo', 0.076), ('pbv', 0.076), ('pnv', 0.076), ('projection', 0.074), ('cb', 0.069), ('cpf', 0.067), ('monocular', 0.067), ('latent', 0.067), ('priors', 0.063), ('trimaps', 0.062), ('reconstruction', 0.062), ('coordinate', 0.061), ('map', 0.061), ('meshes', 0.059), ('constancy', 0.058), ('supporting', 0.057), ('parametrised', 0.056), ('signed', 0.056), ('wk', 0.056), ('object', 0.054), ('fuse', 0.054), ('voxel', 0.054), ('ptam', 0.054), ('system', 0.052), ('pb', 0.052), ('view', 0.05), ('frame', 0.049), ('confidence', 0.049), ('detector', 0.048), ('simultaneous', 0.048), ('principal', 0.048), ('ir', 0.047), ('localisation', 0.047), ('nonlinear', 0.047), ('transformation', 0.046), ('optic', 0.046), ('specularities', 0.045), ('normalised', 0.045), ('projects', 0.045), ('iii', 0.044), ('pv', 0.043), ('analogously', 0.043), ('dct', 0.043), ('occupancy', 0.042), ('ii', 0.042), ('bb', 0.042), ('pf', 0.041), ('unknown', 0.04), ('imperfect', 0.04), ('intrinsic', 0.04), ('known', 0.039), ('pixel', 0.039), ('estimation', 0.039), ('subsequently', 0.039), ('live', 0.039), ('device', 0.039), ('lambertian', 0.038), ('fused', 0.038), ('voxels', 0.038), ('fk', 0.038), ('detected', 0.037), ('process', 0.037), ('compressed', 0.037), ('world', 0.036), ('evenly', 0.036), ('shapes', 0.035), ('recovery', 0.035), ('tracking', 0.035), ('reduction', 0.035), ('dimensionality', 0.035), ('coming', 0.034), ('coarse', 0.034), ('innermost', 0.034), ('laide', 0.034), ('levenbergmarquardt', 0.034), ('evd', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid

Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.

2 0.42758131 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

3 0.3964386 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

Author: Nicola Fioraio, Luigi Di_Stefano

Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.

4 0.25218734 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

Author: Ju Shen, Sen-Ching S. Cheung

Abstract: The recent popularity of structured-light depth sensors has enabled many new applications from gesture-based user interface to 3D reconstructions. The quality of the depth measurements of these systems, however, is far from perfect. Some depth values can have significant errors, while others can be missing altogether. The uncertainty in depth measurements among these sensors can significantly degrade the performance of any subsequent vision processing. In this paper, we propose a novel probabilistic model to capture various types of uncertainties in the depth measurement process among structured-light systems. The key to our model is the use of depth layers to account for the differences between foreground objects and background scene, the missing depth value phenomenon, and the correlation between color and depth channels. The depth layer labeling is solved as a maximum a-posteriori estimation problem, and a Markov Random Field attuned to the uncertainty in measurements is used to spatially smooth the labeling process. Using the depth-layer labels, we propose a depth correction and completion algorithm that outperforms other techniques in the literature.

5 0.21874438 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

Author: Hee Seok Lee, Kuoung Mu Lee

Abstract: In this paper, we propose a convex optimization framework for simultaneous estimation of super-resolved depth map and images from a single moving camera. The pixel measurement error in 3D reconstruction is directly related to the resolution of the images at hand. In turn, even a small measurement error can cause significant errors in reconstructing 3D scene structure or camera pose. Therefore, enhancing image resolution can be an effective solution for securing the accuracy as well as the resolution of 3D reconstruction. In the proposed method, depth map estimation and image super-resolution are formulated in a single energy minimization framework with a convex function and solved efficiently by a first-order primal-dual algorithm. Explicit inter-frame pixel correspondences are not required for our super-resolution procedure, thus we can avoid a huge computation time and obtain improved depth map in the accuracy and resolution as well as highresolution images with reasonable time. The superiority of our algorithm is demonstrated by presenting the improved depth map accuracy, image super-resolution results, and camera pose estimation.

6 0.18656695 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

7 0.16722158 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation

8 0.16396302 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

9 0.15821646 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image

10 0.15292743 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

11 0.14768314 108 cvpr-2013-Dense 3D Reconstruction from Severely Blurred Images Using a Single Moving Camera

12 0.14608248 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

13 0.14357556 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration

14 0.14314727 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

15 0.14142849 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video

16 0.14048651 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

17 0.13679954 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

18 0.13648638 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling

19 0.13500151 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images

20 0.13195311 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.29), (1, 0.256), (2, 0.04), (3, -0.005), (4, -0.019), (5, -0.094), (6, -0.005), (7, 0.099), (8, 0.087), (9, -0.017), (10, -0.097), (11, 0.065), (12, -0.047), (13, 0.141), (14, -0.002), (15, -0.133), (16, -0.12), (17, 0.15), (18, -0.109), (19, 0.021), (20, -0.023), (21, -0.035), (22, -0.053), (23, -0.04), (24, -0.141), (25, -0.001), (26, 0.014), (27, 0.062), (28, -0.041), (29, -0.084), (30, 0.001), (31, 0.09), (32, -0.177), (33, -0.159), (34, -0.05), (35, 0.14), (36, 0.193), (37, -0.081), (38, 0.027), (39, -0.099), (40, 0.036), (41, 0.023), (42, 0.004), (43, -0.117), (44, 0.006), (45, -0.046), (46, 0.043), (47, -0.062), (48, 0.058), (49, -0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94461966 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid

Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.

2 0.91709238 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

3 0.80481952 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

Author: Nicola Fioraio, Luigi Di_Stefano

Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.

4 0.6511724 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

Author: Hee Seok Lee, Kuoung Mu Lee

Abstract: In this paper, we propose a convex optimization framework for simultaneous estimation of super-resolved depth map and images from a single moving camera. The pixel measurement error in 3D reconstruction is directly related to the resolution of the images at hand. In turn, even a small measurement error can cause significant errors in reconstructing 3D scene structure or camera pose. Therefore, enhancing image resolution can be an effective solution for securing the accuracy as well as the resolution of 3D reconstruction. In the proposed method, depth map estimation and image super-resolution are formulated in a single energy minimization framework with a convex function and solved efficiently by a first-order primal-dual algorithm. Explicit inter-frame pixel correspondences are not required for our super-resolution procedure, thus we can avoid a huge computation time and obtain improved depth map in the accuracy and resolution as well as highresolution images with reasonable time. The superiority of our algorithm is demonstrated by presenting the improved depth map accuracy, image super-resolution results, and camera pose estimation.

5 0.64338732 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon

Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially outperforms two state-of-the-art baselines.

6 0.61397225 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling

7 0.61165452 354 cvpr-2013-Relative Volume Constraints for Single View 3D Reconstruction

8 0.6072039 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation

9 0.59462124 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration

10 0.57982081 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

11 0.55721408 428 cvpr-2013-The Episolar Constraint: Monocular Shape from Shadow Correspondence

12 0.5490092 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

13 0.53847915 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

14 0.53806704 395 cvpr-2013-Shape from Silhouette Probability Maps: Reconstruction of Thin Objects in the Presence of Silhouette Extraction and Calibration Error

15 0.52503955 110 cvpr-2013-Dense Object Reconstruction with Semantic Priors

16 0.5206086 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity

17 0.50502044 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image

18 0.50146413 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

19 0.50105035 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

20 0.50031644 286 cvpr-2013-Mirror Surface Reconstruction from a Single Image


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.14), (12, 0.012), (16, 0.024), (26, 0.041), (33, 0.288), (49, 0.016), (57, 0.012), (63, 0.015), (67, 0.063), (69, 0.066), (83, 0.098), (87, 0.098)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98322165 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems

Author: Amit Agrawal, Srikumar Ramalingam

Abstract: Imaging systems consisting of a camera looking at multiple spherical mirrors (reflection) or multiple refractive spheres (refraction) have been used for wide-angle imaging applications. We describe such setups as multi-axial imaging systems, since a single sphere results in an axial system. Assuming an internally calibrated camera, calibration of such multi-axial systems involves estimating the sphere radii and locations in the camera coordinate system. However, previous calibration approaches require manual intervention or constrained setups. We present a fully automatic approach using a single photo of a 2D calibration grid. The pose of the calibration grid is assumed to be unknown and is also recovered. Our approach can handle unconstrained setups, where the mirrors/refractive balls can be arranged in any fashion, not necessarily on a grid. The axial nature of rays allows us to compute the axis of each sphere separately. We then show that by choosing rays from two or more spheres, the unknown pose of the calibration grid can be obtained linearly and independently of sphere radii and locations. Knowing the pose, we derive analytical solutions for obtaining the sphere radius and location. This leads to an interesting result that 6-DOF pose estimation of a multi-axial camera can be done without the knowledge of full calibration. Simulations and real experiments demonstrate the applicability of our algorithm.

2 0.96725792 432 cvpr-2013-Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain

Author: Ramakrishna Kakarala, Prabhu Kaliamoorthi, Vittal Premachandran

Abstract: We show that bilateral symmetry plane estimation for three-dimensional (3-D) shapes may be carried out accurately, and efficiently, in the spherical harmonic domain. Our methods are valuable for applications where spherical harmonic expansion is already employed, such as 3-D shape registration, morphometry, and retrieval. We show that the presence of bilateral symmetry in the 3-D shape is equivalent to a linear phase structure in the corresponding spherical harmonic coefficients, and provide algorithms for estimating the orientation of the symmetry plane. The benefit of using spherical harmonic phase is that symmetry estimation reduces to matching a compact set of descriptors, without the need to solve a correspondence problem. Our methods work on point clouds as well as large-scale mesh models of 3-D shapes.

3 0.9529615 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections

Author: Raffay Hamid, Dennis Decoste, Chih-Jen Lin

Abstract: We present a robust and efficient technique for matching dense sets of points undergoing non-rigid spatial transformations. Our main intuition is that the subset of points that can be matched with high confidence should be used to guide the matching procedure for the rest. We propose a novel algorithm that incorporates these high-confidence matches as a spatial prior to learn a discriminative subspace that simultaneously encodes both the feature similarity as well as their spatial arrangement. Conventional subspace learning usually requires spectral decomposition of the pair-wise distance matrix across the point-sets, which can become inefficient even for moderately sized problems. To this end, we propose the use of random projections for approximate subspace learning, which can provide significant time improvements at the cost of minimal precision loss. This efficiency gain allows us to iteratively find and remove high-confidence matches from the point sets, resulting in high recall. To show the effectiveness of our approach, we present a systematic set of experiments and results for the problem of dense non-rigid image-feature matching.

4 0.95272535 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.

5 0.95085466 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof

Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.

same-paper 6 0.9481324 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

7 0.94672674 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling

8 0.94573373 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path

9 0.94520897 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

10 0.94458044 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

11 0.94452119 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

12 0.94367731 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

13 0.94325334 414 cvpr-2013-Structure Preserving Object Tracking

14 0.9430933 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

15 0.94242752 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning

16 0.94223797 143 cvpr-2013-Efficient Large-Scale Structured Learning

17 0.9420979 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

18 0.94205225 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image

19 0.94148058 325 cvpr-2013-Part Discovery from Partial Correspondence

20 0.94139391 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models