iccv iccv2013 iccv2013-139 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
Reference: text
sentIndex sentText sentNum sentScore
1 Elastic Fragments for Dense Scene Reconstruction. Qian-Yi Zhou (1), Stephen Miller (1), Vladlen Koltun (1,2); 1 Stanford University, 2 Adobe Research. Abstract: We present an approach to reconstruction of detailed scene geometry from range video. [sent-1, score-0.316]
2 Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. [sent-2, score-0.162]
3 Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. [sent-3, score-1.089]
4 We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. [sent-4, score-0.372]
5 Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras. [sent-5, score-0.264]
6 Introduction. Enabling the reconstruction of detailed surface geometry from image data is one of the central goals of computer vision. [sent-7, score-0.299]
7 Substantial progress on dense scene reconstruction from photographs and video sequences has been made, despite the ambiguity of photometric cues [20, 26, 21, 6, 15, 17]. [sent-8, score-0.216]
8 When direct information on the surface geometry of the scene is given in the form of range data, we can expect to do even better. [sent-9, score-0.263]
9 Obtaining a detailed three-dimensional model of an object or an environment from range images is difficult in part due to the high-frequency noise and quantization artifacts in the data [11, 27]. [sent-11, score-0.141]
10 Newcombe et al. [16], building on work on range image integration [2], real-time range scanning [23], and monocular SLAM [3, 4, 12], showed that registering each input image to a growing volumetric model can average out high-frequency error and produce smooth reconstructions of objects and small scenes. [sent-14, score-0.36]
11 A related source of difficulty is the substantial low-frequency distortion present in range images produced by commodity handheld cameras. [sent-16, score-0.388]
12 Figure 1: The sculpture is 4 meters wide and 6 meters high. [sent-18, score-0.122]
13 This may not lead to noticeable artifacts if the scanned objects are relatively small or if the scanned surfaces do not contain fine-scale details. [sent-21, score-0.184]
14 However, for sufficiently large and complex scenes this distortion leads to clearly visible artifacts in the reconstructed geometry (Figure 2). [sent-22, score-0.388]
15 (a) Extended KinectFusion [22] is unable to produce a globally consistent reconstruction due to drift. [sent-27, score-0.11]
16 (b) The approach of Zhou and Koltun [35] is restricted to rigid alignment and is unable to correct the inconsistencies in trajectory fragments acquired at different times and from different points of view. [sent-28, score-0.935]
17 Current techniques for dense scene reconstruction from consumer-grade range video cast the problem in terms of trajectory estimation [16, 7, 34, 35]. [sent-30, score-0.405]
18 The implicit assumption is that once a sufficiently accurate estimate for the camera trajectory is obtained, the range images can be integrated to yield a clean model of the scene’s geometry. [sent-31, score-0.285]
19 The difficulty is that for sufficiently complex scenes and camera trajectories there may not be any estimate for the trajectory that yields an artifact-free reconstruction with rigidly aligned images, due to the low-frequency distortion in the input. [sent-32, score-0.553]
20 Rigidly aligning the images along a camera path is not always sufficient to resolve the inconsistencies produced by distortions in the sensor. [sent-33, score-0.285]
21 In this work, we introduce a scene reconstruction approach that is based on non-rigid alignment. [sent-34, score-0.158]
22 Specifically, we partition the input stream into small fragments of k frames each. [sent-36, score-0.48]
23 Frame-to-model registration [16] is used to reconstruct the surfaces imaged in each fragment, integrating out high-frequency error. [sent-37, score-0.266]
24 Since the low-frequency distortion introduced by the sensor is intrinsically stationary and since the fragments are temporally brief, each fragment is internally consistent. [sent-38, score-0.95]
25 The problem is that fragments that were acquired from substantially different points of view are in general not mutually consistent. [sent-39, score-0.521]
26 Our approach allows the fragments to subtly bend to resolve these extrinsic inconsistencies. [sent-40, score-0.58]
27 This is done by optimizing a global objective that maximizes alignment between overlapping fragments while minimizing elastic strain energy to protect local detail. [sent-41, score-0.96]
28 Non-rigid registration has a long history in medical imaging and computer vision, resulting in sophisticated techniques for aligning two-dimensional contours and three-dimensional shapes [19, 9, 33, 14, 10]. [sent-42, score-0.217]
29 Our work aims to reconstruct spatially extended scenes from a large number of range images, each of which covers only a small part of the scene. [sent-44, score-0.145]
30 Our approach was thus designed to preserve surface detail while operating on a scale that has rarely been addressed with non-rigid registration techniques. [sent-46, score-0.274]
31 The closest work to ours is due to Brown and Rusinkiewicz, who used non-rigid alignment to produce precise object models from 3D scans [1]. [sent-47, score-0.208]
32 We adopt the basic idea of employing non-rigid deformation to preserve surface detail, but develop a different formulation that is more appropriate to our setting. [sent-48, score-0.168]
33 Specifically, the approach of Brown and Rusinkiewicz is based on detecting and aligning keypoints, and propagating this sparse alignment using thin-plate splines. [sent-49, score-0.204]
34 This approach can be problematic because keypoint-based correspondences are imperfect in practice and the spline interpolation is insensitive to surface detail outside the keypoints. [sent-50, score-0.2]
35 We formulate an optimization objective that integrates alignment and regularization constraints that densely cover all surfaces in the scene. [sent-51, score-0.268]
36 Since our input is a temporally dense stream of range data, we can establish correspondences directly on dense geometry without singling out keypoints. [sent-52, score-0.304]
37 This enables the formulation of a regularization objective that reliably preserves surface detail throughout the scene. [sent-53, score-0.18]
38 Figures 1 and 2 illustrate the benefits of elastic registration. [sent-54, score-0.133]
39 Since the high-frequency noise is integrated out by individual fragments and the low-frequency distortion is resolved when the fragments are registered to each other, detailed surface geometry is cleanly reconstructed throughout the scene. [sent-56, score-1.395]
40 Given an RGB-D scan as input, we partition it into k-frame segments (we use k = 50 or k = 100) and use the frame-to-model registration and integration pipeline developed by Newcombe et al. [sent-61, score-0.304]
41 [16] to reconstruct a locally precise surface fragment from each such trajectory segment. [sent-62, score-0.466]
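As a concrete illustration of this step, here is a minimal Python sketch of the partitioning; the fusion call in the comment is a hypothetical stand-in for the frame-to-model pipeline of [16], which is not reproduced here.

```python
def partition_into_fragments(depth_frames, k=50):
    """Split an RGB-D stream into consecutive k-frame segments;
    the paper uses k = 50 or k = 100."""
    return [depth_frames[i:i + k] for i in range(0, len(depth_frames), k)]

# Each segment is then fused into one locally precise mesh fragment with a
# KinectFusion-style frame-to-model pipeline [16] (fuse_fragment is hypothetical):
# fragments = [fuse_fragment(seg) for seg in partition_into_fragments(frames)]
```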
42 Each fragment is a triangular mesh with the vertex set Pi = {p} and the edge set Gi ⊂ Pi × Pi. [sent-63, score-0.17]
43 The purpose of initial alignment is to establish dense correspondences between fragments that cover overlapping parts of the scene. [sent-65, score-0.819]
44 To initialize this process, we assume that a rough initial alignment between the fragments in an extrinsic coordinate frame (“scene frame”) can be obtained. [sent-66, score-0.736]
45 While prior work relied on manual initial alignment [1], we found that an off-the-shelf SLAM system [5] was sufficient for our purposes. [sent-67, score-0.169]
46 To this end, we test every pair of fragments and attempt to align them using ICP, starting from the relative pose provided by the rough initialization. [sent-69, score-0.571]
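A minimal numpy sketch of such a pairwise test, assuming each fragment is available as a point set with normals; all names and thresholds are illustrative, and a real implementation would replace the brute-force nearest-neighbor search with a k-d tree.

```python
import numpy as np

def small_angle_rotation(r):
    """Rotation matrix from small Euler angles (rx, ry, rz)."""
    rx, ry, rz = r
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def pairwise_icp_test(src, dst, dst_normals, T_init, iters=20, max_dist=0.05):
    """Point-to-plane ICP from the rough SLAM pose T_init (4x4); returns the
    refined pose and the inlier fraction used to accept or reject the pair."""
    T = T_init.copy()
    inlier_frac = 0.0
    for _ in range(iters):
        src_t = src @ T[:3, :3].T + T[:3, 3]
        d2 = ((src_t[:, None, :] - dst[None, :, :]) ** 2).sum(-1)  # brute force
        nn = d2.argmin(1)
        keep = d2[np.arange(len(src)), nn] < max_dist ** 2
        if keep.sum() < 6:            # too little overlap: reject this pair
            break
        p, q, n = src_t[keep], dst[nn[keep]], dst_normals[nn[keep]]
        # Linearized point-to-plane objective: minimize ((p - q + r x p + t) . n)^2
        A = np.hstack([np.cross(p, n), n])     # (K, 6) Jacobian rows
        b = ((q - p) * n).sum(1)               # (K,) residuals
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        dT = np.eye(4)
        dT[:3, :3] = small_angle_rotation(x[:3])
        dT[:3, 3] = x[3:]
        T = dT @ T
        inlier_frac = keep.mean()
    return T, inlier_frac
```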
47 These correspondence sets, established over many pairs of overlapping fragments, are used in the next stage to define the alignment objective. [sent-73, score-0.216]
48 Given the correspondences extracted in the preceding stage, we define an optimization objective that combines an alignment term and a regularization term. [sent-75, score-0.286]
49 The alignment term minimizes the distances between corresponding points on different fragments. [sent-76, score-0.169]
50 The regularization term preserves the shape of each fragment by minimizing the elastic strain energy produced by the deformation. [sent-77, score-0.454]
51 It also deals poorly with fragments that have multiple connected components, which are commonly encountered in complex scenes. [sent-81, score-0.48]
52 Volumetric integration [2] is used to merge the fragments and to obtain the complete scene model. [sent-87, score-0.612]
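A toy illustration of the weighted running-average TSDF update that underlies volumetric integration [2]; this point-sample variant is a deliberate simplification, since the actual method integrates full depth images along camera rays.

```python
import numpy as np

def integrate_samples(tsdf, weight, points, normals, voxel=0.01, trunc=0.04):
    """Update a TSDF volume from oriented surface samples: each voxel near a
    sample stores the running weighted average of truncated signed distances,
    which is how evidence from all fragments is merged into one model.
    tsdf, weight: 3D float arrays; points, normals: (N, 3) in volume coordinates."""
    dims = np.array(tsdf.shape)
    for p, n in zip(points, normals):
        lo = np.maximum(((p - trunc) / voxel).astype(int), 0)
        hi = np.minimum(((p + trunc) / voxel).astype(int) + 1, dims)
        for i in range(lo[0], hi[0]):
            for j in range(lo[1], hi[1]):
                for k in range(lo[2], hi[2]):
                    c = (np.array([i, j, k]) + 0.5) * voxel   # voxel center
                    d = float(np.dot(c - p, n))               # signed distance
                    if abs(d) <= trunc:
                        w = weight[i, j, k]
                        tsdf[i, j, k] = (tsdf[i, j, k] * w + d) / (w + 1.0)
                        weight[i, j, k] = w + 1.0
    return tsdf, weight
```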
53 After motivating the objective and clarifying the deficiencies of the initial approach, we develop a volumetric formulation that addresses these issues in Section 3. [sent-92, score-0.202]
54 We compute T by minimizing an energy function of the form E(T) = Ea(T) + Er(T), (1) where Ea is the alignment term and Er is the elastic regularization term. [sent-113, score-0.302]
55 The alignment term Ea(T) measures the alignment of all corresponding pairs. [sent-114, score-0.338]
56 We use the point-to-plane distance, which has well-known benefits for surface registration [24]: Ea(T) = Σ ((T(p) − T(q)) · n_{T(p)})², summed over all correspondence pairs (p, q) established between overlapping fragments. [sent-115, score-0.274]
57 The regularizer Er(T) measures the elastic strain energy for all fragments [32]. [sent-133, score-0.692]
58 In principle, we want to measure the change in the first two fundamental forms of each surface due to the mapping T. [sent-134, score-0.138]
59 where N(p) is a rotation transform that maps the local tangent frame of p to the local tangent frame of p′. [sent-158, score-0.222]
60 We then compute an updated estimate for the local normal and tangent frame at each point p′. [sent-186, score-0.111]
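A simplified sketch of evaluating this objective for a candidate deformation; the edge-length term below is only a surrogate for the paper's strain energy (it captures stretching but not the bending measured by the second fundamental form), and all names are illustrative.

```python
import numpy as np

def alignment_term(P, corres, normals):
    """Ea: squared point-to-plane distances over correspondence pairs.
    P: dict fragment id -> (N, 3) deformed vertices; corres: list of
    (i, a, j, b) meaning vertex a of fragment i matches vertex b of fragment j."""
    e = 0.0
    for i, a, j, b in corres:
        d = P[i][a] - P[j][b]
        e += float(np.dot(d, normals[i][a])) ** 2
    return e

def strain_surrogate(P, P0, edges, lam=1.0):
    """Er stand-in: penalize changes in mesh edge lengths relative to the
    undeformed fragments P0. edges: list of (i, (a, b)) per-fragment edges."""
    e = 0.0
    for i, (a, b) in edges:
        l1 = np.linalg.norm(P[i][a] - P[i][b])
        l0 = np.linalg.norm(P0[i][a] - P0[i][b])
        e += (l1 - l0) ** 2
    return lam * e

def total_energy(P, P0, corres, normals, edges, lam=1.0):
    """E(T) = Ea(T) + Er(T), Eq. (1), with the simplifications noted above."""
    return alignment_term(P, corres, normals) + strain_surrogate(P, P0, edges, lam)
```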
61 For example, the scene shown in Figure 1 contains 370 fragments with a total of 66. [sent-191, score-0.528]
62 Furthermore, the point-based formulation does not control for distortion induced by changes in the relative pose of disconnected surfaces within fragments. [sent-194, score-0.364]
63 In Section 3.2, we reformulate the registration objective to address these issues. [sent-196, score-0.28]
64 p′ = T(p) is reconstructed from the transformed control points v′. [sent-215, score-0.123]
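One standard way to realize this is trilinear interpolation over a regular control lattice; the sketch below assumes that scheme (the paper's interpolation may differ in detail) and assumes p lies strictly inside the lattice.

```python
import numpy as np

def deform_point(p, lattice, origin, spacing):
    """Reconstruct p' = T(p) from deformed control points v'.
    lattice: (X, Y, Z, 3) array of deformed control-point positions;
    origin, spacing: lattice placement in the undeformed coordinate frame."""
    u = (np.asarray(p, dtype=float) - origin) / spacing  # lattice coordinates
    i0 = np.floor(u).astype(int)                          # base cell
    f = u - i0                                            # fractional offsets
    out = np.zeros(3)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])
                out += w * lattice[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out
```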
65 A is a block matrix with blocks of size m × m, where m/3 is the number of vertices in each control lattice. [sent-264, score-0.121]
66 The alignment term Ea leads to non-zero values in the main diagonal blocks and in blocks that correspond to overlapping fragment pairs for which correspondences were established during the initial alignment stage. [sent-267, score-0.744]
67 For large scenes, each fragment will overlap with only a constant number of fragments on average, and the matrix A therefore remains sparse. [sent-269] [sent-271]
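A sketch of assembling and solving the resulting sparse block system with SciPy. The document's topic words mention Cholesky, which would be the natural factorization for this symmetric positive definite system; SciPy's stock SuperLU is used below only as a readily available stand-in.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_block_system(blocks, rhs, m, n_frag):
    """Solve A x = b where A is given as a dict (i, j) -> dense (m, m) block.
    Blocks exist only on the diagonal and for overlapping fragment pairs, so A
    is sparse: each fragment overlaps a constant number of others on average."""
    rows, cols, vals = [], [], []
    for (i, j), B in blocks.items():
        r, c = np.nonzero(B)
        rows.extend(i * m + r)
        cols.extend(j * m + c)
        vals.extend(B[r, c])
    A = sp.csc_matrix((vals, (rows, cols)), shape=(n_frag * m, n_frag * m))
    return spla.splu(A).solve(rhs)    # direct sparse factorization + solve
```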
69 The optimization is initialized using the rough rigid alignment computed in the initial alignment stage (Section 2). [sent-277, score-0.436]
70 We try to scan as much surface detail as possible in order to evaluate the quality of the reconstruction. [sent-297, score-0.13]
71 A typical scan lasts for 2 to 20 minutes, along a complicated camera trajectory with numerous loop closures. [sent-298, score-0.227]
72 During scanning, the operator could see the color and depth images captured by the sensor in real time, but no preview of the reconstruction was shown. [sent-299, score-0.227]
73 Our approach creates a globally consistent scene with high-fidelity local details, while Extended KinectFusion suffers from lack of loop closure and the rigid registration approach of Zhou and Koltun breaks some local regions due to unresolved residual distortion. [sent-302, score-0.282]
74 We also compare to a reconstruction produced by a hypothetical approach that integrates along the motion-captured camera trajectory provided by the benchmark. [sent-303, score-0.411]
75 This can be attributed to two potential causes: the approach is limited to rigid alignment and does not resolve the low-frequency distortion in the data, and the motion capture system itself introduces sensor noise. [sent-305, score-0.602]
76 To further identify the error source and to make quantitative evaluations, we synthesize range video sequences using synthetic 3D models and use these models as ground truth geometry to evaluate the reconstruction quality. [sent-306, score-0.279]
77 To synthesize these sequences, we navigate a virtual camera around each synthetic model and produce perfect range images at full frame rate using a standard rendering pipeline. [sent-307, score-0.209]
78 These images are then combined with two error models to simulate the data produced by real-world range cameras. [sent-308, score-0.133]
79 The two error models we use aim to simulate the factory-calibrated data produced by PrimeSense sensors and idealized data with no low-frequency distortion. [sent-309, score-0.291]
80 To produce the idealized data, we process the perfect synthetic depth images using the quantization model described by Konolige and Mihelich [13] and introduce sensor noise following the model of Nguyen et al. [sent-310, score-0.333]
81 To produce the simulated factorycalibrated data, we add a model of low-frequency distortion estimated on a real PrimeSense sensor using the calibration approach of Teichman et al. [sent-312, score-0.343]
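An illustrative version of such a simulation; the disparity-quantization constants and the multiplicative low-frequency field below are assumptions for the sketch, not the actual models of Konolige and Mihelich [13], Nguyen et al., or the Teichman et al. calibration.

```python
import numpy as np

def simulate_range_image(depth, baseline=0.075, focal=575.8, subpix=8.0,
                         noise_sigma=0.001, distortion=None):
    """Turn a perfect rendered depth image into simulated sensor data:
    1. quantize disparity (structured-light sensors measure quantized disparity);
    2. add per-pixel depth noise;
    3. optionally apply a smooth multiplicative field standing in for
       low-frequency distortion estimated on a real PrimeSense sensor."""
    disparity = baseline * focal / np.maximum(depth, 1e-6)
    disparity = np.round(disparity * subpix) / subpix        # quantization
    d = baseline * focal / np.maximum(disparity, 1e-6)
    d += np.random.normal(0.0, noise_sigma, size=d.shape)    # sensor noise
    if distortion is not None:
        d *= distortion                                      # low-frequency warp
    return d
```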
82 The results are obtained by computing the point-to-plane distance from points in the reconstructed model to the ground truth shape, after initial alignment by standard rigid registration. [sent-315, score-0.285]
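A sketch of this evaluation metric, assuming the reconstruction and the ground-truth shape are given as point sets (with ground-truth normals) after the rigid pre-alignment.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_to_plane_errors(recon_pts, gt_pts, gt_normals):
    """For every reconstructed point, find the nearest ground-truth point and
    measure the distance along its normal (point-to-plane error)."""
    _, idx = cKDTree(gt_pts).query(recon_pts)
    diff = recon_pts - gt_pts[idx]
    return np.abs(np.sum(diff * gt_normals[idx], axis=1))

# Report e.g. the mean and median of point_to_plane_errors(...) per scene.
```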
83 We compare our approach to three alternatives: 477 Extended KinectFusion [22], Zhou and Koltun [35], and integration of the simulated depth images along the ground truth trajectory. [sent-316, score-0.166]
84 For idealized data with no low-frequency distortion, the idealized approach that uses the ground-truth trajectory performs extremely well and outperforms our approach. [sent-319, score-0.468]
85 For simulated factory-calibrated data, our approach sometimes outperforms the idealized approach. [sent-320, score-0.209]
86 This is because the idealized approach is limited to rigid alignment. [sent-321, score-0.222]
87 Although it benefits from perfect camera localization, the real-world distortion in the data causes inconsistencies between input images that are too large to be eliminated by volumetric integration. [sent-322, score-0.357]
88 Our approach uses nonrigid alignment to resolve these inconsistencies. [sent-323, score-0.221]
89 Conclusion We presented an approach for dense scene reconstruction from range video produced by consumer-grade cameras. [sent-325, score-0.349]
90 Our approach partitions the video sequence into segments, uses frame-to-model integration to reconstruct locally precise scene fragments from each segment, establishes dense correspondences between overlapping fragments, and optimizes a global objective that aligns the fragments. [sent-326, score-0.91]
91 The optimization can subtly deform the fragments, thus correcting inconsistencies caused by low-frequency distortion in the input images. [sent-327, score-0.331]
92 Frame-to-model integration can fail due to jerky camera movement. [sent-330, score-0.145]
93 A volumetric method for building complex models from range images. [sent-344, score-0.175]
94 Evaluation on a benchmark scene [29]: (a) Extended KinectFusion [22], (b) Zhou and Koltun [35], (c) volumetric integration along the motion-captured camera trajectory, and (d) our approach. [sent-386, score-0.307]
95 Joint depth and color camera calibration with distortion correction. [sent-393, score-0.334]
96 Modeling Kinect sensor noise for improved 3D reconstruction and tracking. [sent-469, score-0.184]
97 Simultaneous nonrigid registration of multiple point sets and atlas construction. [sent-599, score-0.182]
98 (a) Extended KinectFusion, (b) Zhou and Koltun, (c) volumetric integration along the groundtruth camera trajectory, and (d) our approach. [sent-651, score-0.259]
99 (I) and (II) use an idealized error model with no low-frequency distortion. [sent-653, score-0.17]
100 (III) and (IV) use the full error model with low-frequency distortion estimated on a real PrimeSense sensor. [sent-654, score-0.182]
wordName wordTfidf (topN-words)
[('fragments', 0.48), ('koltun', 0.266), ('registration', 0.182), ('distortion', 0.182), ('kinectfusion', 0.181), ('fragment', 0.17), ('idealized', 0.17), ('alignment', 0.169), ('elastic', 0.133), ('trajectory', 0.128), ('pi', 0.126), ('slam', 0.117), ('volumetric', 0.114), ('reconstruction', 0.11), ('surface', 0.092), ('rusinkiewicz', 0.091), ('teichman', 0.089), ('zhou', 0.088), ('newcombe', 0.086), ('integration', 0.084), ('primesense', 0.079), ('strain', 0.079), ('er', 0.076), ('sensor', 0.074), ('lowfrequency', 0.073), ('produced', 0.072), ('tangent', 0.07), ('inconsistencies', 0.065), ('correspondences', 0.065), ('reconstructed', 0.064), ('geometry', 0.062), ('blocks', 0.062), ('camera', 0.061), ('range', 0.061), ('control', 0.059), ('ea', 0.059), ('daun', 0.059), ('rtuanl', 0.059), ('tuanl', 0.059), ('dense', 0.058), ('lattice', 0.057), ('icp', 0.054), ('handheld', 0.053), ('sculpture', 0.052), ('objective', 0.052), ('rigid', 0.052), ('resolve', 0.052), ('sensors', 0.049), ('closeup', 0.048), ('engelhard', 0.048), ('subtly', 0.048), ('scene', 0.048), ('calibration', 0.048), ('overlapping', 0.047), ('surfaces', 0.047), ('extended', 0.047), ('synthetic', 0.046), ('mapping', 0.046), ('reformulate', 0.046), ('scanned', 0.046), ('rough', 0.046), ('iii', 0.045), ('align', 0.045), ('artifacts', 0.045), ('curless', 0.044), ('cholesky', 0.044), ('internally', 0.044), ('paragios', 0.044), ('vi', 0.043), ('interpolation', 0.043), ('depth', 0.043), ('davison', 0.042), ('izadi', 0.042), ('konolige', 0.042), ('frame', 0.041), ('brown', 0.041), ('acquired', 0.041), ('cl', 0.04), ('ismar', 0.04), ('hypothetical', 0.04), ('induced', 0.04), ('ii', 0.04), ('reconstructions', 0.04), ('deformation', 0.04), ('simulated', 0.039), ('precise', 0.039), ('scan', 0.038), ('reconstruct', 0.037), ('commodity', 0.037), ('guiding', 0.037), ('rigidly', 0.037), ('kb', 0.037), ('formulation', 0.036), ('deform', 0.036), ('aligning', 0.035), ('sufficiently', 0.035), ('kinect', 0.035), ('detailed', 0.035), ('nguyen', 0.035), ('meters', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
2 0.17619334 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
3 0.15532285 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
4 0.1457454 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
Author: Benjamin Ummenhofer, Thomas Brox
Abstract: 3D reconstruction deals with the problem of finding the shape of an object from a set of images. Thin objects that have virtually no volume pose a special challenge for reconstruction with respect to shape representation and fusion of depth information. In this paper we present a dense point-based reconstruction method that can deal with this special class of objects. We seek to jointly optimize a set of depth maps by treating each pixel as a point in space. Points are pulled towards a common surface by pairwise forces in an iterative scheme. The method also handles the problem of opposed surfaces by means of penalty forces. Efficient optimization is achieved by grouping points to superpixels and a spatial hashing approach for fast neighborhood queries. We show that the approach is on a par with state-of-the-art methods for standard multi-view stereo settings and gives superior results for thin objects.
5 0.14094186 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
Author: Bernhard Zeisl, Kevin Köser, Marc Pollefeys
Abstract: We address the problem of wide-baseline registration of RGB-D data, such as photo-textured laser scans without any artificial targets or prediction on the relative motion. Our approach allows to fully automatically register scans taken in GPS-denied environments such as urban canyon, industrial facilities or even indoors. We build upon image features which are plenty, localized well and much more discriminative than geometry features; however, they suffer from viewpoint distortions and request for normalization. We utilize the principle of salient directions present in the geometry and propose to extract (several) directions from the distribution of surface normals or other cues such as observable symmetries. Compared to previous work we pose no requirements on the scanned scene (like containing large textured planes) and can handle arbitrary surface shapes. Rendering the whole scene from these repeatable directions using an orthographic camera generates textures which are identical up to 2D similarity transformations. This ambiguity is naturally handled by 2D features and allows to find stable correspondences among scans. For geometric pose estimation from tentative matches we propose a fast and robust 2 point sample consensus scheme integrating an early rejection phase. We evaluate our approach on different challenging real world scenes.
6 0.13720703 140 iccv-2013-Elastic Net Constraints for Shape Matching
7 0.13603093 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
8 0.12935998 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
9 0.12547261 183 iccv-2013-Geometric Registration Based on Distortion Estimation
10 0.12517269 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
11 0.12246515 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
12 0.12225053 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
13 0.1219274 16 iccv-2013-A Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach
15 0.10244586 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
16 0.10170691 436 iccv-2013-Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach
17 0.10014099 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
18 0.099780798 185 iccv-2013-Go-ICP: Solving 3D Registration Efficiently and Globally Optimally
19 0.098339915 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
20 0.093081176 128 iccv-2013-Dynamic Probabilistic Volumetric Models
topicId topicWeight
[(0, 0.203), (1, -0.192), (2, -0.049), (3, 0.046), (4, -0.032), (5, 0.003), (6, 0.042), (7, -0.115), (8, 0.017), (9, 0.008), (10, -0.005), (11, 0.052), (12, -0.052), (13, 0.044), (14, 0.045), (15, -0.028), (16, 0.045), (17, 0.089), (18, -0.017), (19, -0.061), (20, -0.005), (21, -0.004), (22, 0.026), (23, 0.078), (24, -0.08), (25, -0.027), (26, -0.035), (27, 0.094), (28, -0.001), (29, 0.014), (30, 0.074), (31, -0.114), (32, 0.037), (33, -0.028), (34, -0.027), (35, -0.042), (36, 0.032), (37, 0.1), (38, 0.172), (39, 0.012), (40, 0.103), (41, 0.013), (42, -0.053), (43, 0.056), (44, -0.017), (45, -0.011), (46, 0.031), (47, 0.005), (48, 0.012), (49, -0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.94921637 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
2 0.79516846 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
3 0.76973659 183 iccv-2013-Geometric Registration Based on Distortion Estimation
Author: Wei Zeng, Mayank Goswami, Feng Luo, Xianfeng Gu
Abstract: Surface registration plays a fundamental role in many applications in computer vision and aims at finding a one-to-one correspondence between surfaces. Conformal mapping based surface registration methods conformally map 2D/3D surfaces onto 2D canonical domains and perform the matching on the 2D plane. This registration framework reduces dimensionality, and the result is intrinsic to Riemannian metric and invariant under isometric deformation. However, conformal mapping will be affected by inconsistent boundaries and non-isometric deformations of surfaces. In this work, we quantify the effects of boundary variation and non-isometric deformation to conformal mappings, and give the theoretical upper bounds for the distortions of conformal mappings under these two factors. Besides giving the thorough theoretical proofs of the theorems, we verified them by concrete experiments using 3D human facial scans with dynamic expressions and varying boundaries. Furthermore, we used the distortion estimates for reducing search range in feature matching of surface registration applications. The experimental results are consistent with the theoretical predictions and also demonstrate the performance improvements in feature tracking.
4 0.76936895 16 iccv-2013-A Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach
Author: Yun Zeng, Chaohui Wang, Xianfeng Gu, Dimitris Samaras, Nikos Paragios
Abstract: We propose a novel approach for dense non-rigid 3D surface registration, which brings together Riemannian geometry and graphical models. To this end, we first introduce a generic deformation model, called Canonical Distortion Coefficients (CDCs), by characterizing the deformation of every point on a surface using the distortions along its two principle directions. This model subsumes the deformation groups commonly used in surface registration such as isometry and conformality, and is able to handle more complex deformations. We also derive its discrete counterpart which can be computed very efficiently in a closed form. Based on these, we introduce a higher-order Markov Random Field (MRF) model which seamlessly integrates our deformation model and a geometry/texture similarity metric. Then we jointly establish the optimal correspondences for all the points via maximum a posteriori (MAP) inference. Moreover, we develop a parallel optimization algorithm to efficiently perform the inference for the proposed higher-order MRF model. The resulting registration algorithm outperforms state-of-the-art methods in both dense non-rigid 3D surface registration and tracking.
5 0.74648881 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
Author: Benjamin Ummenhofer, Thomas Brox
Abstract: 3D reconstruction deals with the problem of finding the shape of an object from a set of images. Thin objects that have virtually no volume pose a special challenge for reconstruction with respect to shape representation and fusion of depth information. In this paper we present a dense point-based reconstruction method that can deal with this special class of objects. We seek to jointly optimize a set of depth maps by treating each pixel as a point in space. Points are pulled towards a common surface by pairwise forces in an iterative scheme. The method also handles the problem of opposed surfaces by means of penalty forces. Efficient optimization is achieved by grouping points to superpixels and a spatial hashing approach for fast neighborhood queries. We show that the approach is on a par with state-of-the-art methods for standard multi-view stereo settings and gives superior results for thin objects.
6 0.73295784 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
7 0.67200494 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
8 0.64766091 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization
9 0.63279426 185 iccv-2013-Go-ICP: Solving 3D Registration Efficiently and Globally Optimally
10 0.609101 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
11 0.60230446 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
12 0.59921539 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
13 0.56527752 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.56143337 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
16 0.55514473 280 iccv-2013-Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras
17 0.54365009 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
18 0.5421918 128 iccv-2013-Dynamic Probabilistic Volumetric Models
19 0.53545439 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
20 0.53496897 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
topicId topicWeight
[(2, 0.019), (26, 0.037), (31, 0.022), (35, 0.017), (42, 0.055), (64, 0.026), (73, 0.021), (89, 0.703)]
simIndex simValue paperId paperTitle
same-paper 1 0.99628723 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
2 0.99492484 81 iccv-2013-Combining the Right Features for Complex Event Recognition
Author: Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller
Abstract: In this paper, we tackle the problem of combining features extracted from video for complex event recognition. Feature combination is an especially relevant task in video data, as there are many features we can extract, ranging from image features computed from individual frames to video features that take temporal information into account. To combine features effectively, we propose a method that is able to be selective of different subsets of features, as some features or feature combinations may be uninformative for certain classes. We introduce a hierarchical method for combining features based on the AND/OR graph structure, where nodes in the graph represent combinations of different sets of features. Our method automatically learns the structure of the AND/OR graph using score-based structure learning, and we introduce an inference procedure that is able to efficiently compute structure scores. We present promising results and analysis on the difficult and large-scale 2011 TRECVID Multimedia Event Detection dataset [17].
3 0.99279422 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
4 0.99102396 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
Author: Dan Xie, Sinisa Todorovic, Song-Chun Zhu
Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance and shape, but they affect behavior of people in the scene. For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people's trajectories in the video. To detect “dark matter” and infer their “dark energy” field, we extend the Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework is used to probabilistically model: people's trajectories and intents, constraint map of the scene, and locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people's trajectories in unobserved parts of the video footage.
5 0.98808831 103 iccv-2013-Deblurring by Example Using Dense Correspondence
Author: Yoav Hacohen, Eli Shechtman, Dani Lischinski
Abstract: This paper presents a new method for deblurring photos using a sharp reference example that contains some shared content with the blurry photo. Most previous deblurring methods that exploit information from other photos require an accurately registered photo of the same static scene. In contrast, our method aims to exploit reference images where the shared content may have undergone substantial photometric and non-rigid geometric transformations, as these are the kind of reference images most likely to be found in personal photo albums. Our approach builds upon a recent method for example-based deblurring using non-rigid dense correspondence (NRDC) [11] and extends it in two ways. First, we suggest exploiting information from the reference image not only for blur kernel estimation, but also as a powerful local prior for the non-blind deconvolution step. Second, we introduce a simple yet robust technique for spatially varying blur estimation, rather than assuming spatially uniform blur. Unlike the above previous method, which has proven successful only with simple deblurring scenarios, we demonstrate that our method succeeds on a variety of real-world examples. We provide quantitative and qualitative evaluation of our method and show that it outperforms the state-of-the-art.
6 0.98553431 302 iccv-2013-Optimization Problems for Fast AAM Fitting in-the-Wild
7 0.98416245 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
8 0.98365891 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
9 0.97312516 2 iccv-2013-3D Scene Understanding by Voxel-CRF
10 0.9694016 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
11 0.96250105 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
12 0.94571459 129 iccv-2013-Dynamic Scene Deblurring
13 0.94529033 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
14 0.94264472 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
15 0.94089174 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
16 0.93888861 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
17 0.93724149 317 iccv-2013-Piecewise Rigid Scene Flow
18 0.93684834 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
19 0.9362331 226 iccv-2013-Joint Subspace Stabilization for Stereoscopic Video
20 0.93539274 370 iccv-2013-Saliency Detection in Large Point Sets