iccv iccv2013 iccv2013-366 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
Reference: text
sentIndex sentText sentNum sentScore
1 au Abstract We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. [sent-7, score-0.815]
2 The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. [sent-8, score-0.293]
3 Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. [sent-9, score-0.289]
4 In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. [sent-10, score-0.542]
5 The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. [sent-11, score-0.256] [sent-13, score-0.434]
7 Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. [sent-14, score-0.996]
8 We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method. [sent-15, score-0.475]
9 Introduction Many applications need an accurate 3D model of a rigid object, but without access to a predefined CAD model or similar, the standard acquisition method involves 3D scanning using a precisely calibrated multi-camera or range sensor system. [sent-17, score-0.227]
10 In this paper we introduce a framework for simultaneous tracking and reconstruction of unknown 3D rigid objects that is simple, fast and effective. [sent-19, score-0.788]
11 The system is initialized with a simple primitive 3D shape (e.g. a sphere or a cube), then the 3D shape of the object being tracked is reconstructed incrementally online. [sent-20, score-0.256] [sent-22, score-0.513]
13 This flexible framework for 3D reconstruction and tracking has many real-world applications. [sent-23, score-0.475]
14 For example, it allows users to pick a random rigid object from their home, scan it and then use it as a controller to interact with a computer. [sent-24, score-0.329]
15 The proposed framework comprises two modules: a tracking module and a reconstruction module. [sent-25, score-0.576]
16 Most existing research work for 3D tracking with depth data uses a model-based approach, which generates pose hypotheses and evaluates them on the observed depth/RGB-D data. [sent-28, score-0.728]
17 To find the best pose hypothesis, such methods define and minimise an objective function measuring the discrepancy between expected (from the model hypothesis) and observed visual cues. [sent-29, score-0.369]
18 For example, in [4], the authors use Kinect input to track hand-held 3D puppets (rigid objects). [sent-31, score-0.268]
19 The system yields robust and realtime performance for tracking rigid objects, but accurate 3D models of puppets with color and textures need to be built off-line, in advance. [sent-32, score-0.617]
20 The occlusion from the hand is handled by a color-based segmenter. [sent-33, score-0.084]
21 More general is the work KinectFusion [7], where the whole scene structure and camera pose are estimated simultaneously. [sent-34, score-0.243]
22 Ray-casting is used to establish point correspondences between the observed point cloud and the reconstructed world map, and alignment is achieved using ICP. [sent-35, score-0.295]
23 However, the ICP-based tracking in KinectFusion relies on a static world map, which makes the method unable to track small moving objects in a static scene. [sent-36, score-0.662]
24 Camera motion and (inverse) object motion of course induce identical image changes, and the authors note in [5] that KinectFusion could be used to reconstruct a moving 3D rigid object from a static Kinect, but in this case the object needs to be large enough to occupy the majority of the depth map. [sent-37, score-0.833]
25 These rely on the many evaluations of the objective function at arbitrary points in the pose hypothesis space. [sent-39, score-0.232]
26 In [8], the authors use Particle Swarm Optimization (PSO) to solve the articulated hand tracking problem. [sent-40, score-0.307]
27 The system is implemented efficiently on the GPU, yielding real-time performance and fast-recovery from tracking failure. [sent-41, score-0.249]
28 Another similar work that tracks rigid objects is [11], which uses a particle filter to solve the pose optimization. [sent-42, score-0.412]
29 Both works measure the discrepancy between the expected depth generated by the pose hypothesis and the observed one, and are still very computationally expensive (even with a GPU implementation), as they require a large number of energy function evaluations. [sent-43, score-0.693]
30 In [10], the authors use a gradient-based optimization method to solve the tracking problem but do not explicitly establish point correspondences between the model and the observation. [sent-44, score-0.352]
31 Instead, they use a 3D level-set embedding function to encode the 3D object model and, by back-projecting the observed depth image into object coordinates, they are able to take advantage of the gradient of the level-set function to guide the search for the pose. [sent-45, score-0.483]
32 However, the energy function only considers the fitting of the depth data to the surface of the object, making the tracker very sensitive to close-to-surface outliers (e. [sent-47, score-0.491]
33 The tracker module presented in this work extends the back-projection scheme of [10] by formulating a probabilistic model to use both color and depth information, resulting in more robust and accurate tracking. [sent-50, score-0.532]
34 Most traditional 3D reconstruction methods require a calibrated multi-camera setup and are based on space carving. [sent-52, score-0.372]
35 The introduction of commodity, frame-rate RGB-D cameras has made 3D reconstruction using a single depth camera possible. [sent-53, score-0.49]
36 With a single hand-held Kinect device, KinectFusion can incrementally reconstruct the surface of the physical world that the camera sees, in real-time. [sent-55, score-0.453]
37 However, as discussed in the previous subsection, KinectFusion relies heavily on the world map to track the camera, thus it cannot reconstruct small moving objects in static scenes. [sent-56, score-0.441]
38 Another related recent work is [12], where the authors use a single, fixed, un-calibrated Kinect to scan the human body in a home environment. [sent-57, score-0.18]
39 Accurate 3D human shapes are obtained by combining multiple monocular views of a person moving in front of the sensor. [sent-58, score-0.115]
40 The SCAPE model [1] is used to constrain the alignment of the multiple depth maps from the various views. [sent-59, score-0.226]
41 In [9], the authors use monocular 2D image cues to reconstruct 3D shapes. [sent-60, score-0.201]
42 The reconstruction is constrained by a learnt low-dimensional 3D shape space. [sent-61, score-0.42]
43 However, they both rely heavily on a learnt shape space to constrain the reconstruction. [sent-63, score-0.235]
44 This forces the reconstructed objects to be within a fixed, prelearned category. [sent-64, score-0.205]
45 In this paper, we present a probabilistic framework for simultaneous tracking and reconstruction of an unknown rigid object using a single RGB-D camera. [sent-66, score-0.884]
46 We introduce a probabilistic model for model-based 3D tracking using RGB-D images. [sent-69, score-0.317]
47 The proposed probabilistic model leads to a differentiable energy function, which can be efficiently solved by a gradient-based optimization method. [sent-70, score-0.127]
48 Our method yields real-time performance on the GPU and is robust to missing data, occlusion and outliers. [sent-71, score-0.08]
49 We extend the space carving approach by introducing a novel probabilistic framework for the reconstruction of unknown 3D objects. [sent-73, score-0.376]
50 An inside/outside volumetric model of the object is learnt incrementally online, and the 3D shape is reconstructed by evolving a 3D level-set embedding function on this inside/outside model. [sent-74, score-0.626]
51 The reconstruction method can be implemented in a massively parallel fashion, resulting in great computational efficiency. [sent-75, score-0.273]
52 Generative Model Assuming calibrated color and depth cameras (i.e. aligned color-depth frames), let Ωd be the depth image domain, and Ωc be the color image domain. [sent-77, score-0.313] [sent-79, score-0.246]
54 Ω is the RGB-D image domain, obtained by re-projecting the color image onto the depth image. [sent-80, score-0.246]
55 A pixel x ∈ Ω at image coordinates (u, v) has depth value d and color value y. [sent-81, score-0.421]
56 A depth pixel x˙ in the object region is projected from a 3D point X˙ on the object surface as: x˙ = A To,c X˙, X˙ = (X, Y, Z, 1)ᵀ, To,c = [R|t] ∈ SE3 (1) [sent-84, score-0.547]
57 Here the Euclidean group SE3 := {R, t | R ∈ SO3, t ∈ R3}, and To,c is the 6-DoF pose, parametrised by the pose parameter p, that transforms from object coordinates to camera coordinates. [sent-85, score-0.605]
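As a rough illustration of Eq. 1 (not from the paper; numpy is assumed and the variable names are hypothetical), the following sketch projects an object-frame point through a pose To,c = [R|t] and a 3x3 camera matrix A to a pixel with depth, and inverts the projection, as used later for the back-projection X = To,c⁻¹ A⁻¹ x˙:

    import numpy as np

    def project(X_obj, R, t, A):
        # X_obj: 3D point in object coordinates; R, t: object-to-camera pose; A: 3x3 camera matrix
        X_cam = R @ X_obj + t                  # transform into camera coordinates
        x_hom = A @ X_cam                      # homogeneous pixel coordinates
        return x_hom[:2] / x_hom[2], X_cam[2]  # pixel location (u, v) and its depth d

    def back_project(u, v, d, R, t, A):
        # Invert the projection: pixel (u, v) with observed depth d -> object coordinates
        X_cam = d * np.linalg.solve(A, np.array([u, v, 1.0]))
        return R.T @ (X_cam - t)               # apply To,c^-1 (rigid-body inverse)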
58 We represent the shape of the 3D object by a 3D signed distance function (SDF) Φ(X) defined in an object coordinate frame. [sent-87, score-0.26]
59 The surface of the 3D shape is recovered as the zero level, Φ(X) = 0. [sent-88, score-0.256]
60 The domain outside the object maps to positive values, and the domain inside the object maps to negative values. [sent-89, score-0.321]
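A minimal sketch (not part of the paper; numpy assumed, grid extent and resolution arbitrary) of the kind of primitive initialization described above: a discrete SDF Φ for a sphere, negative inside, positive outside, with the surface recoverable as the zero level set:

    import numpy as np

    def sphere_sdf(grid_size=64, radius=0.4):
        # Voxel grid spanning the cube [-1, 1]^3 in object coordinates
        ax = np.linspace(-1.0, 1.0, grid_size)
        X, Y, Z = np.meshgrid(ax, ax, ax, indexing='ij')
        # Signed distance to a sphere: negative inside, zero on the surface, positive outside
        return np.sqrt(X**2 + Y**2 + Z**2) - radius

    phi = sphere_sdf()
    surface_band = np.abs(phi) < 0.02   # voxels in a thin band around the zero level set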
61 Two appearance models are used to describe the color statistics of the scene: one for the object surface, which generates the foreground region in the image; and one for the background. [sent-91, score-0.252]
62 These are represented by their likelihoods, P(y|V), where V can take the values on or out, because a voxel inside the volume can never generate a pixel in Ω. [sent-92, score-0.236]
63 The two appearance models are represented with RGB color histograms using 32 bins per channel. [sent-93, score-0.061]
64 The histogram can be initialized either from a detection module or from a user-selected bounding box on the RGB image, in which case the foreground model is built from the interior of the bounding box and the background model from the immediate region outside the bounding box. [sent-94, score-0.326] [sent-96, score-0.204]
65 (Figure caption fragment): ... P(y|V = out) and the pose T(p). (Right): Graphical model of our generative model for tracking and reconstruction. [sent-95, score-0.299]
67 These initial color likelihoods are used in conjunction with the local depth information both for tracking and reconstruction, and are refined over time. [sent-97, score-0.644]
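A sketch of how the histogram initialization could be implemented (hypothetical helpers, assuming numpy and an 8-bit RGB image; not the authors' code): the foreground histogram is built from the bounding-box interior and the background histogram from the immediate surrounding region, giving the two likelihoods P(y|V = on) and P(y|V = out):

    import numpy as np

    BINS = 32  # 32 bins per channel, as stated above

    def rgb_histogram(pixels):
        # pixels: (N, 3) uint8 RGB values -> normalized 32x32x32 joint histogram
        idx = (pixels.astype(np.int64) * BINS) // 256
        hist = np.zeros((BINS, BINS, BINS))
        np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
        return hist / max(hist.sum(), 1.0)

    def init_appearance(image, box, margin=10):
        # box = (x0, y0, x1, y1): interior -> foreground, immediate region outside -> background
        h, w = image.shape[:2]
        x0, y0, x1, y1 = box
        inside = np.zeros((h, w), dtype=bool)
        inside[y0:y1, x0:x1] = True
        near = np.zeros((h, w), dtype=bool)
        near[max(0, y0 - margin):min(h, y1 + margin), max(0, x0 - margin):min(w, x1 + margin)] = True
        hist_fg = rgb_histogram(image[inside])          # estimate of P(y | V = on)
        hist_bg = rgb_histogram(image[near & ~inside])  # estimate of P(y | V = out)
        return hist_fg, hist_bg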
68 We use this model both for tracking and for reconstruction, but these two aspects make use of the information in different ways. [sent-102, score-0.249]
69 In this model the shape Φ generates a set of voxels {X, V } (indexed by i). [sent-103, score-0.325]
70 This volumetric model, combined with the object pose p, in turn generates the observed RGB-D images Ω comprising pixels {x, y} (indexed by j). [sent-104, score-0.439]
71 Given the RGB-D images observed up to time t, our objective is to find the optimal sequence of poses and the shape: [sent-115, score-0.068]
72 Φ, p0...t = argmax P(Φ, p0...t | Ω0 . . . Ωt) (3) Note that there are further justifiable simplifications that can be made to Eq. 2. [sent-124, score-0.161]
73 Finally, note that in this model, the locations of voxels X are treated as generated randomly from the shape Φ. [sent-130, score-0.322]
74 Under this model all voxel locations have the same probability of being generated, but in practice the situation is more certain, with every voxel being generated exactly once. [sent-131, score-0.524]
75 The variables X are maintained in the model for convenience, but it is the indicator variable of each voxel V that carries the important information about the volumetric model. [sent-132, score-0.347]
76 In the remainder of the paper we perform approximate inference by finding MAP or maximum likelihood estimates of Φ and pt, alternating steps that estimate the current pose given the (current estimate of the) shape, and that estimate the fixed shape assuming knowledge of the current and past poses. [sent-134, score-0.403]
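The alternation described above can be summarised in a short sketch (the callables estimate_pose and update_shape are placeholders for the tracking and reconstruction steps, not the paper's API):

    def track_and_reconstruct(frames, phi_init, p_init, estimate_pose, update_shape):
        # Alternate MAP/ML estimation: pose given the current shape, then shape given the poses.
        phi, poses = phi_init, [p_init]
        for rgbd in frames:
            p_t = estimate_pose(rgbd, phi, init=poses[-1])  # tracking: maximise P(Omega_t | Phi, p_t)
            poses.append(p_t)
            phi = update_shape(phi, rgbd, p_t)              # reconstruction: refine Phi given pose p_t
            yield p_t, phi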
77 Tracking For tracking, we assume known shape Φ and optimise the pose at time t (dropping the subscript on the pose p henceforth) by maximising the likelihood P(Ω|Φ, p) as a function of p. [sent-137, score-0.778]
78 To optimise this conditional distribution we treat the RGB-D image Ω as a bag-of-independent-pixels {x, y}. [sent-138, score-0.161]
79 Though not all voxels generate a pixel, each pixel x is generated by a unique voxel X, where X is sampled from Φ, and x is its (deterministic) projection into the image. [sent-139, score-0.563]
80 The likelihood is the product over all pixel likelihoods: L(p) = ∏j P(xj, yj | Φ, p) [sent-141, score-0.211]
81 where i(j) indicates that voxel i projects to pixel j. [sent-143, score-0.367]
82 This generative model is very similar to [3], which uses level-sets to track 2D deformable objects. [sent-144, score-0.153]
83 In [3] the image and the level-set embedding function are in the same 2D domain, and each value in the level-set function is associated with a pixel in the image domain. [sent-145, score-0.194]
84 The tracking is done by maximizing the discrepancy between the foreground and background regions, with the Heaviside function of the level-set embedding selecting either foreground or background. [sent-146, score-0.497]
85 However, in our case, the level-set function is defined in 3D space, and all pixels in the RGB-D image domain are generated either from the object surface or from outside the object. [sent-147, score-0.405]
86 No pixel is generated from the interior of the model, and there is not a one-to-one mapping between pixels in the RGB-D image and voxels. [sent-148, score-0.215]
87 In our work, the per-pixel likelihood of the pose (in which we have marginalised V) is: P(x, y|Φ, p) = Σk∈{on,out} P(x|Φ, p, V = k) P(V = k|y) (5) [sent-150, score-0.277]
88 The pixel location likelihoods for the foreground and background are simply uniform distributions: P(x|Φ, p, V = on) = δ(Φ(X)) / ΣΩ δ(Φ(X)), P(x|Φ, p, V = out) = H(Φ(X)) / ΣΩ H(Φ(X)), [sent-154, score-0.309]
89 where X = To,c⁻¹ A⁻¹ x˙ is the back-projection into object coordinates of the RGB-D pixel x. [sent-162, score-0.242]
90 H and δ are the smoothed Heaviside and Dirac delta functions, and thus select the outside of the object and the surface of the object respectively. [sent-165, score-0.348]
91 Substituting into Eq. 5 and assuming pixel-wise independence, we obtain the pose likelihood as: P(Ω|Φ, p) ∼ ∏x∈Ω Σk∈{on,out} P(x|Φ, p, V = k) P(V = k|y) [sent-167, score-0.277]
92 This can be written as an energy summation by taking logs. [sent-172, score-0.109]
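A sketch of the resulting energy (hypothetical helpers; phi_at is a trilinear SDF lookup, back_project maps a pixel and depth into object coordinates under pose p, and pf/pb return the colour-based posteriors P(V = on|y) and P(V = out|y); the smoothing width is illustrative): taking negative logs of the per-pixel likelihood in Eq. 5 gives a sum that can be minimised with a gradient-based optimiser:

    import numpy as np

    def smoothed_heaviside(phi, sigma=2.0):
        return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / sigma))

    def smoothed_delta(phi, sigma=2.0):
        # Derivative of the smoothed Heaviside: peaks at the zero level set (the surface)
        return (1.0 / np.pi) * sigma / (phi**2 + sigma**2)

    def pose_energy(pixels, depths, colors, phi_at, back_project, p, pf, pb):
        # E(p) = -sum_j log( delta(Phi(X_j)) P(V=on|y_j) + H(Phi(X_j)) P(V=out|y_j) )
        energy = 0.0
        for (u, v), d, y in zip(pixels, depths, colors):
            X = back_project(u, v, d, p)           # pixel -> object coordinates under pose p
            phi = phi_at(X)                        # SDF value at the back-projected point
            on = smoothed_delta(phi) * pf(y)       # surface term
            out = smoothed_heaviside(phi) * pb(y)  # outside term
            energy -= np.log(on + out + 1e-12)
        return energy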
93 Reconstruction For the purposes of reconstruction, we initialize the tracker with a simple initial model (e.g. a sphere), and iterate the tracker until it converges to a pose that projects the initial model close to the object region in the RGB-D image domain. [sent-184, score-0.117] [sent-186, score-0.348]
95 The reconstruction runs on each Ωˆ and the reconstructed 3D model is used for tracking in the next frame. [sent-189, score-0.587]
96 More specifically, we evolve a 3D level-set embedding function over an inside/outside probability volume to maximize the per-voxel posterior probability of the 3D level-set function, given the shape prior and all previously observed depth and poses. [sent-191, score-0.516]
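A much-simplified sketch of this idea (numpy assumed; the evidence update and the re-signing step are illustrative stand-ins for the paper's probabilistic per-voxel posterior and level-set evolution): each frame votes a visible voxel as outside if it lies in front of the observed depth and as inside if it lies behind it, and the embedding function is then kept consistent with the accumulated evidence:

    import numpy as np

    def update_inside_outside(log_odds, voxel_centers, project, depth_image, p, step=0.2):
        # Accumulate per-voxel inside/outside evidence from one depth frame under pose p
        for idx, X in voxel_centers:                 # idx: grid index, X: voxel centre (object frame)
            (u, v), d_pred = project(X, p)           # expected pixel location and depth
            if not (0 <= int(v) < depth_image.shape[0] and 0 <= int(u) < depth_image.shape[1]):
                continue
            d_obs = depth_image[int(v), int(u)]
            if d_obs <= 0:                           # missing depth measurement
                continue
            # In front of the observed surface -> seen through -> outside evidence (positive);
            # behind the observed surface -> occluded -> inside evidence (negative).
            log_odds[idx] += step if d_pred < d_obs else -step
        return log_odds

    def resign_levelset(phi, log_odds):
        # Keep the sign of the embedding function consistent with the inside/outside estimate
        return np.where(log_odds < 0, -np.abs(phi), np.abs(phi))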
97 In the reconstruction step, we assume the pose of the object given by the tracker is fixed and we optimize P(Φ|Ωˆ0...t, p0...t). [sent-192, score-0.574]
98 Note that Eq. 11 applies per voxel X, with V as the corresponding inside/outside membership of X. [sent-213, score-0.269]
99 Taking the two terms in the summation in turn, first we develop the likelihood of generating an RGB-D image Ω given the pose and voxel memberships. [sent-214, score-0.596]
100 We write the likelihood that the single voxel (X, V) generated the RGB-D pixel x as Lin(V) = P(x|X, V, p), defined piecewise over the voxel membership V. [sent-217, score-0.539]
wordName wordTfidf (topN-words)
[('voxel', 0.269), ('tracking', 0.249), ('kinectfusion', 0.247), ('reconstruction', 0.226), ('depth', 0.185), ('pose', 0.164), ('optimise', 0.161), ('rigid', 0.16), ('likelihoods', 0.149), ('voxels', 0.137), ('surface', 0.13), ('shape', 0.126), ('heaviside', 0.121), ('tracker', 0.117), ('likelihood', 0.113), ('reconstructed', 0.112), ('simplifications', 0.107), ('puppets', 0.107), ('track', 0.103), ('module', 0.101), ('victor', 0.099), ('carl', 0.099), ('pixel', 0.098), ('pf', 0.097), ('kinect', 0.096), ('embedding', 0.096), ('reconstruct', 0.095), ('pt', 0.091), ('discrepancy', 0.09), ('sphere', 0.087), ('outside', 0.084), ('gpu', 0.08), ('incrementally', 0.079), ('camera', 0.079), ('volumetric', 0.078), ('coordinates', 0.077), ('pb', 0.075), ('indexed', 0.075), ('modules', 0.073), ('simultaneous', 0.073), ('home', 0.07), ('world', 0.07), ('primitive', 0.069), ('observed', 0.068), ('probabilistic', 0.068), ('learnt', 0.068), ('hypothesis', 0.068), ('cube', 0.067), ('calibrated', 0.067), ('object', 0.067), ('moving', 0.067), ('static', 0.067), ('domain', 0.065), ('foreground', 0.062), ('generates', 0.062), ('initialized', 0.061), ('color', 0.061), ('generated', 0.059), ('energy', 0.059), ('authors', 0.058), ('interior', 0.058), ('prelearned', 0.054), ('parametrised', 0.054), ('dwm', 0.054), ('scape', 0.054), ('justifiable', 0.054), ('xpi', 0.054), ('scan', 0.052), ('generative', 0.05), ('summation', 0.05), ('controller', 0.05), ('prisacariu', 0.05), ('maximising', 0.05), ('dirac', 0.05), ('particle', 0.049), ('monocular', 0.048), ('school', 0.048), ('minimise', 0.047), ('massively', 0.047), ('establish', 0.045), ('pso', 0.045), ('handled', 0.044), ('xi', 0.043), ('henceforth', 0.043), ('swarm', 0.043), ('tracked', 0.042), ('carving', 0.041), ('adelaide', 0.041), ('hx', 0.041), ('robots', 0.041), ('sees', 0.041), ('unknown', 0.041), ('posterior', 0.041), ('constrain', 0.041), ('occlusion', 0.04), ('cad', 0.04), ('inside', 0.04), ('yields', 0.04), ('objects', 0.039), ('ject', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
2 0.28513855 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
3 0.27364206 2 iccv-2013-3D Scene Understanding by Voxel-CRF
Author: Byung-Soo Kim, Pushmeet Kohli, Silvio Savarese
Abstract: Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we called Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such model allows to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel even in presence of partial occlusions using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Version 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.
4 0.20892398 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
5 0.19484445 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Author: Stefan Duffner, Christophe Garcia
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-theart tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
6 0.19443727 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
7 0.18310337 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
8 0.18172282 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.17619334 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
10 0.17469811 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
11 0.16977413 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
12 0.16852987 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
13 0.16792007 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
14 0.16718794 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
15 0.15590164 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
16 0.15400027 444 iccv-2013-Viewing Real-World Faces in 3D
17 0.15193519 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
18 0.15136956 128 iccv-2013-Dynamic Probabilistic Volumetric Models
19 0.14962819 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
20 0.14851508 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
topicId topicWeight
[(0, 0.289), (1, -0.261), (2, -0.011), (3, 0.07), (4, 0.042), (5, -0.125), (6, -0.062), (7, -0.01), (8, -0.127), (9, 0.176), (10, -0.025), (11, -0.059), (12, -0.102), (13, 0.061), (14, -0.013), (15, -0.076), (16, 0.04), (17, -0.062), (18, -0.103), (19, -0.07), (20, -0.023), (21, 0.045), (22, 0.071), (23, 0.009), (24, -0.075), (25, 0.011), (26, -0.003), (27, 0.07), (28, 0.022), (29, 0.072), (30, 0.049), (31, -0.054), (32, -0.024), (33, -0.085), (34, -0.104), (35, 0.047), (36, 0.036), (37, 0.079), (38, 0.097), (39, 0.055), (40, 0.018), (41, -0.115), (42, -0.069), (43, 0.066), (44, -0.005), (45, 0.036), (46, 0.001), (47, -0.026), (48, -0.027), (49, -0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.96049809 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
2 0.84269071 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
3 0.76859486 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
Author: Diego Thomas, Akihiro Sugimoto
Abstract: Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method is not well scalable to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes reduces significantly the size of the scene representation and thus it allows us to generate a global textured 3D model with lower memory requirement while keeping accuracy and easiness to update with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction, while keeping the scalability for large indoor scenes.
4 0.71988213 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
Author: Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys
Abstract: unkown-abstract
5 0.69565254 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
Author: Benjamin Ummenhofer, Thomas Brox
Abstract: 3D reconstruction deals with the problem of finding the shape of an object from a set of images. Thin objects that have virtually no volume pose a special challenge for reconstruction with respect to shape representation and fusion of depth information. In this paper we present a dense point-based reconstruction method that can deal with this special class of objects. We seek to jointly optimize a set of depth maps by treating each pixel as a point in space. Points are pulled towards a common surface by pairwise forces in an iterative scheme. The method also handles the problem of opposed surfaces by means of penalty forces. Efficient optimization is achieved by grouping points to superpixels and a spatial hashing approach for fast neighborhood queries. We show that the approach is on a par with state-of-the-art methods for standard multi view stereo settings and gives superior results for thin objects.
6 0.68589324 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
7 0.6858471 128 iccv-2013-Dynamic Probabilistic Volumetric Models
8 0.67969066 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
9 0.6773631 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
10 0.66048926 2 iccv-2013-3D Scene Understanding by Voxel-CRF
11 0.64292496 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
12 0.6155507 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
13 0.60233444 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
14 0.60088378 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
15 0.57175279 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
16 0.56569856 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
17 0.56200093 410 iccv-2013-Support Surface Prediction in Indoor Scenes
18 0.55590069 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
19 0.55128604 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
20 0.54744542 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
topicId topicWeight
[(2, 0.055), (21, 0.015), (26, 0.077), (30, 0.015), (31, 0.032), (35, 0.02), (40, 0.017), (42, 0.11), (48, 0.016), (64, 0.101), (73, 0.053), (79, 0.078), (89, 0.301), (96, 0.011), (98, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.96626216 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
2 0.96379417 143 iccv-2013-Estimating Human Pose with Flowing Puppets
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of peo- ple in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets ” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
3 0.96201646 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
4 0.96195745 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
Author: Etienne Huot, Giuseppe Papari, Isabelle Herlin
Abstract: This paper describes modeling and numerical computation of orthogonal bases, which are used to describe images and motion fields. Motion estimation from image data is then studied on subspaces spanned by these bases. A reduced model is obtained as the Galerkin projection on these subspaces of a physical model, based on Euler and optical flow equations. A data assimilation method is studied, which assimilates coefficients of image data in the reduced model in order to estimate motion coefficients. The approach is first quantified on synthetic data: it demonstrates the interest of model reduction as a compromise between results quality and computational cost. Results obtained on real data are then displayed so as to illustrate the method.
5 0.96057242 410 iccv-2013-Support Surface Prediction in Indoor Scenes
Author: Ruiqi Guo, Derek Hoiem
Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given a RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally we match the predicted extent with annotated scenes in training scenes and transfer the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.
6 0.96004272 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
7 0.96001834 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
8 0.9587391 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
9 0.9581973 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
10 0.95778954 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
11 0.95705771 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
12 0.95683336 317 iccv-2013-Piecewise Rigid Scene Flow
13 0.95634961 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
14 0.95599711 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
15 0.95532238 129 iccv-2013-Dynamic Scene Deblurring
16 0.95492673 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking
17 0.95447475 190 iccv-2013-Handling Occlusions with Franken-Classifiers
18 0.95434678 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
19 0.95369512 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition
20 0.95340329 127 iccv-2013-Dynamic Pooling for Complex Event Recognition