cvpr cvpr2013 cvpr2013-372 knowledge-graph by maker-knowledge-mining

372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects


Source: pdf

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Imperial College London. Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. [sent-6, score-0.214]

2 As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. [sent-7, score-0.482]

3 This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. [sent-8, score-0.141]

4 The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. [sent-9, score-0.773]

5 We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction. [sent-10, score-0.457]

6 Modern processing hardware permits ever-improving levels of detail and scale, and much interest is now turning to semantic labelling of this geometry in terms of the objects and regions that are known to exist in the scene. [sent-13, score-0.233]

7 However, some thought about this process reveals a huge amount of wasted computational effort, and the potential for a much better way of taking account of domain knowledge in the loop of SLAM operation itself. [sent-14, score-0.181]

8 We propose a paradigm for real-time localisation and mapping which harnesses 3D object recognition to jump over low-level geometry processing and produce incremental object-level scene descriptions. [sent-15, score-0.273]

9 (left) A live view at the current camera pose and the synthetic rendered objects. [sent-17, score-0.42]

10 (right) We contrast a raw depth camera normal map with the corresponding high quality prediction from our object graph, used both for camera tracking and for masking object search. [sent-18, score-0.653]

11 As a hand-held depth camera browses a cluttered scene, prior knowledge of the objects likely to be repetitively present enables real-time 3D recognition and the creation of a simple pose graph map of relative object locations. [sent-20, score-0.653]

12 This graph is continuously optimised as new measurements arrive, and enables always up-to-date, dense and precise prediction of the next camera measurement. [sent-21, score-0.381]

13 These predictions are used for robust camera tracking and the generation of active search regions for further object detection. [sent-22, score-0.257]

14 Our approach is enabled by an efficient GPGPU parallel implementation of recent advances in real-time 3D object detection and 6DoF (degrees of freedom) ICP-based pose refinement. [sent-23, score-0.207]

15 Real-Time SLAM with Hand-Held Sensors In SLAM (Simultaneous Localisation and Mapping), building an internally consistent map in real-time from a moving sensor enables drift-free localisation during arbitrarily long periods of motion. [sent-26, score-0.199]

16 Sparse feature filtering methods like [5] were improved on by ‘keyframe SLAM’ systems like PTAM [8] which used bundle adjustment in parallel with tracking to enable high feature counts and more accurate tracking. [sent-29, score-0.149]

17 While this approach is possible with an RGB camera [12], commodity depth cameras have now come to the fore in high performance, robust indoor 3D mapping, in particular via the KinectFusion algorithm [11]. [sent-31, score-0.208]

18 New developments such as [18] have tackled scaling the method via a sliding volume, sub-blocking or octrees; but a truly scalable, multi-resolution, loop-closure-capable dense nonparametric surface representation remains elusive, and will always be wasteful in environments with symmetry. [sent-32, score-0.391]

19 From sparse feature-based SLAM, where the world is modelled as an unconnected point cloud, to dense SLAM which assumes that scenes contain continuous surfaces, we have seen an increase in the prior scene knowledge brought to bear. [sent-33, score-0.152]

20 While we currently pre-define the ob- jects expected in a scene, we intend that the paradigm permits the objects in a scene to be identified and segmented automatically as salient, repeated elements. [sent-35, score-0.269]

21 Unlike dense nonparametric approaches, the relatively few discrete entities in the map make it highly feasible to jointly optimise over all object positions to make globally consistent maps. [sent-37, score-0.192]

22 Further, and crucially, instant recognition of objects provides great efficiency and robustness benefits via the active approaches it permits to tracking and object detection, guided entirely by the dense predictions we can make of the positions of known objects. [sent-39, score-0.319]

23 SLAM++ relates strongly to the growing interest in semantically labelling scene reconstructions and maps, in both the computer vision and robotics communities, though we stress the big difference between post-hoc labelling of geometry and the closed loop, real-time algorithm we present. [sent-40, score-0.336]

24 A depth camera is first used to scan a scene, similar in scale and object content to the results we demonstrate later, and all data is fused into a single large point cloud. [sent-43, score-0.302]

25 Off-line, learned object models, with a degree of variation to cope with a range of real object types, are then matched into the joint scan, optimising both similarity and object configuration constraints. [sent-44, score-0.186]

26 The results are impressive, though the emphasis is on labelling rather than aiding mapping and we can see problems with missing data which cannot be fixed with non-interactive capture. [sent-45, score-0.184]

27 Other good work on labelling using RGB-D data was by Silberman [16] as well as Ren et al. [sent-46, score-0.108]

28 [14] who used kernel descriptors for appearance and shape to label single depth camera images with object and region identity. [sent-47, score-0.256]

29 These objects, once recognized via SIFT descriptors, improved the quality of SLAM due to their known size and shape, though the objects were simple highly textured posters and the scene scale small. [sent-51, score-0.125]

30 [13] demonstrated a 2D laser/camera system which used object recognition to generate discrete entities to map (tree trunks) rather than using raw measurements. Finally, the same idea that object recognition aids reconstruction has been used in off-line structure from motion. [sent-53, score-0.185]

31 [3] represented a scene as a set of points, objects and regions in two-view SfM, solving expensively and jointly in a graph optimisation for a labelling and reconstruction solution taking account of interactions between all scene entities. [sent-55, score-0.481]

32 Given a live depth map Dl, we first compute a surface measurement in the form of a vertex and normal map, Vl and Nl, providing input to the sequentially computed camera tracking and object detection pipelines. [sent-59, score-0.806]

33 (1) We track the live camera pose Twl with an iterative closest point approach using the dense multi-object scene prediction captured in the current SLAM graph G. [sent-60, score-0.66]

34 (3) We add successfully detected objects into the SLAM graph in the form of an object-pose vertex connected to the live estimated camera-pose vertex via a measurement edge. [sent-64, score-0.563]

35 (4) Rendering objects from the SLAM graph produces a predicted depth map Dr and normal map Nr in the live estimated frame, enabling us to actively search only those pixels not described by currently predicted objects in the graph. [sent-65, score-0.411]

36 We run an individual ICP between each object and the live image, resulting in the addition of a new camera-object constraint into the SLAM graph. [sent-66, score-0.249]

37 Creating an Object Database Before live operation in a certain scene, we rapidly make high quality 3D models of repeatedly occurring objects via interactive scanning using KinectFusion [11] in a controlled setting where the object can easily be circled without occlusion. [sent-71, score-0.354]

38 A mesh for the object is extracted from the truncated signed distance volume obtained from KinectFusion using marching cubes [10]. [sent-72, score-0.162]

39 A small amount of manual editing in a mesh tool is performed to separate the object from the ground plane, and mark it up with a coordinate frame such that domain-specific object constraints can be applied. [sent-73, score-0.173]

40 SLAM Map Representation: Our representation of the world is a graph, where each node stores either the estimated SE(3) pose (rotation and translation relative to a fixed world frame) Twoj of object j, or the historical pose Twi of the camera at timestep i (see Figure 5). [sent-77, score-0.45]

41 Each object node is annotated with a type from the object database. [sent-78, score-0.096]

42 Each SE(3) measurement of the pose of an object Zi,oj from the camera is stored in the graph as a factor (constraint) which links one camera pose and one object pose. [sent-79, score-0.665]

43 Additional factors can optionally be added to the graph, such as between camera poses to represent camera-camera motion estimates (e.g. from relative ICP tracking). [sent-80, score-0.152]
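The representation described above, SE(3) nodes for camera and object poses plus measurement factors linking them, can be sketched as a minimal container. All class and field names below, and the identity default for the information matrix, are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

class PoseGraph:
    """Minimal pose-graph container: nodes hold 4x4 SE(3) poses,
    factors link a camera node to an object node via a relative
    6DoF measurement (illustrative sketch)."""

    def __init__(self):
        self.camera_poses = {}   # i -> T_wi (4x4 pose of camera at timestep i)
        self.object_poses = {}   # j -> T_woj (4x4 pose of object j)
        self.object_types = {}   # j -> type from the object database
        self.factors = []        # (i, j, Z_ioj, info) measurement factors

    def add_camera(self, i, T_wi):
        self.camera_poses[i] = np.asarray(T_wi, dtype=float)

    def add_object(self, j, T_woj, obj_type):
        self.object_poses[j] = np.asarray(T_woj, dtype=float)
        self.object_types[j] = obj_type

    def add_measurement(self, i, j, Z_ioj, info=None):
        # Z_ioj: pose of object j measured from camera frame i;
        # info: 6x6 inverse measurement covariance (identity if unknown)
        if info is None:
            info = np.eye(6)
        self.factors.append((i, j, np.asarray(Z_ioj, dtype=float), info))

g = PoseGraph()
g.add_camera(0, np.eye(4))
g.add_object(0, np.eye(4), "chair")
g.add_measurement(0, 0, np.eye(4))
```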

44 Details on graph optimisation are given in Section 3. [sent-85, score-0.192]

45 We adopt the method of [6] for recognising the 6DoF pose of 3D objects, represented by meshes, in a depth image. [sent-90, score-0.191]

46 We give details of our parallel implementation, which achieves the real-time detection of multiple instances of multiple objects we need by fully exploiting the fine-grained parallelism of GPUs. [sent-91, score-0.127]

47 In Drost et al.’s method [6], an object is detected and simultaneously localised via the accumulation of votes in a parameter space. [sent-93, score-0.143]

48 Points, with normal estimates, are randomly sampled on a bilateral-filtered image from the depth camera. [sent-95, score-0.136]

49 These samples are paired up in all possible combinations to generate PPFs which vote for 6DoF model configurations containing a similar PPF. [sent-96, score-0.098]
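The point-pair features (PPFs) voted on here follow the four-component descriptor of Drost et al. [6]: the distance between two oriented points, and three angles relating their normals to the difference vector. A minimal sketch; the quantisation step sizes below are illustrative values, not the paper's:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Drost-style PPF: (distance, angle(n1,d), angle(n2,d), angle(n1,n2))
    for two oriented points with unit normals n1, n2."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return None  # degenerate pair
    du = d / dist
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return (dist, ang(n1, du), ang(n2, du), ang(n1, n2))

def quantise(ppf, d_step=0.01, a_step=np.deg2rad(12)):
    """Quantise the 4D feature so that similar pairs share a hash key."""
    d, a1, a2, a3 = ppf
    return (int(d / d_step), int(a1 / a_step),
            int(a2 / a_step), int(a3 / a_step))
```

In use, every sampled scene pair is quantised this way and matched against the model's precomputed PPF table to cast a vote.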

50 Similar structures are also built from each live frame. [sent-98, score-0.201]

51 Matching similar features of the scene against the model can be efficiently performed in parallel via a vectorised binary search, producing a vote for each match. [sent-100, score-0.212]

52 codes[i] ← (hashKey << 32) + i; codes ← Sort(codes); // Decode PPF index and hash key: key2ppfMap ← new array; hashKeys ← new array; foreach i ← 0 to N − 1 in parallel do key2ppfMap[i] ← codes[i] mod 2^32; hashKeys[i] ← codes[i] >> 32 [sent-108, score-0.204]

53 To overcome this, each vote is represented as a 64-bit integer code (Figure 4), which can then be efficiently sorted and reduced in parallel. [sent-111, score-0.098]

54 The first 6 bits encode the alignment angle, followed by 26 bits for the model point and 32 bits for the scene reference point. [sent-114, score-0.206]

55 This is followed by a parallel reduction with a sum operation to accumulate equal vote codes (Algorithm 2). [sent-116, score-0.286]
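The packing and sort-then-reduce scheme described above can be sketched as follows. The MSB-first layout of the 6/26/32-bit split is one plausible reading of the text, and all names are hypothetical:

```python
import numpy as np

def pack_vote(angle, model_pt, scene_ref):
    """64-bit vote code: 6-bit alignment angle, 26-bit model point,
    32-bit scene reference point (MSB-first; an assumed layout)."""
    assert angle < 64 and model_pt < (1 << 26) and scene_ref < (1 << 32)
    return (angle << 58) | (model_pt << 32) | scene_ref

def unpack_vote(code):
    """Recover (angle, model point, scene reference point) from a code."""
    return code >> 58, (code >> 32) & ((1 << 26) - 1), code & 0xFFFFFFFF

def reduce_votes(codes):
    """Sort-and-reduce equal codes into (code, count) pairs, mimicking
    the parallel sum reduction used to accumulate votes on the GPU."""
    arr = np.sort(np.array(codes, dtype=np.uint64))
    return np.unique(arr, return_counts=True)

c = pack_vote(3, 10, 7)          # two identical votes plus one different
uniq, counts = reduce_votes([c, c, pack_vote(5, 11, 7)])
```

Because equal votes become adjacent after sorting, the reduction is a single linear pass, which is what makes the scheme GPU-friendly.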

56 After peak finding, pose estimates for each scene reference point are clustered on the CPU according to translation and rotation thresholds as in [6]. [sent-117, score-0.196]
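The clustering of pose hypotheses by translation and rotation thresholds can be sketched greedily: each hypothesis joins the first cluster whose representative lies within both thresholds, and cluster scores accumulate votes. The threshold values and function names below are illustrative, not the paper's:

```python
import numpy as np

def rotation_angle(Ra, Rb):
    """Angle of the relative rotation between two rotation matrices."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def cluster_poses(poses, votes, t_thresh=0.05, r_thresh=np.deg2rad(15)):
    """Greedy clustering of (R, t) hypotheses; returns the cluster
    with the highest accumulated vote count (illustrative sketch)."""
    clusters = []  # each entry: [representative R, representative t, votes]
    for (R, t), v in zip(poses, votes):
        for c in clusters:
            if (np.linalg.norm(c[1] - t) < t_thresh
                    and rotation_angle(c[0], R) < r_thresh):
                c[2] += v
                break
        else:
            clusters.append([R, t, v])
    return max(clusters, key=lambda c: c[2]) if clusters else None
```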

57 The recognition algorithm of Drost et al. [6] is highly successful in room scenes when objects occupy most of the camera’s field of view, but performs poorly for objects which are distant or partly occluded by other objects, due to poor sample point coverage. [sent-121, score-0.179]

58 The view prediction capabilities of the system mean that we can generate a mask in image space for depth pixels which are not already described by projected known objects. [sent-123, score-0.133]

59 The measured depth images from the camera are cropped using these masks and samples are only spread in the undescribed regions (see Figure 3). [sent-124, score-0.281]
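The masking step can be sketched as a simple comparison between the measured depth map and the depth rendered from the object graph: any valid measurement the prediction does not explain becomes part of the active search region. The tolerance below is an assumed value:

```python
import numpy as np

def undescribed_mask(measured_depth, predicted_depth, tol=0.05):
    """Pixels with a valid measurement that the rendered object
    prediction does not explain; these form the active search region
    for new object detection (depths in metres, 0 = invalid)."""
    measured_valid = measured_depth > 0
    explained = (predicted_depth > 0) & \
                (np.abs(predicted_depth - measured_depth) < tol)
    return measured_valid & ~explained

D_meas = np.array([[1.0, 2.0], [0.0, 3.0]])   # toy 2x2 depth maps
D_pred = np.array([[1.02, 0.0], [0.0, 3.5]])
mask = undescribed_mask(D_meas, D_pred)
# only pixels (0,1) and (1,1) remain for object search
```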

60 The result of object detection is often multiple cluster peaks, and quantised localisation results. [sent-125, score-0.155]

61 We update the live camera-to-world transform Twl by estimating a sequence of m incremental updates {T˜rnl}, n = 1, …, m, each parametrised by a vector x ∈ R6 defining a twist in SE(3), with T˜r0l initialised as the identity. [sent-132, score-0.447]

62 We iteratively minimise the whole-image point-plane metric over all available valid pixels u ∈ Ω in the live depth map: Ec(x) = Σu∈Ω (Nr(u′)⊤ (exp(x) vˆl(u) − vr(u′)))². [sent-133, score-0.381]

63 Here u′ = π(K vˆl(u)), computed by projecting the vertex vl(u) at pixel u from the live depth map into the reference frame with camera intrinsic matrix K and the standard pin-hole projection function π. [sent-142, score-0.598]

64 The current live vertex is transformed into the reference frame using the current incremental transform T˜rnl: vˆl(u) = T˜rnl vl(u), where vl(u) = K−1 u̇ Dl(u) and vr(u′) is the corresponding reference-frame vertex. [sent-143, score-0.426]

65 Taking the solution vector x of the 6 × 6 linear system to an element of SE(3) via the exponential map, we compose the computed incremental transform at iteration n + 1 onto the previous estimated transform: T˜rn+1l ← exp(x) T˜rnl. [sent-157, score-0.131]

66 The estimated live camera pose Twl therefore results from composing the final incremental transform onto the previous frame pose: Twl ← Twr T˜rml. [sent-158, score-0.543]
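One Gauss-Newton step of the point-plane objective above might look as follows. This is a sketch using the standard small-twist linearisation of the residual, not the paper's GPU implementation; the per-point Jacobian is [ (v × n)⊤, n⊤ ] for the twist x = (w, t):

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def exp_se3(x):
    """Map a twist x = (w, t) in R^6 to a 4x4 transform
    (Rodrigues' formula for the rotation part)."""
    w, t = x[:3], x[3:]
    theta = np.linalg.norm(w)
    W = skew(w)
    if theta < 1e-10:
        R = np.eye(3) + W
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1 - np.cos(theta)) / theta**2 * W @ W)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def point_plane_step(src, dst, normals):
    """One Gauss-Newton step minimising sum_u (n^T (exp(x) v_l - v_r))^2
    over matched source/destination vertices with destination normals."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for p, q, n in zip(src, dst, normals):
        J = np.hstack([np.cross(p, n), n])  # d(residual)/d(w, t)
        r = n @ (p - q)                     # point-plane residual
        A += np.outer(J, J)
        b -= J * r
    x = np.linalg.solve(A + 1e-9 * np.eye(6), b)
    return exp_se3(x)
```

In the full system this update would be iterated (up to m = 10 times per frame), re-associating pixels against the rendered prediction at each iteration.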

67 Tracking Convergence: We use a maximum of m = 10 iterations and check for poor convergence of the optimisation process using two simple criteria. [sent-159, score-0.14]

68 Second, after an optimisation iteration has completed, we compute the ratio of pixels in the live image which have been correctly matched with the predicted model, ascertained by discounting pixels which induce a point-plane error greater than a specified magnitude ε. [sent-161, score-0.349]

69 Tracking for Model Initialisation: We utilise the dense ICP pose estimation and convergence check for two further key components in SLAM++. [sent-163, score-0.19]

70 Therefore, given a candidate object and detected pose, we run camera-model ICP estimation on the detected object pose, and check for convergence using the previously described criteria. [sent-165, score-0.202]

71 We find that for correctly detected objects, the pose estimates from the detector are erroneous within ±30° in rotation and ±50 cm in translation. [sent-166, score-0.137]

72 Camera-Object Pose Constraints: Given the active set of objects that have been detected in SLAM++, we further estimate relative camera-object pose parameters which are used to induce constraints in the scene pose graph. [sent-169, score-0.363]

73 To that end, we run the dense ICP estimate between the live frame and each model object currently visible in the frame. [sent-170, score-0.386]

74 The ability to compute an individual relative pose estimate introduces the possibility to prune poorly initialised or incorrectly tracked objects from the pose graph at a later date. [sent-171, score-0.426]

75 By analysing the statistics of the camera-object pose estimate’s convergence we can keep an inlier-outlier style count on the inserted objects, and cull poorly performing ones. [sent-172, score-0.185]

76 Example graph illustrating the pose of the moving camera over four time steps Twi (red) as well as the poses of three static objects in the world Twoj (blue). [sent-176, score-0.144]

77 We formulate the problem of estimating the historical poses of the depth camera Twi at time i and the poses of the static objects Twoj as graph optimisation. [sent-179, score-0.431]

78 Zi,oj denotes the 6DoF measurement of object j in frame i, and Σ−1i,oj its inverse measurement covariance, which can be estimated using the approximated Hessian Σ−1i,oj = J⊤J. [sent-180, score-0.174]

79 Zi,i+1 is the relative ICP constraint between cameras i and i+1, with Σ−1i,i+1 the corresponding inverse covariance. [sent-182, score-0.118]
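The information-weighted graph energy assembled from such camera-object factors can be sketched as below, using a small-angle approximation of the SE(3) log map for the residual. Function names are hypothetical and the residual parametrisation is one common choice, not necessarily the paper's:

```python
import numpy as np

def inv_se3(T):
    """Closed-form inverse of a 4x4 rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3], Ti[:3, 3] = R.T, -R.T @ t
    return Ti

def factor_residual(T_wi, T_woj, Z_ioj):
    """6-vector discrepancy between the predicted relative pose
    T_wi^-1 T_woj and the measurement Z_ioj (small-angle log sketch)."""
    E = inv_se3(Z_ioj) @ inv_se3(T_wi) @ T_woj
    rot = np.array([E[2, 1] - E[1, 2],
                    E[0, 2] - E[2, 0],
                    E[1, 0] - E[0, 1]]) / 2.0  # ~axis-angle for small errors
    return np.concatenate([rot, E[:3, 3]])

def graph_energy(cams, objs, factors):
    """Sum of squared, information-weighted residuals over all
    camera-object measurement factors (i, j, Z_ioj, info)."""
    total = 0.0
    for i, j, Z, info in factors:
        r = factor_residual(cams[i], objs[j], Z)
        total += r @ info @ r
    return total
```

A solver such as Gauss-Newton or Levenberg-Marquardt would then update all poses to drive this energy down, which is the role of the pose-graph optimiser here.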

80 Including Structural Priors: Additional information can be incorporated in the graph in order to improve the robustness and accuracy of the optimisation problem. [sent-198, score-0.192]

81 The world reference frame w is defined such that the x- and z-axes lie within the ground plane, with the y-axis perpendicular to it. [sent-200, score-0.116]

82 The ground plane is implicitly detected from the initial observation Z1,o1 of the first object; its pose Twf remains fixed during optimisation. [sent-201, score-0.137]

83 Other Priors: The ground plane constraint can have value beyond the pose graph. [sent-212, score-0.101]

84 While this is not yet implemented, we at least cull hypotheses of object positions far from the ground plane. [sent-215, score-0.097]

85 Relocalisation: When camera tracking is lost, the system enters a relocalisation mode. [sent-218, score-0.339]

86 Here a new local graph is created and tracked from, and when it contains at least 3 objects it is matched against the previously tracked long-term graph (see Figure 6). [sent-219, score-0.351]

87 The matched vertex with highest vote in the long-term graph is used instead of the currently observed vertex in the local graph and camera tracking is resumed from it, discarding the local map. [sent-223, score-0.742]

88 When tracking is lost, a local graph (blue) is created and matched against a long-term graph (red). [sent-226, score-0.214]

89 (top) Scene with objects and camera frustum when tracking is resumed a few frames after relocalisation. [sent-227, score-0.327]

90 Results: The in-the-loop operation of our system is more effectively demonstrated in our submitted video than on paper, where the advantages of our method over off-line scene labelling may not be immediately obvious. [sent-231, score-0.2]

91 Loop Closure Loop closure in SLAM occurs when a location is revisited after a period of neglect, and the arising drift corrected. [sent-235, score-0.108]

92 Larger loop closures (see Figure 7), where the drift is too much to enable matching via predictive ICP, are detected using a module based on matching fragments within the main long-term graph, in the same manner as in relocalisation (Section 3). [sent-237, score-0.478]

93 The real-time process lasted around 10 minutes, including various loop closures and relocalisations due to lost tracking. [sent-242, score-0.145]

94 We apply this to command virtual characters to navigate the scene and find places to sit as soon as the system is started (without the need to scan a whole room). [sent-253, score-0.102]

95 System statistics: We present system settings for mapping the room in Figure 7 (10 × 6 × 3 m) using a gaming laptop. [sent-258, score-0.117]

96 In this paper we have shown that using high performance 3D object recognition in the loop permits a new approach to real-time SLAM with large advantages in terms of efficient and semantic scene description. [sent-261, score-0.305]

97 In particular we demonstrate how the tight interaction of recognition, mapping and tracking elements is mutually beneficial to all. [sent-262, score-0.167]

98 (middle) Imposing the new correspondences and re-optimising the graph closes the loop and yields a more metric map. [sent-266, score-0.231]

99 (right) For visualisation purposes only (since raw scans are not normally saved in SLAM++), we show a coloured point cloud after loop closure. [sent-267, score-0.145]

100 Combining object recognition and SLAM for extended map representations. [sent-371, score-0.099]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('slam', 0.538), ('icp', 0.217), ('live', 0.201), ('ppf', 0.171), ('ppfs', 0.147), ('rnl', 0.147), ('loop', 0.145), ('relocalisation', 0.13), ('hashkeys', 0.122), ('twoj', 0.122), ('votecodes', 0.122), ('camera', 0.118), ('twi', 0.109), ('closure', 0.108), ('labelling', 0.108), ('localisation', 0.107), ('optimisation', 0.106), ('pose', 0.101), ('vote', 0.098), ('drost', 0.098), ('tracking', 0.091), ('depth', 0.09), ('kinectfusion', 0.09), ('graph', 0.086), ('mapping', 0.076), ('ppfcount', 0.073), ('twl', 0.073), ('undescribed', 0.073), ('objects', 0.069), ('proceedings', 0.067), ('robotics', 0.064), ('vertex', 0.063), ('votes', 0.059), ('parallel', 0.058), ('scene', 0.056), ('permits', 0.056), ('dense', 0.055), ('map', 0.051), ('codes', 0.051), ('browses', 0.049), ('cull', 0.049), ('firstppfindex', 0.049), ('ramos', 0.049), ('renato', 0.049), ('resumed', 0.049), ('rml', 0.049), ('sumop', 0.049), ('twf', 0.049), ('vot', 0.049), ('votecount', 0.049), ('wtrans', 0.049), ('object', 0.048), ('surface', 0.048), ('normal', 0.046), ('scan', 0.046), ('vr', 0.046), ('currently', 0.046), ('measurement', 0.045), ('transform', 0.044), ('hash', 0.044), ('sort', 0.044), ('wrot', 0.043), ('closures', 0.043), ('bolt', 0.043), ('gpgpu', 0.043), ('prediction', 0.043), ('incremental', 0.043), ('accumulate', 0.043), ('paradigm', 0.042), ('matched', 0.042), ('room', 0.041), ('mesh', 0.041), ('enables', 0.041), ('world', 0.041), ('castle', 0.04), ('generalised', 0.04), ('reference', 0.039), ('entities', 0.038), ('predictive', 0.038), ('international', 0.038), ('cubes', 0.038), ('measurements', 0.038), ('conjunction', 0.037), ('bits', 0.037), ('conference', 0.037), ('cholesky', 0.036), ('detected', 0.036), ('operation', 0.036), ('frame', 0.036), ('se', 0.035), ('environments', 0.035), ('description', 0.035), ('initialised', 0.035), ('marching', 0.035), ('count', 0.035), ('dr', 0.035), ('reality', 0.034), ('poses', 0.034), ('check', 0.034), ('tracked', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

2 0.47809091 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

Author: Nicola Fioraio, Luigi Di_Stefano

Abstract: In this paper we propose a novel Semantic Bundle Adjustmentframework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.

3 0.42758131 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid

Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAMsystem with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining im- age data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yieldingfaster and more reliable convergence than when using 2D image data alone.

4 0.1763909 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon

Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel’s correspondence to 3D points in the scene ’s world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially out-performs two state of the art baselines.

5 0.16146505 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

Author: Julien P.C. Valentin, Sunando Sengupta, Jonathan Warrell, Ali Shahrokni, Philip H.S. Torr

Abstract: Semantic reconstruction of a scene is important for a variety of applications such as 3D modelling, object recognition and autonomous robotic navigation. However, most object labelling methods work in the image domain and fail to capture the information present in 3D space. In this work we propose a principled way to generate object labelling in 3D. Our method builds a triangulated meshed representation of the scene from multiple depth estimates. We then define a CRF over this mesh, which is able to capture the consistency of geometric properties of the objects present in the scene. In this framework, we are able to generate object hypotheses by combining information from multiple sources: geometric properties (from the 3D mesh), and appearance properties (from images). We demonstrate the robustness of our framework in both indoor and outdoor scenes. For indoor scenes we created an augmented version of the NYU indoor scene dataset (RGB-D images) with object labelled meshes for training and evaluation. For outdoor scenes, we created ground truth object labellings for the KITTI odometry dataset (stereo image sequence). We observe a signifi- cant speed-up in the inference stage by performing labelling on the mesh, and additionally achieve higher accuracies.

6 0.1396852 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling

7 0.12846799 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

8 0.1149189 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

9 0.10814887 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

10 0.10519109 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

11 0.10458545 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

12 0.095036298 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

13 0.093727499 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

14 0.090170965 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration

15 0.089844763 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure

16 0.089680299 440 cvpr-2013-Tracking People and Their Objects

17 0.086913072 108 cvpr-2013-Dense 3D Reconstruction from Severely Blurred Images Using a Single Moving Camera

18 0.086804144 368 cvpr-2013-Rolling Shutter Camera Calibration

19 0.086634487 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

20 0.083844438 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.232), (1, 0.161), (2, 0.021), (3, -0.026), (4, 0.013), (5, -0.07), (6, 0.02), (7, 0.027), (8, 0.052), (9, 0.01), (10, -0.073), (11, 0.087), (12, -0.038), (13, 0.134), (14, 0.0), (15, -0.159), (16, -0.043), (17, 0.175), (18, -0.069), (19, 0.036), (20, -0.015), (21, -0.035), (22, -0.034), (23, 0.012), (24, -0.19), (25, -0.036), (26, 0.037), (27, 0.038), (28, -0.035), (29, -0.045), (30, -0.056), (31, 0.075), (32, -0.182), (33, -0.145), (34, -0.069), (35, 0.209), (36, 0.215), (37, -0.041), (38, 0.072), (39, -0.094), (40, -0.051), (41, 0.056), (42, 0.01), (43, -0.145), (44, 0.058), (45, -0.061), (46, 0.076), (47, -0.106), (48, 0.054), (49, -0.101)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92912585 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

2 0.87956208 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

Author: Nicola Fioraio, Luigi Di Stefano


Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.

3 0.80363452 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid

Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.

4 0.61604875 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling

Author: Jonathan Balzer, Stefano Soatto

Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.

5 0.57179856 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon

Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel’s correspondence to 3D points in the scene’s world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially outperforms two state of the art baselines.

6 0.51550138 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

7 0.50607437 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

8 0.49016455 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

9 0.47741115 110 cvpr-2013-Dense Object Reconstruction with Semantic Priors

10 0.47336093 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation

11 0.46375373 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

12 0.45389619 395 cvpr-2013-Shape from Silhouette Probability Maps: Reconstruction of Thin Objects in the Presence of Silhouette Extraction and Calibration Error

13 0.44219953 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity

14 0.43626025 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

15 0.43221509 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition

16 0.42994601 368 cvpr-2013-Rolling Shutter Camera Calibration

17 0.41502661 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking

18 0.41134647 354 cvpr-2013-Relative Volume Constraints for Single View 3D Reconstruction

19 0.4104445 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

20 0.40953261 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.118), (16, 0.034), (26, 0.035), (28, 0.016), (33, 0.229), (67, 0.073), (69, 0.103), (87, 0.085), (98, 0.22)]
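Unlike the dense LSI vector, the LDA representation above is sparse: a list of (topicId, topicWeight) pairs, with all unlisted topics at zero. Similarity between two papers can then be computed over the shared topics only. A minimal sketch, where the second distribution is hypothetical and `sparse_cosine` is an illustrative name:

```python
import math

def sparse_cosine(u, v):
    """Cosine similarity between sparse {topicId: weight} distributions."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# A subset of the paper's (topicId, weight) pairs, as listed above;
# 'other' is a made-up candidate paper overlapping on topics 10 and 33.
paper = dict([(10, 0.118), (16, 0.034), (33, 0.229), (98, 0.22)])
other = dict([(10, 0.10), (33, 0.25), (69, 0.08)])
sim = sparse_cosine(paper, other)
```

Only topics present in both distributions contribute to the dot product, which is why sparse storage costs nothing in accuracy here.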

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84098983 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison

Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

2 0.81187558 22 cvpr-2013-A Non-parametric Framework for Document Bleed-through Removal

Author: Róisín Rowley-Brooke, François Pitié, Anil Kokaram

Abstract: This paper presents recent work on a new framework for non-blind document bleed-through removal. The framework includes image preprocessing to remove local intensity variations, pixel region classification based on a segmentation of the joint recto-verso intensity histogram and connected component analysis on the subsequent image labelling. Finally restoration of the degraded regions is performed using exemplar-based image inpainting. The proposed method is evaluated visually and numerically on a freely available database of 25 scanned manuscript image pairs with ground truth, and is shown to outperform recent non-blind bleed-through removal techniques.

3 0.78201628 172 cvpr-2013-Finding Group Interactions in Social Clutter

Author: Ruonan Li, Parker Porfilio, Todd Zickler

Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.

4 0.78110617 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.

5 0.77958411 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

Author: Nicola Fioraio, Luigi Di Stefano

Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.

6 0.77958119 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

7 0.775778 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

8 0.77514899 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

9 0.77511454 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

10 0.77192652 187 cvpr-2013-Geometric Context from Videos

11 0.77155364 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

12 0.77092063 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

13 0.77083677 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

14 0.76931268 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability

15 0.76775986 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments

16 0.76754683 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

17 0.76715946 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

18 0.76672369 414 cvpr-2013-Structure Preserving Object Tracking

19 0.7655884 325 cvpr-2013-Part Discovery from Partial Correspondence

20 0.7655561 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning