cvpr cvpr2013 cvpr2013-231 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicola Fioraio, Luigi Di Stefano
Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. [sent-5, score-0.298]
2 The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. [sent-6, score-0.273]
3 Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. [sent-7, score-0.432]
4 Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy. [sent-8, score-0.203]
5 The classical approach builds on filtering techniques, such as the Extended Kalman Filter (EKF) [10, 11, 8]: visual features are tracked through frames and their 3D positions estimated along with the unknown camera pose. [sent-12, score-0.167]
6 The authors propose to split the SLAM problem into two different tasks associated with parallel threads: one tracks the camera with respect to current estimates of landmark locations, the other is in charge of the global optimization over selected keyframes. [sent-16, score-0.207]
7 While the tracking and mapping task has thus reached a certain degree of maturity, none of the previous methods can seamlessly handle or derive semantic information within the visual SLAM process. [sent-23, score-0.313]
8 Indeed, the scientific community is nowadays showing ever-increasing interest in the novel field of semantic perception and mapping [12, 9, 19]. [sent-24, score-0.218]
9 The intuition is that a partial reconstruction of the environment can improve the object detection task, so as to better handle nuisances such as occlusions, clutter and viewpoint changes, while the knowledge of object poses provides useful constraints to improve the mapping and tracking tasks. [sent-25, score-0.457]
10 Though some interesting steps grounded on the above intuition have been made [9, 7, 4, 3], we claim that none of the current proposals has really closed the detection-SLAM loop (see Sec. [sent-26, score-0.135]
11 Related Work Many recent works share the idea of combining semantic knowledge and geometrical constraints for scene understanding. [sent-39, score-0.187]
12 Unlike our proposal, most of them [12, 15, 20] perform the object detection task on a single view and hence do not enforce multi-view consistency. [sent-40, score-0.134]
13 [12] do not try to estimate the exact position of detected objects, even though detection is tied to a SLAM framework. [sent-42, score-0.162]
14 Others exploit geometric information for consistent object detection, but without deploying previously detected objects to constrain the mapping task, as we do. [sent-43, score-0.303]
15 [28] detect objects using a standard feature-based pipeline [21] and use the estimated relative poses for place representation. [sent-45, score-0.145]
16 [22] find known objects in a map built from laser and odometry data, but again the object and camera poses are not estimated jointly. [sent-47, score-0.243]
17 The former introduces the notion of “cognitive loop”, but detection is limited to cars and pedestrians and, moreover, strong assumptions are made about the environment and camera motion. [sent-49, score-0.198]
18 However, the detection problem is still not fully integrated into the SLAM framework, so decisions about object presence are taken using features coming from a single image. [sent-55, score-0.244]
19 [4, 3] have proposed a semantic structure from motion technique which addresses the problem of estimating the camera poses and recognizing object categories from an image sequence. [sent-57, score-0.353]
20 … graph (a) ignores any semantic … Including into the optimization matches of object's features (b) as graph edges and the object pose as a vertex (c) to achieve object detection and improve SLAM. [sent-61, score-0.814]
21 More importantly, in our proposal the object detection pipeline is not external but instead fully integrated into the SLAM framework so that object existence is inferred through a novel semantic bundle adjustment framework. [sent-63, score-0.882]
22 Finally, our method is aimed at detection and full 6DOF pose estimation of object instances rather than category level recognition and image plane localization. [sent-64, score-0.199]
23 Semantic Bundle Adjustment Bundle adjustment is the problem of the joint estimation of a set of geometric parameters that are simultaneously optimized with respect to a cost function quantifying the model fitting error [27]. [sent-66, score-0.164]
24 In a typical SLAM application, where the geometric unknowns are camera poses, a BA formulation allows for both tracking the sensor movement and reconstructing the environment incrementally. [sent-67, score-0.179]
25 For instance, in a monocular SLAM scenario we might wish to track 2D features between subsequent frames, so that we are led to include in x both the 6DOF camera poses and the 3D location of matched features. [sent-82, score-0.184]
26 Then, given an estimate for feature xi and camera pose xj, the constraint ei? [sent-83, score-0.147]
27 (1) are effectively represented by a graph: parameter block xi maps to vertex i, while constraint ejk to an edge connecting vertexes j and k. [sent-86, score-0.295]
28 Note that two parameter blocks can be constrained by more than one measurement, so we could have more than one edge linking the same vertex pair. [sent-87, score-0.175]
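The vertex and edge mapping described above can be sketched as a small multigraph. This is only an illustrative data structure, not the authors' implementation: the class names, the `kind` labels and the `weight` field are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One measurement constraining two parameter blocks (hypothetical type)."""
    j: int                # id of the first vertex
    k: int                # id of the second vertex
    kind: str             # illustrative label, e.g. "frame-to-frame"
    weight: float = 1.0   # assumed per-constraint weight


@dataclass
class ConstraintGraph:
    """Multigraph: several edges may link the same pair of vertices."""
    vertices: dict = field(default_factory=dict)   # vertex id -> parameter block
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, params):
        self.vertices[vid] = params

    def add_edge(self, j, k, kind, weight=1.0):
        if j not in self.vertices or k not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(j, k, kind, weight))

    def edges_between(self, j, k):
        # Edges are undirected here, so the order of j and k does not matter.
        return [e for e in self.edges if {e.j, e.k} == {j, k}]


graph = ConstraintGraph()
graph.add_vertex(0, "camera pose 0")
graph.add_vertex(1, "camera pose 1")
graph.add_edge(0, 1, "frame-to-frame")
graph.add_edge(0, 1, "virtual match", weight=0.8)  # second edge on the same pair
```

Storing edges in a flat list rather than a dictionary keyed by vertex pair is what allows more than one measurement between the same two parameter blocks.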
29 1b, to achieve object detection and also improve SLAM. [sent-95, score-0.134]
30 1c, explicitly includes into the graph the unknown object pose as a vertex constrained to the matching frames. [sent-97, score-0.349]
31 As our proposal is agnostic with respect to a specific BA-based SLAM approach, in the following we will simply refer to generic frame-to-frame constraints coming from a SLAM engine. [sent-99, score-0.165]
32 The Object Detection Pipeline State-of-the-art feature-based 2D/3D object detection pipelines, such as e. [sent-112, score-0.134]
33 On the other hand, should several snapshots of the scene be available, it would not be straightforward to deploy such far richer information as there is no established machinery to carry out detection from cues gathered from different uncalibrated views. [sent-116, score-0.145]
34 Our novel BA-formulation of the object detection problem effectively overcomes the above limitation. [sent-117, score-0.134]
35 The Validation Graph As soon as a new frame becomes available, features are extracted, described and matched to the model database (see Sec. [sent-118, score-0.206]
36 Then, for every set of correspondences related to a given object we build a validation graph as shown in Fig. [sent-122, score-0.352]
37 Figure 2: A validation graph is built from frame-to-object correspondences as well as frame-to-frame constraints (solid green) provided by the SLAM engine; both 2D (a) and 3D (b) features can be used. [sent-128, score-0.324]
38 vertex representing the unknown 3D position of the landmark and a set of edges to include its reprojection errors into the cost function. [sent-129, score-0.345]
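As a rough illustration of what a reprojection-error edge measures, the sketch below projects a landmark into a frame with a pinhole model and compares it with the observed 2D feature. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the world-to-camera convention for `R` and `t`, and all function names are assumptions made for this example; the extracted text does not specify them.

```python
def transform(R, t, p):
    """Apply a rigid motion: rotate the 3D point p by R (3x3), then translate by t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a point expressed in the camera frame."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_residual(R, t, landmark, observed, fx, fy, cx, cy):
    """Observed pixel minus projected landmark (assumes R, t map world to camera)."""
    u, v = project(transform(R, t, landmark), fx, fy, cx, cy)
    return (observed[0] - u, observed[1] - v)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
residual = reprojection_residual(
    IDENTITY, (0.0, 0.0, 0.0),        # camera at the world origin
    (0.0, 0.0, 2.0),                  # landmark 2 m in front of the camera
    (321.0, 239.0),                   # observed 2D feature
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
```

In a bundle adjustment cost, residuals of this kind would typically be squared, weighted and summed over all edges.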
39 2a, which is, let’s say, the landmark position for the nth object feature. [sent-131, score-0.193]
40 (2) where q_n^o denotes the nth 2D feature point learned for the oth object, p_i^{o,n} is the 2D feature point of the ith frame that matches with probability s_i^{o,n}, and V_i[r] rotates and translates r ∈ R^3 so as to apply the current pose estimate associated with the ith vertex, i. [sent-144, score-0.523]
41 In this scenario 3D coordinates are known in every vertex reference frame, so no landmark vertex has to be created but instead we can directly link the camera frames to the object (see Fig. [sent-170, score-0.58]
42 Moreover, we can constrain together the frames in which matches related to the same object features are found by an extra edge representing a virtual match. [sent-172, score-0.321]
43 Accordingly, if mij is the event “feature i matches feature j ” and we know that Pr (mik) = sik and Pr (mjk) = sjk, we wish to know Pr (mij). [sent-173, score-0.169]
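The extracted text does not show how Pr(mij) is derived. One simple possibility, assumed here purely for illustration, is that features i and j match each other exactly when both match the same model feature k and that the two events are independent, which gives a product of the two scores. The function name is hypothetical.

```python
def virtual_match_probability(s_ik, s_jk):
    """Pr(m_ij) under an ASSUMED independence of m_ik and m_jk.

    s_ik = Pr(m_ik) and s_jk = Pr(m_jk) are the matching scores of
    features i and j against the same object feature k.
    """
    for s in (s_ik, s_jk):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    return s_ik * s_jk


p_virtual = virtual_match_probability(0.9, 0.8)
```

Under this assumption a virtual match is never more reliable than the weaker of the two frame-to-object matches it is built from.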
44 Thus, recalling that now qon and pio,n represent 3D features, the semantic edges for the nth object feature in Fig. [sent-175, score-0.446]
45 2 (6) Finally, we add frame-to-frame constraints from the SLAM engine in order to robustify detection and get consistent optimization results. [sent-197, score-0.261]
46 If the current frame is the first matching that object, we also expand the validation graph with frame-to-frame constraints to the previous frame (c. [sent-198, score-0.495]
47 Hypotheses Validation and Pose Refinement The validation graph is optimized minimizing the cost function in Eq. [sent-202, score-0.238]
48 Then, to retain or discard edges, we rely on the global weighted mean residual from the last global optimization, ? [sent-204, score-0.135]
49 Therefore, we compare this value to the residual of each match coming from the last processed frame and, if this is above some threshold, the edge is removed. [sent-209, score-0.317]
50 More precisely, in case of 2D feature matches we remove the edge from the frame vertex to the landmark vertex if − Vi ≥ α? [sent-210, score-0.595]
51 the landmark vertex associated with the nth feature point on the oth object and α is a given parameter. [sent-218, score-0.323]
52 Of course, if the removal of the frame-to-landmark edge leaves the landmark vertex attached only to the object, we delete the object-to-landmark edge too. [sent-219, score-0.308]
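The pruning rule for 2D matches can be sketched as follows. The exact inequality is truncated in the extracted text, so the threshold `alpha * mean_residual` and the dictionary layout of an edge are assumptions of this example.

```python
def prune_last_frame(frame_edges, object_landmarks, last_frame, mean_residual, alpha):
    """Drop high-residual edges of the last frame, then orphaned landmarks.

    frame_edges      -- list of dicts with keys "frame", "landmark", "residual"
    object_landmarks -- set of landmark ids linked to the object vertex
    Returns the kept frame-to-landmark edges and the landmark ids that
    still have at least one frame observing them.
    """
    threshold = alpha * mean_residual  # assumed form of the test
    kept = [
        e for e in frame_edges
        if not (e["frame"] == last_frame and e["residual"] >= threshold)
    ]
    # A landmark seen by no frame is attached only to the object, so its
    # object-to-landmark edge is removed as well.
    still_observed = {e["landmark"] for e in kept}
    return kept, object_landmarks & still_observed


edges = [
    {"frame": 3, "landmark": "n1", "residual": 0.5},
    {"frame": 4, "landmark": "n1", "residual": 5.0},
    {"frame": 4, "landmark": "n2", "residual": 9.0},
]
kept_edges, kept_landmarks = prune_last_frame(
    edges, {"n1", "n2"}, last_frame=4, mean_residual=1.0, alpha=3.0
)
```

Only edges attached to the last processed frame are tested, which mirrors the description above; older edges are left to the final cleaning pass.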
53 Under the hypotheses given in the previous paragraph about feature matching, the edges representing virtual matches created toward other frame vertexes are deleted as well. [sent-225, score-0.532]
54 the frame-to-landmark or the frame-to-object edges, connected to the last processed frame are counted. [sent-228, score-0.15]
55 If above a minimum number, the remaining constraints are validated, otherwise we treat them as noise and remove them from the validation graph. [sent-229, score-0.186]
56 If the validation process is successful, another optimization is run on the remaining edges to refine the estimate. [sent-231, score-0.232]
57 Then, a final cleaning is performed on the whole validation graph with a threshold ? [sent-232, score-0.331]
58 3); ηf ≤ Nse < ηt the detection is ambiguous, the validation graph is saved waiting for more visual cues, but the object is removed from the global graph, if present (c. [sent-238, score-0.449]
59 3); Nse ≥ ηt the object is detected and added to, if not present, or updated in, if already present, the global graph (c. [sent-242, score-0.249]
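The threshold test on Nse can be written as a three-way decision. The two cases quoted above come from the text; the branch for Nse < ηf is not visible in this extract and is assumed here to mean that the hypothesis is rejected. The names `Outcome` and `classify_detection` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED = "rejected"     # assumed meaning of Nse < eta_f
    AMBIGUOUS = "ambiguous"   # eta_f <= Nse < eta_t: keep the graph, wait for cues
    DETECTED = "detected"     # Nse >= eta_t: add or update in the global graph


def classify_detection(n_se, eta_f, eta_t):
    """Map the number of surviving semantic edges to a detection outcome."""
    if eta_f > eta_t:
        raise ValueError("eta_f must not exceed eta_t")
    if n_se >= eta_t:
        return Outcome.DETECTED
    if n_se >= eta_f:
        return Outcome.AMBIGUOUS
    return Outcome.REJECTED


outcomes = [classify_detection(n, eta_f=5, eta_t=10) for n in (2, 5, 10)]
```

In the ambiguous case the object would also be removed from the global graph if it is already there, as stated above.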
60 Again, the two thresholds are not critical, as at every frame the validation of new hypotheses benefits from previously refined feature matches and, moreover, the final cleaning retains only the best edges among all, i. [sent-247, score-0.576]
61 Semantic SLAM The validation graphs built in the previous section for every matched object cover only a subset of the whole map, i. [sent-256, score-0.26]
62 Hence, we carry out a global semantic optimization in order to jointly optimize all camera poses as well as all object poses. [sent-259, score-0.425]
63 This global graph comprises: all the camera pose vertexes with frame-to-frame constraints coming from the SLAM engine; all the pose vertexes of those objects for which the validation procedure turned out successful (c. [sent-260, score-0.865]
64 2); all frame-to-landmark and object-to-landmark constraints, in case of 2D feature matching, or frame-to-object and virtual frame-to-frame constraints, in case of 3D feature matching, coming from detected objects’ validation graphs. [sent-264, score-0.302]
65 Optimizing such a graph spreads the error over all the estimates and gives a consistent global solution. [sent-265, score-0.159]
66 This residual represents the mean expected error for an edge consistent with the current solution. [sent-273, score-0.139]
67 As such, it can be used as validation threshold during the next object detection routine, as already discussed in Sec. [sent-274, score-0.283]
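A weighted mean residual of the kind referred to above can be computed as below. The precise definition is truncated in the extracted text, so taking the weights to be per-edge matching probabilities is an assumption of this sketch, as is the function name.

```python
def weighted_mean_residual(residuals, weights):
    """Weighted mean of per-edge residuals (weights assumed to be match scores)."""
    if len(residuals) != len(weights):
        raise ValueError("one weight per residual is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, residuals)) / total


uniform = weighted_mean_residual([1.0, 3.0], [1.0, 1.0])
skewed = weighted_mean_residual([1.0, 3.0], [3.0, 1.0])
```

Because confident matches dominate the mean, a few low-score outliers shift the resulting validation threshold only slightly.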
68 The adopted strategy returns a large number of matches, so a reduced set is picked out by a RANSAC-based 6DOF object pose estimation [2]. [sent-284, score-0.13]
69 The final set still includes outliers, but our semantic bundle adjustment algorithm can cope with them effectively. [sent-287, score-0.445]
70 Quantitative Results Our proposal is about the estimation of both camera and objects 6DOF poses, but to the best of our knowledge no publicly available dataset provides all such ground-truth information. [sent-292, score-0.189]
71 Therefore, to achieve quantitative results, we decided to render object meshes into frames belonging to the publicly available RGB-D SLAM dataset [25], which is a collection of sequences acquired by a Kinect and available together with ground-truth camera poses. [sent-293, score-0.232]
72 Moreover, before rendering objects, we performed a 2Hz time subsampling and an ICP-like [5] pose optimization in order to improve the accuracy of ground-truth camera poses. [sent-294, score-0.147]
73 Frame-to-frame correspondences are obtained by matching SIFT features [21], projecting them in 3D based on the depth map and running a RANSAC-based camera pose estimation [2], which is the basic approach taken by recent SLAM engines for RGB-D sensors such as [14, 13]. [sent-298, score-0.196]
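Projecting a matched feature into 3D from the depth map is commonly done by inverting the pinhole model. The sketch below assumes known intrinsics (`fx`, `fy`, `cx`, `cy`), metric depth, and a depth of zero or less marking a missing measurement; none of these details are given in the extracted text.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Return the 3D point (camera frame) for pixel (u, v) at the given depth.

    Returns None when the depth reading is missing (assumed to be <= 0).
    """
    if depth <= 0:
        return None
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


centre = back_project(320.0, 240.0, 2.0, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
offset = back_project(845.0, 240.0, 1.0, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
missing = back_project(100.0, 100.0, 0.0, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
```

The resulting 3D points from two frames are what a RANSAC-based rigid pose estimation would then take as input.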
74 (a) (b) Figure 6: 4-objects sequence: Final error in rotation (a), in degrees, and translation (b), in meters, for every processed frame (numbered) and the detected objects (a: Doll, b: Duck, c: Frog, d: Mario). [sent-301, score-0.281]
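Rotation and translation errors like those plotted in Figure 6 are often computed as the angle of the relative rotation and the Euclidean distance between positions. The text does not state which definition the authors used, so the following is a common convention offered as an assumption.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Angle, in degrees, of the relative rotation R_est^T * R_gt.

    Uses trace(A^T B) = sum of element-wise products of A and B.
    """
    trace = sum(R_est[r][c] * R_gt[r][c] for r in range(3) for c in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.dist(t_est, t_gt)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

angle = rotation_error_deg(IDENTITY, ROT_Z_90)
distance = translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

Clamping the cosine before `acos` avoids domain errors when two nearly identical rotations give a trace marginally above 3.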
75 Figure 4: 7-objects sequence: without explicit loop closure detection a plain SLAM engine (left) accumulates error over time, whilst the ability to detect Mario allows our technique to also constrain the last and first frames and thus to tear down the global error (right). [sent-303, score-0.66]
76 5 and 6 show, together with object detection (also in Fig. [sent-306, score-0.134]
77 3), improved accuracy compared to the adopted plain SLAM engine due to the links between frames established through our semantic edges, such as in particular those due to detection of objects at the beginning and at the end of the sequences (see Fig. [sent-307, score-0.587]
78 To illustrate the effectiveness of the proposed match validation technique, Tab. [sent-309, score-0.149]
79 1 reports the frame-to-object edges in the validation graph of the Frog model for some frames of the 7-objects sequence. [sent-310, score-0.445]
80 Frog is firstly matched in frame 16, with 11 feature correspondences surviving the RANSAC step (row “Frame 16” in Tab. [sent-311, score-0.205]
81 These matches are clearly false positives (Frog is the object displayed in frame 36) and our cleaning algorithm is able to recognize this inconsistency, leaving only 3 edges in the graph. [sent-313, score-0.443]
82 Thus, the object pose is not inserted into the global graph to be optimized, though the wrong edges are saved. [sent-314, score-0.416]
83 Later, Frog is correctly matched, the correct correspondences pass all the validation tests and the object is added to the global graph, but the previous wrong edges raise the weighted mean residual and cause a large reconstruction error (rows from “Frame 34” to “Frame 37”). [sent-315, score-0.519]
84 However, after a few frames, as more good correspondences are gathered, the 3 wrong edges are detected and erased, thus tearing down the error on the object’s pose estimate (rows “Frame 38” and “Frame 39”). [sent-316, score-0.33]
85 7, we performed a complete loop capturing the object Doll at the beginning and at the end of the sequence. [sent-322, score-0.168]
86 Then, we ran both the plain SLAM engine and our novel semantic pipeline: while a simple tracking scheme eventually drifts (see Fig. [sent-323, score-0.402]
87 7a), our approach correctly validates object presence and implicitly closes the loop by deploying object de… [sent-324, score-0.305]
88 Table 1: 7-objects sequence: an excerpt from the validation graph of model Frog. [sent-327, score-0.238]
89 Rows report the number of frame-to-object edges for the vertexes in the graph at the end of the validation procedure for the frame in the first column. [sent-328, score-0.59]
90 Also, the number of matches before edge cleaning is shown in brackets. [sent-329, score-0.23]
91 Last column reports the pose error for Frog in the global graph. [sent-330, score-0.174]
92 (a) (b) Figure 7: A plain SLAM engine (a) accumulates errors and eventually cannot close the loop; integrated object detection and semantic optimization (b) allows for implicit loop closure and improved reconstruction. [sent-331, score-0.736]
93 Figure 8: AR with 3D occlusion handling: once the Doll is detected (top right) a red umbrella can be rendered, even when the object is later occluded (bottom right); the final augmented 3D reconstruction is shown on the left. [sent-332, score-0.157]
94 Indeed, unlike previous work, which can augment only the whole scene reconstruction due to the lack of semantic information related to individual objects [17, 23], our framework seamlessly brings together 3D mapping, camera tracking and object detection/localization, i. [sent-336, score-0.432]
95 Concluding Remarks We have proposed a novel Semantic Bundle Adjustment framework which allows for solving jointly the object detection and SLAM problems. [sent-340, score-0.134]
96 To address the problem through bundle adjustment, which is inherently unimodal, we follow two steps: first the validation graphs establish the objects’ existence, then the global semantic graph jointly solves for camera and detected object poses. [sent-342, score-0.794]
97 As such, it is not conceived as a realtime application, the bottleneck being detection, description and matching of 3D features, which requires several seconds, although the semantic optimization can run in the order of tens of milliseconds. [sent-344, score-0.15]
98 From a theoretical perspective, we plan to extend our work towards two main directions: detection of multiple object instances, which is not supported in the current formulation but quite easily addressable in principle, and generalizing the framework to deal also with category-level recognition. [sent-346, score-0.134]
99 Integrating active mobile robot object recognition and slam in natural environments. [sent-444, score-0.709]
100 A benchmark for the evaluation of rgb-d slam systems. [sent-562, score-0.595]
wordName wordTfidf (topN-words)
[('slam', 0.595), ('bundle', 0.164), ('semantic', 0.15), ('validation', 0.149), ('int', 0.143), ('frog', 0.14), ('adjustment', 0.131), ('vertex', 0.13), ('vertexes', 0.12), ('engine', 0.113), ('oct', 0.111), ('frame', 0.11), ('qon', 0.108), ('loop', 0.103), ('cleaning', 0.093), ('matches', 0.092), ('plain', 0.089), ('graph', 0.089), ('landmark', 0.088), ('frames', 0.085), ('edges', 0.083), ('camera', 0.082), ('mij', 0.077), ('nse', 0.077), ('deploying', 0.072), ('strasdat', 0.072), ('robotics', 0.071), ('detection', 0.069), ('mapping', 0.068), ('proposal', 0.067), ('object', 0.065), ('pose', 0.065), ('mario', 0.063), ('residual', 0.061), ('coming', 0.061), ('closure', 0.06), ('detected', 0.058), ('poses', 0.056), ('ekvall', 0.054), ('engelhard', 0.054), ('mjk', 0.054), ('vasudevan', 0.054), ('doll', 0.054), ('zij', 0.054), ('tracking', 0.05), ('soon', 0.05), ('correspondences', 0.049), ('integrated', 0.049), ('robot', 0.049), ('hypotheses', 0.049), ('pipeline', 0.049), ('sep', 0.048), ('davison', 0.048), ('duck', 0.048), ('bologna', 0.048), ('luigi', 0.048), ('meger', 0.048), ('environment', 0.047), ('kinect', 0.047), ('matched', 0.046), ('seamlessly', 0.045), ('edge', 0.045), ('deleted', 0.044), ('mik', 0.044), ('castle', 0.044), ('ekf', 0.044), ('reprojection', 0.044), ('bao', 0.043), ('pr', 0.042), ('cornelis', 0.042), ('robustify', 0.042), ('wrong', 0.042), ('established', 0.041), ('nth', 0.04), ('processed', 0.04), ('tombari', 0.04), ('civera', 0.04), ('saved', 0.04), ('numbered', 0.04), ('objects', 0.04), ('reports', 0.039), ('ptam', 0.038), ('accumulates', 0.038), ('external', 0.038), ('reality', 0.038), ('global', 0.037), ('constraints', 0.037), ('italy', 0.037), ('stefano', 0.036), ('concluding', 0.036), ('carry', 0.035), ('existence', 0.035), ('though', 0.035), ('virtual', 0.034), ('augmented', 0.034), ('error', 0.033), ('iros', 0.033), ('routine', 0.033), ('ismar', 0.033), ('grounded', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
2 0.47809091 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison
Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.
3 0.3964386 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid
Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.
4 0.17217553 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
5 0.13657619 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon
Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel’s correspondence to 3D points in the scene’s world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially outperforms two state of the art baselines.
6 0.12005398 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
7 0.11534869 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
8 0.11089817 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
9 0.10754417 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
10 0.10320142 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
11 0.099737927 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
12 0.098286308 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
13 0.096337549 455 cvpr-2013-Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
14 0.094793782 325 cvpr-2013-Part Discovery from Partial Correspondence
15 0.091911137 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
16 0.089693405 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
17 0.089329734 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
18 0.087516174 368 cvpr-2013-Rolling Shutter Camera Calibration
19 0.086321376 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
20 0.085772268 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
topicId topicWeight
[(0, 0.243), (1, 0.106), (2, 0.024), (3, -0.046), (4, 0.034), (5, -0.065), (6, 0.028), (7, 0.006), (8, 0.041), (9, 0.018), (10, -0.017), (11, 0.094), (12, -0.009), (13, 0.078), (14, 0.012), (15, -0.19), (16, -0.006), (17, 0.136), (18, -0.041), (19, 0.037), (20, 0.009), (21, -0.049), (22, -0.01), (23, 0.026), (24, -0.159), (25, -0.062), (26, 0.046), (27, 0.045), (28, -0.033), (29, -0.015), (30, -0.027), (31, 0.038), (32, -0.203), (33, -0.138), (34, -0.045), (35, 0.241), (36, 0.226), (37, -0.043), (38, 0.061), (39, -0.088), (40, -0.048), (41, 0.08), (42, -0.018), (43, -0.171), (44, 0.055), (45, -0.054), (46, 0.075), (47, -0.123), (48, 0.045), (49, -0.115)]
simIndex simValue paperId paperTitle
same-paper 1 0.92655587 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
2 0.8985945 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
3 0.73372316 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid
Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.
4 0.65592206 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
5 0.52102083 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon
Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel’s correspondence to 3D points in the scene’s world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially outperforms two state-of-the-art baselines.
6 0.51382482 110 cvpr-2013-Dense Object Reconstruction with Semantic Priors
7 0.49914339 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation
8 0.49511927 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
9 0.47585803 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
10 0.47379142 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
11 0.4575707 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
12 0.45190933 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
13 0.45115727 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
15 0.43500635 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
16 0.43093342 368 cvpr-2013-Rolling Shutter Camera Calibration
17 0.42437714 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
18 0.41168478 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity
19 0.40915719 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
20 0.40283972 440 cvpr-2013-Tracking People and Their Objects
topicId topicWeight
[(10, 0.09), (16, 0.024), (26, 0.043), (33, 0.227), (39, 0.01), (67, 0.053), (69, 0.392), (87, 0.073), (98, 0.01)]
simIndex simValue paperId paperTitle
1 0.89880043 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
2 0.87735707 172 cvpr-2013-Finding Group Interactions in Social Clutter
Author: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
3 0.86481607 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
Author: Zhe Yang, Zhiwei Xiong, Yueyi Zhang, Jiao Wang, Feng Wu
Abstract: This paper proposes novel density modulated binary patterns for depth acquisition. Similar to Kinect, the illumination patterns do not need a projector for generation and can be emitted by infrared lasers and diffraction gratings. Our key idea is to use the density of light spots in the patterns to carry phase information. Two technical problems are addressed here. First, we propose an algorithm to design the patterns to carry more phase information without compromising the depth reconstruction from a single captured image as with Kinect. Second, since the carried phase is not strictly sinusoidal, the depth reconstructed from the phase contains a systematic error. We further propose a pixel-based phase matching algorithm to reduce the error. Experimental results show that the depth quality can be greatly improved using the phase carried by the density of light spots. Furthermore, our scheme can achieve 20 fps depth reconstruction with GPU assistance.
4 0.85641783 135 cvpr-2013-Discriminative Subspace Clustering
Author: Vasileios Zografos, Liam Ellis, Rudolf Mester
Abstract: We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from the same subspace and a basic affinity criterion. A number of classifiers are then diversely trained from different partitions of the data, and their results are combined together in an ensemble, in order to obtain the final clustering result. We have tested our method with 4 challenging datasets and compared against 8 state-of-the-art methods from literature. Our results show that DiSC is a very strong performer in both accuracy and robustness, and also of low computational complexity.
5 0.84767854 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu
Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Different from many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombining and refining initial mid-level proposals, as well as handling multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.
same-paper 6 0.83575451 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
7 0.82656258 392 cvpr-2013-Separable Dictionary Learning
9 0.75358945 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
10 0.71881568 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
11 0.69263053 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
12 0.68788481 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
13 0.68632686 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
14 0.68222564 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
15 0.68028033 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
16 0.67876291 282 cvpr-2013-Measuring Crowd Collectiveness
17 0.67518955 402 cvpr-2013-Social Role Discovery in Human Events
18 0.67463768 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
19 0.66761136 364 cvpr-2013-Robust Object Co-detection
20 0.66295451 248 cvpr-2013-Learning Collections of Part Models for Object Recognition