cvpr cvpr2013 cvpr2013-74 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
Reference: text
sentIndex sentText sentNum sentScore
1 CLAM: Coupled Localization and Mapping with Efficient Outlier Handling Jonathan Balzer University of California Los Angeles, CA 90095 USA balzer@cs. [sent-1, score-0.041]
2 The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. [sent-4, score-0.244]
3 Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. [sent-5, score-0.045]
4 We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. [sent-6, score-0.415]
5 We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. [sent-7, score-0.201]
6 Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. [sent-8, score-0.365]
7 We envision a scenario whereby a video of an object is captured while manipulating or moving around it with a hand-held camera or phone. [sent-15, score-0.14]
8 Despite a wealth of work in Structure From Motion (SFM), Simultaneous Localization and Mapping (SLAM) and 3-D reconstruction in … (Stefano Soatto, University of California, Los Angeles, CA 90095, USA, soatto@cs.) [sent-18, score-0.093]
9 Also, the object occupies a small portion of the visual field, resulting in a small effective field-of-view that presents a challenge for SFM and SLAM. [sent-24, score-0.062]
10 Global bundle adjustment (BA) methods [13] extract the majority of information of interest in the data, but this comes at a computational cost. [sent-26, score-0.098]
11 Even simpler batch factorization schemes that exploit the weak-perspective nature of small objects can introduce significant latency. [sent-27, score-0.141]
12 Existing real-time SFM systems are too brittle, and SLAM systems that incrementally build a map and localize the viewer relative to it still fail in the presence of significant self-occlusions. [sent-28, score-0.1]
13 Although our goal is to eventually use these models for recognition, and therefore our effort naturally relates to [23], here we focus on the reconstruction aspect. [sent-30, score-0.049]
14 Each point is endowed with a descriptor, and camera motion relative to it comes as a byproduct. [sent-34, score-0.164]
15 Even though it appears that the object dominates the image, it only occupies 20 % of the area of the image. [sent-35, score-0.062]
16 Models acquired by our method can provide support in recognition tasks such as tracking of small-scale objects. [sent-37, score-0.083]
17 In addition, our model can be used as initialization for post-process BA refinement for accurate (sparse) reconstruction and pose estimation, or as one of the building blocks in some of the recent algorithms for dense reconstruction in real time [9, 11, 18, 21, 25, 27]. [sent-40, score-0.097]
18 It performs best when the initial motion is orthogonal to the optical axis. [sent-44, score-0.071]
19 We operate under assumptions similar to PTAM: The scene is static, rigid, and for the most part, Lambertian. [sent-45, score-0.04]
20 We assume the camera has been previously calibrated, and both temporal and spatial scale are relatively moderate. [sent-46, score-0.052]
21 Contributions Our system is considerably simpler than PTAM: While the latter employs a full-fledged epipolar geometry pipeline for initialization (feature selection and tracking, epipolar constraint, incremental bundle adjustment etc. [sent-55, score-0.283]
22 ), our approach bypasses all that and trivially starts with all points on the image plane. [sent-56, score-0.041]
23 Our first contribution is a unified optimization criterion (Sect. [sent-58, score-0.099]
24 2) that addresses both localization and mapping in a coupled fashion. [sent-60, score-0.256]
25 This would at first seem to go against the wisdom of [15], but presents additional benefits in terms of simplification and management of correspondence, which represents our second contribution: In Sect. [sent-61, score-0.038]
26 3, we propose a putative correspondence mechanism based on tracking to generate inlier hypotheses, and a simple photometric validation mechanism based on a contrast-invariant descriptor. [sent-63, score-0.4]
27 It accommodates the percentage of outliers the M-estimator [12], favored in [15] over slower combinatorial or acceptance/rejection sampling methods, can tolerate before breakdown. [sent-64, score-0.153]
28 back the motion estimates from local BA to infer the scale- and rotation-covariant component of the descriptor, to reduce nuisance variability. [sent-66, score-0.115]
29 A temporal aggregate of such descriptors can then be the basis for a classification scheme that uses our system for detection, recognition, localization of the learned objects in cluttered scenes. [sent-67, score-0.085]
30 Our fourth contribution is to expand and adapt the benchmarks [8, 17] to the task of small-scale object modeling. [sent-69, score-0.047]
31 An experimental assessment of the performance of our system in comparison with offline SFM (at the high-end) and PTAM (at the low-end) is reported in Sect. [sent-70, score-0.043]
32 Unlike ours, it includes inertial measurements to cope with drift occurring at large time scales. [sent-76, score-0.047]
33 The latter two rely on accelerated versions of RANSAC for hypothesis generation, based on the 5-point algorithm, and separately triangulate new depths. [sent-78, score-0.091]
34 This leads to the decoupling that we find detrimental to performance and hence wish to avoid. [sent-79, score-0.041]
35 provide a review of BA, whereas the authors of [24] argue that batch processing based on keyframes performs better marginalization than a causal filtering approach, although the conclusions contravene some of the basic tenets of causal data processing. [sent-81, score-0.27]
36 described in [28] coincides with ours because correspondences are collected by tracking. [sent-83, score-0.077]
37 Setting In the following, matrices are in bold, vectors in bold italic; points in space $X \in \mathbb{R}^3$ are capitalized when possible. [sent-89, score-0.042]
38 The canonical pinhole projection $\pi : \mathbb{R}^3 \to \mathbb{R}^2$, $X \mapsto x$, maps a point in space onto the image plane, where $X = \bar{x}\rho$ for some depth $\rho > 0$, and $\bar{x}$ = (x … [sent-90, score-0.041]
39 A vantage point at time $t \in \mathbb{N}$ is represented by $g_t$ = (R … [sent-94, score-0.529]
40 … The inverse of $g_t$ is denoted by $g_t^{-1}$. [sent-98, score-0.573]
41 Given a collection of images $\{I_\tau : D \to \mathbb{R}_+\}_{\tau=0}^{t}$ up to time $t$, we wish to estimate the camera motion and the geometry of the scene in a computationally efficient manner. [sent-99, score-0.093]
42 To this end, we focus on a sparse collection of points $M = \{X_j\}_{j=1}^{m}$, the map, and their corresponding projections $x_i^s$ in each image where $X_j$ is visible. [sent-100, score-0.23]
43 Since M does not include surface topology, visibility boils down to a combinatorial matching problem; we will address it using tools from robust inference and without explicitly determining the inlier/outlier sets. [sent-101, score-0.151]
44 A unified inference criterion Two feature points $x_i^t \in D$ at time $t$, and $x_k^s \in D$ at time $s$, are said to correspond if there exists a location in space $X_j$ that projects to both: $\bar{x}_i^t \rho_i^t = X_j = \bar{x}_k^s \rho_k^s$. [sent-104, score-0.384]
45 If such image-to-image correspondences $x_i^t \leftrightarrow x_k^s$ were known, we could compute the inter-frame motion and depths by minimizing the reprojection error $E_r(g_t, \rho_i^s) :=$ … [sent-105, score-0.444]
46 Here, with an abuse of notation, $i \in V(g_s, g_t)$ indexes image correspondences whereby, after a suitable permutation of indices, $x_i^s$ stands for $x_{k(i)}^s$. Analogously, if scene-to-image correspondences were known, we could infer pose $g_t$ by minimizing the projection error … [sent-110, score-1.126]
47 (2) … correspondences, again with an abuse of notation whereby $x_j^t$ stands for $x_{i(j)}^t$. [sent-116, score-0.303]
48 These two terms can be combined to provide coupled localization and mapping, by minimizing $E(g_t, \rho_i^s) := E_r + \alpha E_p$, (3) where $\alpha \in \mathbb{R}_+$ is a positive scalar that weighs the influence of the two separate error terms according to the ratio of $\#V(g_s, g_t)$ and $\#V(g_t)$. [sent-117, score-0.242]
49 Note that (3) covers all aspects of a SLAM algorithm: At initialization, when the map is empty, we have $V(g_t) = \emptyset$, $E_p = 0$, and $E = E_r$ is equivalent to the classical BA functional. [sent-118, score-0.059]
50 When image-to-image correspondence fails, $V(g_s, g_t) = \emptyset$, but so long as $\#V(g_t) \geq 3$, minimizing (3) yields a camera pose $g_t$ relative to the now nonempty map. [sent-119, score-0.791]
51 Finally, the general case where $V(g_t)$, $V(g_s, g_t)$ are both nonempty covers the two subproblems of map expansion and motion estimation by minimizing (1) (in lieu of simply triangulating new depths) and (2), respectively. [sent-120, score-0.373]
52 Note that both are coupled through the variable gt, and such coupling is critical to avoid gauge ambiguities beyond the initialization stage. [sent-121, score-0.161]
53 Indeed, the two terms reflect the same model, and could be further coupled by imposing that $g_s^{-1} \pi_{\rho_i^s}^{-1}(x_i^s)$ in (1) be equal to $X_j$ in (2). [sent-124, score-0.113]
54 One could determine the set of features V (gs ,gt) that are co-visible between s and t by combinatorial matching and voting schemes such as [7] during minimization of the reprojection error. [sent-130, score-0.239]
55 Similarly, one could determine the set of features V (gt) in the map that are visible at time t using the iterative closest-point method [1] or one of its variants. [sent-131, score-0.059]
56 Alternatively, like [15], one could forgo explicit determination of the correspondence sets and use a robust statistical estimator to minimize (3), cf. [sent-133, score-0.201]
57 Unfortunately, such techniques have a low breakdown point (percentage of outliers) and can still fail in practice. [sent-135, score-0.041]
58 We adopt an intermediate criterion, where image-to-image putative correspondence is established by short-baseline tracking, and verified using a local contrast-invariant, rotation- and scale-covariant descriptor. [sent-136, score-0.178]
59 If s is the instance when a feature first appears, and the feature is tracked through $\{x_i^\tau\}_{\tau=s}^{t}$, ideally we would have that $i \in V(g_s, g_\tau)$ for all $\tau = s + 1,$ … [sent-137, score-0.068]
60 In practice, however, short-baseline trackers are subject to drift, and therefore, as time goes by, the track may continue to exist but fail to correspond to a stationary point on the map. [sent-141, score-0.041]
61 Therefore, we design a photometric consistency test based on a local contrast-invariant descriptor. [sent-142, score-0.061]
62 That is, a function of the image in a neighborhood of the tracked point, $\phi(I_\tau \mid x_i^\tau, g_\tau)$, that is invariant to contrast changes (monotonic continuous transformations $h$ of the image intensity, $h(I_\tau)$, i. [sent-143, score-0.156]
63 that the neighborhood around xsi usually undergoes nonrigid transformations, and the above condition is violated even though the feature remains visible. [sent-152, score-0.268]
64 While in the absence of surface topology, analyzing the map for occlusions is infeasible, analyzing the camera motion relative to it is not. [sent-153, score-0.182]
65 Thus, we feed-back and compensate for domain transformations using portions of the estimated motion gt (Fig. [sent-154, score-0.65]
66 In this sense, the test is co-variant with respect to scale and in-plane rotation (footnote 2). [sent-156, score-0.044]
67 We can do the same for map-to-image matching, by augmenting each point $X_j$ in the map $M$ with … Footnote 2: Notice that in the reprojection error, $g_t$ and $g_s$ only appear as a product, which may suggest that the co-visibility only depends on inter-frame motion $g_t g_s^{-1}$. [sent-157, score-0.967]
68 However, this is not the case, since the dependency on the absolute pose gt is reflected in the dependency on the scene. [sent-158, score-0.609]
69 The novelty of our approach hinges on the joint optimization of (3). [sent-160, score-0.093]
70 This may seem to run counter to the results of [15] and others. [sent-161, score-0.038]
71 However, equation (3), combined with tracking and the pre-rejection of inconsistent features, enables us to operate without combinatorial or sequential outlier rejection, significantly decreasing run-time. [sent-162, score-0.312]
72 To the best of our knowledge, nobody has addressed the determination of the visibility sets V by a combination of tracking and contrast-invariant validation. [sent-163, score-0.19]
73 This is the innovation that enables faster outlier handling with standard robust-statistical tools. [sent-164, score-0.081]
74 Also notice that the joint energy functional enables us to expand the map in a robust fashion, unlike [15] that performs triangulation in a separate stage, while lacking outlier filtering altogether. [sent-165, score-0.187]
75 1 Tracking To warrant sufficient parallax, structure and motion estimation is customarily performed on a subset of the input image sequence called keyframes. [sent-170, score-0.159]
76 We rely on features from accelerated segment tests (FAST) in such keyframes [22]. [sent-171, score-0.13]
77 These are tracked with the help of the Kanade-Lucas-Tomasi method (KLT) at the original video frame rate. [sent-172, score-0.068]
78 One is not bound to this particular choice of tracker because the central part of our algorithm makes no assumption on how correspondences are established. [sent-173, score-0.077]
79 Two points in two keyframes correspond if they are connected by a track. [sent-175, score-0.083]
80 There are two situations in which a track expires: The associated feature either leaves the field of view, or it is about to merge with another tracked feature3. [sent-176, score-0.068]
81 When this region has the shape of a rectangular neighborhood $B_r(x)$ := {y | … [sent-181, score-0.038]
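The coupled criterion (3) in sentences 44 to 53 can be illustrated with a short numerical sketch. Because the exact form of (1) and (2) is lost in the extracted text, the residuals below are an assumption: the reprojection term lifts $x_i^s$ to depth $\rho_i^s$, moves it through $g_t g_s^{-1}$, and compares its projection with $x_i^t$, while the projection term compares $\pi(g_t X_j)$ with $x_j^t$. The function names, the world-to-camera convention for $g = (R, T)$, and the homogeneous lift $\bar{x} = (x, 1)$ are likewise illustrative choices, not taken from the paper.

```python
import numpy as np

def project(X):
    """Canonical pinhole projection pi: R^3 -> R^2, applied to rows of X."""
    X = np.asarray(X, dtype=float).reshape(-1, 3)
    return X[:, :2] / X[:, 2:3]

def backproject(x, rho):
    """Lift image points x (n, 2) to space points x_bar * rho (assumed x_bar = (x, 1))."""
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    rho = np.asarray(rho, dtype=float).reshape(-1, 1)
    x_bar = np.hstack([x, np.ones((x.shape[0], 1))])
    return x_bar * rho

def transform(g, X):
    """Apply the rigid motion g = (R, T), assumed to map world to camera coordinates."""
    R, T = g
    return np.asarray(X, dtype=float).reshape(-1, 3) @ R.T + T

def inverse(g):
    """Inverse rigid motion g^{-1} = (R^T, -R^T T)."""
    R, T = g
    return R.T, -R.T @ T

def reprojection_error(g_t, g_s, x_s, rho_s, x_t):
    """Assumed form of E_r: squared image error after moving frame-s points into frame t."""
    X_world = transform(inverse(g_s), backproject(x_s, rho_s))
    residual = np.asarray(x_t, dtype=float).reshape(-1, 2) - project(transform(g_t, X_world))
    return float(np.sum(residual ** 2))

def projection_error(g_t, X_map, x_t):
    """Assumed form of E_p: squared image error of map points seen from pose g_t."""
    residual = np.asarray(x_t, dtype=float).reshape(-1, 2) - project(transform(g_t, X_map))
    return float(np.sum(residual ** 2))

def coupled_energy(g_t, g_s, x_s, rho_s, x_t_img, X_map, x_t_map, alpha):
    """E = E_r + alpha * E_p; an empty map contributes E_p = 0, as at initialization."""
    e_r = reprojection_error(g_t, g_s, x_s, rho_s, x_t_img)
    e_p = projection_error(g_t, X_map, x_t_map)
    return e_r + alpha * e_p
```

Passing empty arrays of shape (0, 3) and (0, 2) for the map terms reproduces the initialization case in which the energy reduces to the reprojection term alone. In a full system the energy would be minimized over $g_t$ and the depths; this sketch only evaluates it.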
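Sentences 56 and 57 mention minimizing (3) with a robust statistical estimator instead of determining the correspondence sets explicitly. As a rough illustration of how an M-estimator down-weights outliers, the sketch below runs iteratively reweighted least squares (IRLS) with Huber weights on a generic linear problem. The Huber loss, the threshold `delta`, and the linear model are assumptions chosen for brevity; the excerpt does not say which M-estimator from [12] the authors use, and their residuals are nonlinear in the pose.

```python
import numpy as np

def huber_weights(residuals, delta):
    """IRLS weights for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    r = np.abs(np.asarray(residuals, dtype=float))
    w = np.ones_like(r)
    large = r > delta
    w[large] = delta / r[large]
    return w

def irls(A, b, delta=1.0, iterations=20):
    """Robustly solve A @ theta ~ b by iteratively reweighted least squares."""
    theta = np.linalg.lstsq(A, b, rcond=None)[0]  # plain least-squares start
    for _ in range(iterations):
        w = huber_weights(A @ theta - b, delta)
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by the square root of the weights.
        theta = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return theta
```

Residuals above `delta` keep a bounded influence on the solution, so a moderate fraction of gross outliers shifts the estimate only slightly. This is also why the breakdown point matters: once outliers dominate, the reweighting can no longer recover, which is the motivation the text gives for pre-rejecting inconsistent tracks.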
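Sentences 61 and 62 call for a descriptor that is invariant to contrast changes, that is, to monotonic continuous transformations $h$ of the image intensity. The excerpt does not specify the descriptor itself, so the sketch below substitutes a simple rank transform of a patch, which has this invariance for strictly increasing $h$. The function names and the tolerance in the consistency test are hypothetical.

```python
import numpy as np

def rank_descriptor(patch):
    """Replace each intensity by its normalized rank within the patch.

    Ranks depend only on the ordering of intensities, so any strictly
    increasing transformation h of the intensities leaves them unchanged.
    """
    flat = np.asarray(patch, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")
    ranks = np.argsort(order, kind="stable")
    return ranks / max(flat.size - 1, 1)

def photometrically_consistent(patch_a, patch_b, tolerance=0.1):
    """Hypothetical consistency test: mean absolute rank difference below a tolerance."""
    diff = np.abs(rank_descriptor(patch_a) - rank_descriptor(patch_b))
    return float(np.mean(diff)) <= tolerance
```

A track whose current patch fails such a test against the patch stored when the feature first appeared would be dropped before it reaches the robust estimator. Compensating the patch for scale and in-plane rotation with the estimated motion, as the text describes, is not covered by this sketch.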
wordName wordTfidf (topN-words)
[('gt', 0.529), ('xsi', 0.23), ('gs', 0.228), ('ptam', 0.21), ('sfm', 0.157), ('coupled', 0.113), ('slam', 0.109), ('combinatorial', 0.108), ('xsk', 0.099), ('xtj', 0.099), ('xti', 0.098), ('correspondence', 0.093), ('whereby', 0.088), ('photometry', 0.088), ('localization', 0.085), ('ba', 0.085), ('putative', 0.085), ('keyframes', 0.083), ('tracking', 0.083), ('outlier', 0.081), ('reprojection', 0.08), ('xj', 0.079), ('correspondences', 0.077), ('xit', 0.077), ('nonempty', 0.073), ('motion', 0.071), ('abuse', 0.07), ('causal', 0.07), ('tracked', 0.068), ('determination', 0.064), ('occupies', 0.062), ('photometric', 0.061), ('map', 0.059), ('angeles', 0.059), ('dh', 0.059), ('ep', 0.058), ('mapping', 0.058), ('criterion', 0.057), ('los', 0.055), ('indexes', 0.053), ('camera', 0.052), ('schemes', 0.051), ('bundle', 0.05), ('transformations', 0.05), ('novelty', 0.049), ('reconstruction', 0.049), ('initialization', 0.048), ('adjustment', 0.048), ('epipolar', 0.047), ('accelerated', 0.047), ('expand', 0.047), ('batch', 0.047), ('drift', 0.047), ('depths', 0.046), ('stands', 0.046), ('outliers', 0.045), ('minimizing', 0.044), ('hinges', 0.044), ('customarily', 0.044), ('mtheer', 0.044), ('aetr', 0.044), ('arte', 0.044), ('balzer', 0.044), ('dlo', 0.044), ('forgo', 0.044), ('inplane', 0.044), ('mair', 0.044), ('oatt', 0.044), ('qualified', 0.044), ('rere', 0.044), ('rion', 0.044), ('scaleand', 0.044), ('triangulate', 0.044), ('warrant', 0.044), ('visibility', 0.043), ('simpler', 0.043), ('xi', 0.043), ('offline', 0.043), ('unified', 0.042), ('topology', 0.042), ('bold', 0.042), ('fail', 0.041), ('italic', 0.041), ('brittle', 0.041), ('lieu', 0.041), ('triangulating', 0.041), ('bal', 0.041), ('bypasses', 0.041), ('detrimental', 0.041), ('endowed', 0.041), ('engels', 0.041), ('eosft', 0.041), ('framerate', 0.041), ('gtg', 0.041), ('inno', 0.041), ('dependency', 0.04), ('operate', 0.04), ('mechanism', 0.039), ('neighborhood', 0.038), ('seem', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
2 0.17217553 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
Author: Nicola Fioraio, Luigi Di_Stefano
Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.
3 0.1396852 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison
Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.
4 0.13648638 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
Author: Amaury Dame, Victor A. Prisacariu, Carl Y. Ren, Ian Reid
Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.
5 0.11730918 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
Author: Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
Abstract: We present a new point matching algorithm for robust nonrigid registration. The method iteratively recovers the point correspondence and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish rough correspondence. In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach and it enables us to deal with the noise and outliers which arise in the correspondence step. The transformation is specified in a functional space, more specifically a reproducing kernel Hilbert space. We apply our method to nonrigid sparse image feature correspondence on 2D images and 3D surfaces. Our results quantitatively show that our approach outperforms state-of-the-art methods, particularly when there are a large number of outliers. Moreover, our method of robustly estimating transformations from correspondences is general and has many other applications.
6 0.11202654 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
7 0.10994425 368 cvpr-2013-Rolling Shutter Camera Calibration
8 0.096266158 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
9 0.094950601 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
10 0.092477515 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
11 0.089655422 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
12 0.088919401 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
13 0.088569686 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
14 0.088205531 336 cvpr-2013-Poselet Key-Framing: A Model for Human Activity Recognition
15 0.086476624 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies
16 0.086073957 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
17 0.085920602 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
18 0.081816308 314 cvpr-2013-Online Object Tracking: A Benchmark
19 0.080078758 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
20 0.079452604 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
topicId topicWeight
[(0, 0.206), (1, 0.098), (2, 0.001), (3, -0.018), (4, -0.0), (5, -0.052), (6, 0.033), (7, -0.052), (8, 0.017), (9, 0.051), (10, -0.019), (11, 0.065), (12, -0.019), (13, 0.013), (14, -0.002), (15, -0.093), (16, 0.009), (17, 0.072), (18, 0.01), (19, -0.007), (20, 0.017), (21, -0.057), (22, 0.028), (23, 0.001), (24, -0.001), (25, -0.06), (26, -0.065), (27, 0.048), (28, -0.04), (29, 0.01), (30, -0.011), (31, -0.026), (32, -0.06), (33, 0.007), (34, -0.013), (35, 0.043), (36, 0.066), (37, -0.016), (38, 0.064), (39, 0.007), (40, 0.015), (41, 0.08), (42, -0.067), (43, -0.053), (44, 0.054), (45, -0.031), (46, 0.067), (47, -0.05), (48, 0.028), (49, -0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.94052804 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
2 0.79946768 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
Author: Nicola Fioraio, Luigi Di_Stefano
Abstract: In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.
3 0.77045095 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
Author: Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H.J. Kelly, Andrew J. Davison
Abstract: We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.
4 0.7265265 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
Author: Gim Hee Lee, Friedrich Fraundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a close-to-market multi-camera system. By modeling the multi-camera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to a maximum of 6 solutions exist for both cases. We identify the existence of degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences where the scale becomes unobservable and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset collected by a car equipped with 4 cameras with minimal overlapping field-of-views against the GPS/INS ground truth.
5 0.6815995 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
Author: Qiang Hao, Rui Cai, Zhiwei Li, Lei Zhang, Yanwei Pang, Feng Wu, Yong Rui
Abstract: 3D model-based object recognition has been a noticeable research trend in recent years. Common methods find 2D-to-3D correspondences and make recognition decisions by pose estimation, whose efficiency usually suffers from noisy correspondences caused by the increasing number of target objects. To overcome this scalability bottleneck, we propose an efficient 2D-to-3D correspondence filtering approach, which combines a light-weight neighborhood-based step with a finer-grained pairwise step to remove spurious correspondences based on 2D/3D geometric cues. On a dataset of 300 3D objects, our solution achieves ∼10 times speed improvement over the baseline, with a comparable recognition accuracy. A parallel implementation on a quad-core CPU can run at ∼3fps for 1280× 720 images.
6 0.65844637 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation
7 0.64551502 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
8 0.6340757 368 cvpr-2013-Rolling Shutter Camera Calibration
9 0.6255278 428 cvpr-2013-The Episolar Constraint: Monocular Shape from Shadow Correspondence
10 0.6210826 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
11 0.61874467 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
12 0.61604398 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
13 0.60260069 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
14 0.59153086 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
15 0.57262325 110 cvpr-2013-Dense Object Reconstruction with Semantic Priors
16 0.5676558 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration
17 0.55175227 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
18 0.55086023 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
19 0.5507679 440 cvpr-2013-Tracking People and Their Objects
20 0.54976314 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
topicId topicWeight
[(10, 0.134), (16, 0.039), (26, 0.056), (28, 0.016), (33, 0.242), (63, 0.23), (67, 0.081), (69, 0.061), (87, 0.086)]
simIndex simValue paperId paperTitle
1 0.887999 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement
Author: Linjie Luo, Cha Zhang, Zhengyou Zhang, Szymon Rusinkiewicz
Abstract: We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement. The hair strands are first extracted in each 2D view, and projected onto the 3D visual hull for initialization. The 3D positions of these strands are then refined by optimizing an objective function that takes into account cross-view hair orientation consistency, the visual hull constraint and smoothness constraints defined at the strand, wisp and global levels. Based on the refined strands, the algorithm can reconstruct an approximate hair surface: experiments with synthetic hair models achieve an accuracy of ∼3mm. We also show real-world examples to demonstrate the capability to capture full-head hair styles as well as hair in motion with as few as 8 cameras.
2 0.88401061 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve
Author: Kilho Son, Eduardo B. Almeida, David B. Cooper
Abstract: This paper introduces a novel approach for reassembling pot sherds found at archaeological excavation sites, for the purpose of reconstructing clay pots that had been made on a wheel. These pots and the sherds into which they have broken are axially symmetric. The reassembly process can be viewed as 3D puzzle solving or generalized cylinder learning from broken fragments. The estimation exploits both local and semi-global geometric structure, thus making it a fundamental problem of geometry estimation from noisy fragments in computer vision and pattern recognition. The data used are densely digitized 3D laser scans of each fragment’s outer surface. The proposed reassembly system is automatic and functions when the pile of available fragments is from one or multiple pots, and even when pieces are missing from any pot. The geometric structures used are curves on the pot along which the surface had broken and the silhouette of a pot with respect to an axis, called axis profile curve (APC). For reassembling multiple pots with or without missing pieces, our algorithm estimates the APC from each fragment, then reassembles into configurations the ones having distinctive APC. Further growth of configurations is based on adding remaining fragments such that their APC and break curves are consistent with those of a configuration. The method is novel, more robust and handles the largest numbers of fragments to date.
same-paper 3 0.8563723 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
4 0.83771646 308 cvpr-2013-Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques
Author: Yun Zeng, Chaohui Wang, Stefano Soatto, Shing-Tung Yau
Abstract: This paper introduces an efficient approach to integrating non-local statistics into the higher-order Markov Random Fields (MRFs) framework. Motivated by the observation that many non-local statistics (e.g., shape priors, color distributions) can usually be represented by a small number of parameters, we reformulate the higher-order MRF model by introducing additional latent variables to represent the intrinsic dimensions of the higher-order cliques. The resulting new model, called NC-MRF, not only provides the flexibility in representing the configurations of higher-order cliques, but also automatically decomposes the energy function into less coupled terms, allowing us to design an efficient algorithmic framework for maximum a posteriori (MAP) inference. Based on this novel modeling/inference framework, we achieve state-of-the-art solutions to the challenging problems of class-specific image segmentation and template-based 3D facial expression tracking, which demonstrate the potential of our approach.
5 0.81356138 40 cvpr-2013-An Approach to Pose-Based Action Recognition
Author: Chunyu Wang, Yizhou Wang, Alan L. Yuille
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state of the art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the “best” one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state of the art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.
6 0.79217726 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.78570253 33 cvpr-2013-Active Contours with Group Similarity
8 0.78425843 414 cvpr-2013-Structure Preserving Object Tracking
9 0.78378856 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
10 0.78297222 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
11 0.78147233 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.78057605 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
13 0.78037786 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
14 0.77912807 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.77905446 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
16 0.77901304 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
17 0.77853549 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
18 0.77720404 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
19 0.77703929 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
20 0.776483 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models