iccv iccv2013 iccv2013-387 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
Reference: text
sentIndex sentText sentNum sentScore
1 We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. [sent-3, score-0.515]
2 We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. [sent-5, score-0.628]
3 The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes. [sent-7, score-0.858]
4 Introduction. While there are many cues that could be used to estimate depth from a video, the most successful approaches rely almost exclusively on cues based on multiple-view geometry. [sent-9, score-0.254]
5 These multi-view cues, such as parallax and occlusion ordering, are highly reliable, but they are not always available, and the resulting reconstructions are often incomplete, containing structure, for example, only where stable image correspondences can be found. [sent-10, score-0.479]
6 What’s often missing in these reconstructions is surface information: for example, it is often difficult to tell from just a stereo point cloud whether the floor and wall intersect in a clean right angle or in a more rounded way. [sent-11, score-0.993]
7 with a Markov Random Field [21] or by transferring depth from a small number of matching images [16]; however, it is not clear how to use these heavily regularized reconstructions when high-accuracy multi-view cues are available as well. [sent-15, score-0.669]
8 Despite the ambiguity of image patches in general, we hypothesize that many patches are so distinctive that their latent 3D shapes can be estimated using recognition cues alone. [sent-20, score-0.465]
9 We call these distinctive patches and their associated reconstructions shape anchors (Figure 2), and in this paper we describe how to use them in conjunction with multi-view cues to produce dense 3D reconstructions (Figure 1). [sent-21, score-1.819]
10 We start with a sparse point cloud produced by multi-view stereo [11] and apply recognition cues cautiously, estimating dense geometry only in places where the combination of image and multi-view evidence tells us that our predictions are likely to be accurate. [sent-22, score-0.908]
11 We then use these confident predictions to anchor additional reconstruction, predicting 3D shape in places where the solution is more ambiguous. [sent-23, score-0.606]
12 Since our approach is based on transferring depth from an RGB-D database, it can be used to estimate the geometry for a wide variety of 3D structures, and it is well suited for reconstructing scenes that share common objects and architectural styles with the training data. [sent-24, score-0.394]
13 Our goal in this work is to build dense 3D reconstructions of real-world scenes, and to do so with accuracy at the level of a few centimeters. [sent-25, score-0.515]
14 Shape anchors (left) are distinctive image patches whose 3D shapes can be predicted from their appearance alone. [sent-28, score-0.62]
15 We transfer the geometry from another patch (second column) to the scene after measuring its similarity to a sparse stereo point cloud (third column), resulting in a dense 3D reconstruction (right). [sent-29, score-1.156]
16 multi-view cues, and as a result the stereo point clouds are sparse and very noisy; by the standards of traditional multi-view benchmarks [26] [22] the reconstructions that we seek are rather coarse. [sent-30, score-0.757]
17 In the places where we predict depth using shape anchors, the result is dense, with accuracy close to that of multi-view stereo, and often there is qualitative information that may not be obvious from the point cloud alone. [sent-31, score-0.57]
18 Our way of combining these two cues is to use the single-image cues sparingly, hypothesizing a dense depth map for each image patch using recognition and accepting only the hypotheses that agree with the multi-view evidence. [sent-37, score-0.615]
19 In this way, our goals differ from some recent work in single-image reconstruction such as [21] [15], which model lower-level shape information. [sent-46, score-0.349]
20 Our approach avoids this problem by transferring depth at a patch level. [sent-52, score-0.426]
21 The idea of finding image patches whose appearance is informative about geometry takes inspiration from recent work in recognition, notably poselets [4]. [sent-57, score-0.284]
22 Shape anchors. Our approach is based on reconstructing the 3D shape of individual image patches, and in its most general form this problem is impossibly hard: the shape of most image patches is highly ambiguous. [sent-62, score-0.928]
23 We hypothesize, however, that there are image patches so distinctive that their shape can be guessed rather easily. [sent-63, score-0.36]
24 We call these patches and their associated reconstructions shape anchors (Figure 2), and we say that a point cloud representing a 3D-reconstructed patch is a shape anchor if it is sufficiently similar to the patch’s ground-truth point cloud. [sent-64, score-2.329]
25 Later, we will describe how to identify these correct reconstructions (Section 4) and use them to interpret the geometry for other parts of the scene (Section 5). [sent-65, score-0.565]
26 Now we will define what it means for a patch’s 3D reconstruction to be correct; in other words, for a patch and its reconstruction to be a shape anchor. [sent-66, score-0.804]
27 Shape similarity. One of the hazards of using recognition to estimate shape is an ambiguity in absolute depth, and accordingly we use a measure of shape similarity that is invariant to the point cloud’s distance from the camera (we do not model other ambiguities). [sent-67, score-0.564]
28 Specifically, if PD is the point cloud that we estimate for a patch, and v is the camera ray passing through the patch’s center, then we require PD to satisfy the distance relationship [sent-70, score-0.365]
29 min_α d(PD + αv, PGT) ≤ τ, (1) where PGT is the patch’s ground-truth point cloud and PD + αv denotes a version of the point cloud that has been shifted away from the camera by distance α. [sent-72, score-0.632]
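To make Equation 1 concrete, here is a minimal sketch of the shape-anchor test: a symmetric mean nearest-neighbor distance between point clouds, minimized over a depth shift α along the camera ray. The grid search over α, the Nx3 NumPy point-cloud representation, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def symmetric_mean_nn_distance(P, Q):
    """Mean nearest-neighbor distance from P to Q and from Q to P; take the max,
    so both directions must be small (matching "and vice versa" in the text)."""
    d_pq = cKDTree(Q).query(P)[0].mean()  # each reconstructed point to nearest GT point
    d_qp = cKDTree(P).query(Q)[0].mean()  # each GT point to nearest reconstructed point
    return max(d_pq, d_qp)

def is_shape_anchor(P_D, P_GT, v, tau=0.10, alphas=np.linspace(-1.0, 1.0, 201)):
    """Equation 1: accept P_D if, after the best shift alpha along the unit camera
    ray v, its distance to the ground-truth cloud P_GT is at most tau (meters)."""
    v = v / np.linalg.norm(v)
    best = min(symmetric_mean_nn_distance(P_D + a * v, P_GT) for a in alphas)
    return best <= tau
```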
30 Note that this threshold τ is small (10 cm in our experiments) given that patch reconstructions are often meters in total size, and that this parameter controls the overall accuracy of the reconstruction. [sent-77, score-0.734]
31 for it to be considered a shape anchor), the average distance between a reconstructed point and the nearest ground-truth point must be at most τ (and vice versa) after correcting for ambiguity in absolute depth. [sent-84, score-0.41]
32 In effect, patch reconstructions are evaluated holistically: the only ones that “count” are those that are mostly right. [sent-87, score-0.795]
33 Predicting shape anchors. We start by generating multiple 3D reconstructions for every patch in the image using a data-driven search procedure (Figure 3).
34 We then (b) compare the depth map of the best matches with the sparse stereo point cloud, transferring the dense shape if their depths agree (c). [sent-93, score-0.72]
35 We then introduce multi-view information and use it in combination with the image evidence to distinguish the “good” patch reconstructions (i.e. the shape anchors). [sent-96, score-0.779]
36 Data-driven shape estimation. Under our framework, the use of recognition and multi-view cues is mostly decoupled: the goal of the “recognition system” is to produce as many good patch reconstructions (i.e. shape anchors) as possible. [sent-102, score-1.084]
37 In this work, we choose to generate our reconstructions using a data-driven search procedure, since this allows us to represent complex geometry for a variety of scenes. [sent-108, score-0.576]
38 Given a set of patches from an input image, we find each one’s best matches in an RGB-D database (using the “RGB” part only) and transfer the corresponding point cloud for one of the examples (using the “-D” part). [sent-109, score-0.551]
39 The highest-scoring shape anchor predictions for a sample of scenes, with their associated database matches. [sent-112, score-0.582]
40 The corresponding stereo point clouds are not shown, but they are used as part of the scoring process. [sent-113, score-0.282]
41 Extracting and representing patches. Following recent work in image search and object detection [12] [14], we represent each patch as a HOG template whitened by Linear Discriminant Analysis. [sent-114, score-0.476]
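A hedged sketch of the LDA-whitened HOG matching this describes: the template is w = Σ⁻¹(x − μ), where μ and Σ are HOG statistics pooled over many generic natural-image patches (assumed precomputed), and database retrieval reduces to a dot product. The regularization constant and helper names are assumptions; HOG extraction itself is omitted.

```python
import numpy as np

def whitened_template(hog_patch, mu, Sigma, reg=1e-2):
    """LDA-whitened HOG template: w = Sigma^{-1} (x - mu)."""
    A = Sigma + reg * np.eye(Sigma.shape[0])  # regularize for a stable solve
    return np.linalg.solve(A, hog_patch.ravel() - mu)

def match_scores(w, database_hogs):
    """Dot-product (convolution) score of the whitened template against each
    database patch; the k highest scorers are kept (k = 3 in the paper)."""
    X = database_hogs.reshape(len(database_hogs), -1)
    return X @ w
```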
42 We keep the k highest-scoring detections for each template, resulting in k reconstructions for each patch (we use k = 3). [sent-119, score-0.757]
43 Distinguishing shape anchors. We now use multi-view cues to identify a subset of patch reconstructions that we are confident are shape anchors. [sent-124, score-2.049]
44 We start by aligning each reconstruction to a sparse point cloud (produced by multi-view stereo, see Section 6), shifting the reconstruction away from the camera so as to maximize its agreement with the sparse point cloud. [sent-127, score-0.853]
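One possible reading of this alignment step, as a sketch: slide the reconstruction along the camera ray and keep the shift that best agrees with the sparse stereo cloud. The agreement measure (fraction of reconstructed points within a tolerance of some stereo point), the search range, and the tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def align_to_sparse_cloud(P, S, v, alphas=np.linspace(-1.0, 1.0, 101), tol=0.05):
    """Shift the patch reconstruction P along the unit camera ray v to maximize
    agreement with the sparse stereo cloud S. Returns the chosen shift and the
    agreement score, which can later serve as a classifier feature."""
    v = v / np.linalg.norm(v)
    tree = cKDTree(S)
    best_alpha, best_score = 0.0, -1.0
    for a in alphas:
        dists, _ = tree.query(P + a * v)
        score = float(np.mean(dists < tol))  # fraction of points of P near S
        if score > best_score:
            best_alpha, best_score = a, score
    return best_alpha, best_score
```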
45 After the alignment, we discard erroneous patch reconstructions, keeping only the ones that we are confident are shape anchors. [sent-131, score-0.603]
46 We do this primarily by throwing out the ones that significantly disagree with the multi-view evidence; in other words, we look for reconstructions for which the recognition- and multi-view-based interpretations coincide. [sent-132, score-0.47]
47 There are other sources of information that can be used as well, and we combine them using a random forest classifier [5], trained to predict which patch reconstructions are shape anchors. [sent-133, score-0.941]
48 For each patch reconstruction, we compute three kinds of features. [sent-134, score-0.287]
49 One of these is the distance between the recentered patch reconstruction and S, the sparse point cloud. [sent-137, score-0.558]
50 We also include the absolute difference between the patch reconstruction’s depth before and after alignment. [sent-138, score-0.379]
51 Patch informativeness. These features test whether the queried patch is so distinctive that there is only one 3D shape interpretation. [sent-141, score-0.608]
52 We measure the reconstruction’s similarity to the point clouds of the other best-matching patches (Figure 3 (d)). [sent-142, score-0.253]
53 We note that all of these features measure only the quality of the match; we do not compute any features for the point cloud itself, nor do we use any image features. [sent-151, score-0.294]
54 If a patch reconstruction is given a positive label by the random forest, then we consider it a shape anchor prediction. [sent-154, score-0.963]
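The classification stage could look like the following sketch, using scikit-learn's random forest. The feature names and dictionary layout are hypothetical; only the feature groups (multi-view agreement, alignment depth shift, patch informativeness, match quality) come from the text, and training labels would be derived from Equation 1 on RGB-D training data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def anchor_features(rec):
    """One feature vector per patch reconstruction; keys are hypothetical
    stand-ins for the feature groups described in Section 4."""
    return np.array([
        rec["sparse_agreement"],   # multi-view evidence: agreement with the sparse cloud
        rec["depth_shift_abs"],    # |depth before - depth after| the alignment
        rec["informativeness"],    # similarity to the other top matches' point clouds
        rec["match_score"],        # convolution score of the whitened HOG template
    ])

# Labels come from Equation 1 evaluated on held-out RGB-D scenes
# (1 = the reconstruction is a shape anchor, 0 = it is not):
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(np.stack([anchor_features(r) for r in train_recs]), train_labels)
# A positive prediction marks a "shape anchor prediction":
# is_anchor = bool(clf.predict(anchor_features(rec)[None, :])[0])
```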
55 Interpreting geometry with shape anchors. We now describe how to “anchor” a reconstruction using the high-confidence estimates of geometry provided by shape anchors. [sent-160, score-1.128]
56 We use them to find other patch reconstructions using contextual information (Section 5). [sent-161, score-0.734]
57 Propagating shape anchor matches. We start by repeating the search-and-classification procedure described in Section 3, restricting the search to subsequences centered on the sites of the highest-scoring shape anchor predictions (we use a subsequence of 20 frames and 200 top shape anchors). [sent-168, score-1.366]
58 We also try to find good patch reconstructions for the area surrounding a shape anchor (Figure 6). [sent-170, score-1.242]
59 We sample RGB-D patches near the matched database patch, and for each one we test whether it agrees with the corresponding patch in the query image using the method from Section 4. [sent-171, score-0.518]
60 (i.e. aligning the patch’s points to the stereo point cloud and then classifying it). [sent-174, score-0.48]
61 Extrapolating planes from shape anchors. We use shape anchor predictions that are mostly planar to guide a plane-finding algorithm (Figure 5). [sent-178, score-1.289]
62 We then use the shape anchor to infer the support of the plane, possibly expanding it to be much larger than the original patch. [sent-181, score-0.508]
63 We also show the database matches for the two shape anchors. [sent-186, score-0.274]
64 considered to be background observations; superpixels that intersect with the on-plane parts of the shape anchor are considered foreground observations. [sent-187, score-0.561]
65 We keep the expanded plane only if it is larger than the original shape anchor and agrees with the multi-view evidence. [sent-189, score-0.614]
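A minimal sketch of this plane-extrapolation step: fit a least-squares plane to the near-planar anchor points, then grow its support over superpixels whose sparse stereo points mostly lie close to the plane. The distance tolerance and inlier-fraction rule are assumptions standing in for the paper's foreground/background observation model.

```python
import numpy as np

def fit_plane(P):
    """Least-squares plane through Nx3 points: unit normal n and offset d with
    n . x + d = 0 for points x on the plane."""
    c = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - c)
    n = Vt[-1]               # direction of least variance = plane normal
    return n, float(-n @ c)

def expand_plane(anchor_pts, superpixel_pts, dist_tol=0.05, min_inlier_frac=0.6):
    """Fit a plane to a near-planar shape anchor, then mark as on-plane every
    superpixel whose sparse stereo points mostly lie within dist_tol of it."""
    n, d = fit_plane(anchor_pts)
    support = []
    for i, pts in enumerate(superpixel_pts):  # stereo points inside superpixel i
        if len(pts) == 0:
            continue
        near = np.abs(pts @ n + d) < dist_tol
        if near.mean() >= min_inlier_frac:
            support.append(i)
    return n, d, support  # accept only if the support outgrows the original anchor
```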
66 Using occlusion constraints. Since the patch reconstructions (shape anchors and propagated patches) are predicted in isolation, they may be inconsistent with each other. [sent-194, score-1.18]
67 First, we remove points that are inconsistent with each other in a single view; we keep at each pixel only the point that comes from the patch reconstruction with the greatest classifier score. [sent-196, score-0.613]
68 For each image, we find all of the patch reconstructions from other images that are visible. [sent-198, score-0.734]
69 If a point from one of these other images occludes a point from the image’s own patch reconstructions, then this violates an occlusion constraint; we resolve this by discarding the point that comes from the patch reconstruction with the lower classifier score. [sent-199, score-1.049]
70 Finally, we completely discard patch reconstructions for which only 10% or fewer of the points remain, since they are likely to be incorrect or redundant. [sent-200, score-0.797]
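The single-view part of this occlusion reasoning, plus the final pruning rule, might look like the sketch below; the cross-view test (projecting points from other images and discarding the lower-scoring one) is omitted for brevity, and the data layout is assumed.

```python
import numpy as np

def resolve_single_view(pixel_ids, rec_scores):
    """Keep, at each pixel, only the point whose source reconstruction has the
    greatest classifier score. Returns a boolean keep-mask over points."""
    best = {}
    for i, (pix, s) in enumerate(zip(pixel_ids, rec_scores)):
        if pix not in best or s > best[pix][1]:
            best[pix] = (i, s)
    keep = np.zeros(len(pixel_ids), dtype=bool)
    keep[[i for i, _ in best.values()]] = True
    return keep

def prune_reconstructions(rec_ids, keep, min_frac=0.10):
    """Discard whole patch reconstructions with only min_frac or fewer of their
    points surviving, as they are likely incorrect or redundant."""
    rec_ids = np.asarray(rec_ids)
    alive = {r for r in np.unique(rec_ids) if keep[rec_ids == r].mean() > min_frac}
    return keep & np.isin(rec_ids, list(alive))
```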
71 For the point cloud visualizations in the qualitative results, we estimate the camera pose using structure from motion (SfM) instead of using the SUN3D pose estimates. [sent-207, score-0.441]
72 The propagation step (a) starts with an anchor patch (the corner, in blue) and finds an additional match with the relatively ambiguous patch (in green). [sent-210, score-0.928]
73 We estimate the camera pose for each sequence using Bundler [25] after sampling one in every 5 frames (from 300 frames), discarding scenes whose SfM reconstructions have significant error; approximately 18% of these subsequences pass this test. [sent-215, score-0.644]
74 We search for shape anchors in 6 frames per video. [sent-218, score-0.632]
75 We estimate this from a set of high-confidence patch reconstructions (high convolution score and low patch-location difference). [sent-222, score-0.793]
76 Each triangulated 3D point votes for a scale, its distance to the camera divided by that of the corresponding point in the patch reconstruction, and we choose the mode. [sent-223, score-0.649]
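As a sketch of this voting scheme: compute one scale ratio per triangulated point and take the mode of the votes. Histogram binning is an assumed implementation detail for finding the mode.

```python
import numpy as np

def estimate_scale(sfm_depths, patch_depths, n_bins=50):
    """Each triangulated SfM point votes with the ratio of its camera distance to
    that of the corresponding patch-reconstruction point; return the mode."""
    votes = np.asarray(sfm_depths, float) / np.asarray(patch_depths, float)
    hist, edges = np.histogram(votes, bins=n_bins)
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])  # center of the most-voted bin
```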
77 Quantitative evaluation. As a measure of accuracy, we estimate the distance from each reconstructed point to the nearest point in the ground-truth point cloud (Figure 7(a)). [sent-225, score-0.494]
78 If multiple shape anchors overlap, then we take the highest-scoring point at each pixel. [sent-227, score-0.67]
79 shape anchor prediction windows, in both the reconstruction and the ground-truth, we find that the shape anchors are more complete than the PMVS points (Figure 7(d)). [sent-235, score-1.33]
80 We find that combining our predicted geometry with the original point cloud results in a denser reconstruction with similar overall accuracy. [sent-237, score-0.554]
81 Qualitative results. In Figure 8, we show visualizations for some of our reconstructions (a subset of the test set). [sent-242, score-0.477]
82 These reconstructions were created by combining the predicted patch reconstructions (i.e. [sent-243, score-1.181]
83 shape anchor predictions plus the propagated geometry) and the extrapolated planes. [sent-245, score-0.644]
84 The results are dense 3D reconstructions composed of translated point clouds from the database, plus a small number of extrapolated planes. [sent-250, score-0.717]
85 (a) Accuracy of points from shape anchor predictions. [sent-272, score-0.542]
86 (b) Number of anchor predictions per scene. (d) Accuracy and completeness measures, for both the full scene and just the shape anchor windows. [sent-275, score-0.974]
87 chair in sequence (c), and a sink in Figure 9 (whose highly reflective surface produces many erroneous stereo points). [sent-277, score-0.276]
88 Our method is less successful in modeling fine-scale geometry, partly due to the large patch size and the distance threshold of 10cm that we require for shape anchors (Equation 1). [sent-278, score-0.882]
89 For example, in (d), we model a chair arm using a patch from a bathroom. [sent-279, score-0.329]
90 We also sometimes transfer patches containing extra geometry: in (a) we hallucinate a chair while transferring a wall. [sent-280, score-0.284]
91 We make no attempt to align shape anchors beyond translating them, so walls may be at the wrong angles. [sent-281, score-0.617]
92 We note that the magnitude of the errors is usually not too large, since the classifier is unlikely to introduce a shape anchor that strays too far from the sparse point cloud. [sent-284, score-0.66]
93 The number of shape anchor predictions can also vary a great deal between scenes (Figure 7(b)), meaning that for many scenes the results are sparser than the ones presented here (please see our video for examples). [sent-286, score-0.685]
94 This is partly due to the data-driven nature of our algorithm: for some scenes it is hard to find matches even when the search is conducted at the patch level. [sent-287, score-0.413]
95 Conclusion. In this work, we introduced shape anchors, image patches whose shape can easily be recognized from the patch itself, and which can be used to “anchor” a reconstruction. [sent-291, score-0.772]
96 We also believe that the recognition task presented in this work, namely that of generating accurate 3D reconstructions from image patches, is an interesting problem with many solutions beyond the data-driven search method described here. [sent-293, score-0.607]
97 3D reconstruction results for four scenes, chosen from the test set for their large number of shape anchor predictions. [sent-443, score-0.676]
98 We show the PMVS point cloud and two views of our dense reconstruction combined with the PMVS points (our final output). [sent-444, score-0.564]
99 For each scene, we show four shape anchor transfers, selected by hand from among the top ten highest-scoring ones (that survive occlusion testing); we show one erroneous shape anchor per scene in the last row. [sent-445, score-1.138]
100 A comparison and evaluation of multi-view stereo reconstruction algorithms. [sent-457, score-0.32]
wordName wordTfidf (topN-words)
[('reconstructions', 0.447), ('anchors', 0.414), ('anchor', 0.327), ('patch', 0.287), ('cloud', 0.219), ('shape', 0.181), ('reconstruction', 0.168), ('pmvs', 0.161), ('stereo', 0.152), ('pd', 0.126), ('patches', 0.123), ('geometry', 0.092), ('cues', 0.083), ('transferring', 0.078), ('point', 0.075), ('extrapolated', 0.072), ('dense', 0.068), ('csail', 0.066), ('pgt', 0.065), ('predictions', 0.064), ('transfers', 0.064), ('depth', 0.061), ('plane', 0.057), ('distinctive', 0.056), ('clouds', 0.055), ('queried', 0.051), ('seitz', 0.05), ('sfm', 0.049), ('completeness', 0.049), ('database', 0.049), ('multiview', 0.048), ('transferred', 0.047), ('evidence', 0.045), ('furukawa', 0.045), ('scenes', 0.045), ('matches', 0.044), ('camera', 0.044), ('planar', 0.044), ('mit', 0.044), ('apartment', 0.043), ('chair', 0.042), ('confident', 0.042), ('hd', 0.041), ('erroneous', 0.041), ('transfer', 0.041), ('surface', 0.041), ('planes', 0.04), ('poselets', 0.039), ('mostly', 0.038), ('search', 0.037), ('curless', 0.037), ('owens', 0.036), ('pik', 0.036), ('points', 0.034), ('subsequences', 0.034), ('architectural', 0.034), ('places', 0.034), ('indoor', 0.033), ('whether', 0.033), ('agree', 0.033), ('tog', 0.032), ('occlusion', 0.032), ('convolution', 0.032), ('absolute', 0.031), ('ransac', 0.031), ('saxena', 0.03), ('visualizations', 0.03), ('informative', 0.03), ('corners', 0.029), ('reconstructing', 0.029), ('discard', 0.029), ('perceive', 0.029), ('template', 0.029), ('sparse', 0.028), ('manhattan', 0.028), ('suited', 0.028), ('hypothesize', 0.028), ('propagation', 0.027), ('superpixels', 0.027), ('estimate', 0.027), ('shapes', 0.027), ('scene', 0.026), ('classifier', 0.026), ('agrees', 0.026), ('intersect', 0.026), ('ambiguity', 0.025), ('prediction', 0.025), ('discarding', 0.024), ('restricting', 0.024), ('keep', 0.023), ('folds', 0.023), ('pose', 0.023), ('errors', 0.023), ('torralba', 0.023), ('malisiewicz', 0.023), ('uncontrolled', 0.023), ('ones', 0.023), ('reconstructed', 0.023), ('readers', 0.022), ('align', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
2 0.23097351 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
Author: Cheng Deng, Rongrong Ji, Wei Liu, Dacheng Tao, Xinbo Gao
Abstract: Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo-positive instances, which are inevitably noisy, fail to reveal this complementary property, and thus lead to inferior ranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of each individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering to the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state of the art.
3 0.17648527 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
Author: Benjamin Ummenhofer, Thomas Brox
Abstract: 3D reconstruction deals with the problem of finding the shape of an object from a set of images. Thin objects that have virtually no volume pose a special challenge for reconstruction with respect to shape representation and fusion of depth information. In this paper we present a dense point-based reconstruction method that can deal with this special class of objects. We seek to jointly optimize a set of depth maps by treating each pixel as a point in space. Points are pulled towards a common surface by pairwise forces in an iterative scheme. The method also handles the problem of opposed surfaces by means of penalty forces. Efficient optimization is achieved by grouping points to superpixels and a spatial hashing approach for fast neighborhood queries. We show that the approach is on a par with state-of-the-art methods for standard multi-view stereo settings and gives superior results for thin objects.
4 0.16286606 441 iccv-2013-Video Motion for Every Visible Point
Author: Susanna Ricco, Carlo Tomasi
Abstract: Dense motion of image points over many video frames can provide important information about the world. However, occlusions and drift make it impossible to compute long motion paths by merely concatenating optical flow vectors between consecutive frames. Instead, we solve for entire paths directly, and flag the frames in which each is visible. As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. Unlike earlier methods, we allow paths to be anchored in any frame. By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. We achieve state-of-the-art results on real sequences including both rigid and non-rigid motions with significant occlusions.
5 0.15515406 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
Author: Jianxiong Xiao, Andrew Owens, Antonio Torralba
Abstract: Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation: hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available at http://sun3d.cs.princeton.edu.
6 0.14961174 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
7 0.13870811 101 iccv-2013-DCSH - Matching Patches in RGBD Images
8 0.13153583 219 iccv-2013-Internet Based Morphable Model
9 0.1259639 255 iccv-2013-Local Signal Equalization for Correspondence Matching
10 0.12560806 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
11 0.12031832 444 iccv-2013-Viewing Real-World Faces in 3D
12 0.11912191 317 iccv-2013-Piecewise Rigid Scene Flow
13 0.11853872 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.11814985 304 iccv-2013-PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
15 0.11446773 394 iccv-2013-Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal
16 0.10999416 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
17 0.10666929 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
18 0.10339101 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
19 0.10170232 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
20 0.10061514 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
topicId topicWeight
[(0, 0.22), (1, -0.163), (2, -0.049), (3, -0.004), (4, 0.011), (5, -0.021), (6, 0.021), (7, -0.139), (8, -0.05), (9, -0.008), (10, 0.005), (11, 0.058), (12, -0.01), (13, 0.023), (14, -0.003), (15, -0.061), (16, -0.006), (17, -0.037), (18, 0.028), (19, -0.003), (20, -0.049), (21, 0.03), (22, -0.005), (23, 0.052), (24, 0.032), (25, 0.059), (26, -0.036), (27, -0.053), (28, -0.055), (29, -0.001), (30, -0.048), (31, -0.02), (32, 0.087), (33, -0.048), (34, -0.002), (35, -0.006), (36, 0.062), (37, 0.014), (38, 0.11), (39, -0.16), (40, -0.155), (41, -0.103), (42, -0.003), (43, -0.007), (44, 0.038), (45, -0.119), (46, 0.047), (47, 0.033), (48, 0.068), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.95675886 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
2 0.71699065 101 iccv-2013-DCSH - Matching Patches in RGBD Images
Author: Yaron Eshet, Simon Korman, Eyal Ofek, Shai Avidan
Abstract: We extend patch-based methods to work on patches in 3D space. We start with Coherency Sensitive Hashing [12] (CSH), which is an algorithm for matching patches between two RGB images, and extend it to work with RGBD images. This is done by warping all 3D patches to a common virtual plane in which CSH is performed. To avoid noise due to warping of patches of various normals and depths, we estimate a group of dominant planes and compute CSH on each plane separately, before merging the matching patches. The result is DCSH, an algorithm that matches world (3D) patches in order to guide the search for image plane matches. An independent contribution is an extension of CSH, which we term Social-CSH. It allows a major speedup of the k nearest neighbor (kNN) version of CSH, its runtime growing linearly, rather than quadratically, in k. Social-CSH is used as a subcomponent of DCSH when many NNs are required, as in the case of image denoising. We show the benefits of using depth information for image reconstruction and image denoising, demonstrated on several RGBD images.
3 0.66933471 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
Author: Benjamin Ummenhofer, Thomas Brox
Abstract: 3D reconstruction deals with the problem of finding the shape of an object from a set of images. Thin objects that have virtually no volume pose a special challenge for reconstruction with respect to shape representation and fusion of depth information. In this paper we present a dense point-based reconstruction method that can deal with this special class of objects. We seek to jointly optimize a set of depth maps by treating each pixel as a point in space. Points are pulled towards a common surface by pairwise forces in an iterative scheme. The method also handles the problem of opposed surfaces by means of penalty forces. Efficient optimization is achieved by grouping points to superpixels and a spatial hashing approach for fast neighborhood queries. We show that the approach is on a par with state-of-the-art methods for standard multi-view stereo settings and gives superior results for thin objects.
4 0.62823397 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
Author: Diego Thomas, Akihiro Sugimoto
Abstract: Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method is not well scalable to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes reduces significantly the size of the scene representation and thus it allows us to generate a global textured 3D model with lower memory requirement while keeping accuracy and easiness to update with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction, while keeping the scalability for large indoor scenes.
5 0.58213252 255 iccv-2013-Local Signal Equalization for Correspondence Matching
Author: Derek Bradley, Thabo Beeler
Abstract: Correspondence matching is one of the most common problems in computer vision, and it is often solved using photo-consistency of local regions. These approaches typically assume that the frequency content in the local region is consistent in the image pair, such that matching is performed on similar signals. However, in many practical situations this is not the case, for example with low depth of field cameras a scene point may be out of focus in one view and in-focus in the other, causing a mismatch of frequency signals. Furthermore, this mismatch can vary spatially over the entire image. In this paper we propose a local signal equalization approach for correspondence matching. Using a measure of local image frequency, we equalize local signals using an efficient scale-space image representation such that their frequency contents are optimally suited for matching. Our approach allows better correspondence matching, which we demonstrate with a number of stereo reconstruction examples on synthetic and real datasets.
6 0.56883055 410 iccv-2013-Support Surface Prediction in Indoor Scenes
7 0.56826502 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
8 0.55928159 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
9 0.5539192 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
10 0.55168968 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
11 0.54910856 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization
12 0.54337305 441 iccv-2013-Video Motion for Every Visible Point
13 0.54280257 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
14 0.54033154 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
15 0.53210044 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
16 0.5226649 304 iccv-2013-PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
17 0.51821887 28 iccv-2013-A Rotational Stereo Model Based on XSlit Imaging
18 0.51514465 108 iccv-2013-Depth from Combining Defocus and Correspondence Using Light-Field Cameras
19 0.5145399 2 iccv-2013-3D Scene Understanding by Voxel-CRF
20 0.51224327 394 iccv-2013-Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal
topicId topicWeight
[(2, 0.055), (7, 0.011), (12, 0.029), (26, 0.11), (31, 0.031), (34, 0.011), (35, 0.013), (40, 0.019), (42, 0.107), (53, 0.095), (64, 0.084), (73, 0.043), (89, 0.253), (95, 0.028), (98, 0.025)]
simIndex simValue paperId paperTitle
1 0.95555544 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
Author: Kwang Moo Yi, Hawook Jeong, Byeongho Heo, Hyung Jin Chang, Jin Young Choi
Abstract: In this paper we propose an object tracking method for the case of inaccurate initializations. To track objects accurately in such situations, the proposed method uses the “motion saliency” and “descriptor saliency” of local features and performs tracking based on the generalized Hough transform (GHT). The proposed motion saliency of a local feature emphasizes features having distinctive motions, compared to the motions which are not from the target object. The descriptor saliency emphasizes features which are likely to be of the object in terms of its feature descriptors. Through these saliencies, the proposed method tries to “learn and find” the target object rather than looking for what was given at initialization, giving robust results even with inaccurate initializations. Also, our tracking result is obtained by combining the results of each local feature of the target and the surroundings with GHT voting, and thus is robust against severe occlusions as well. The proposed method is compared against nine other methods, with nine image sequences, and a hundred random initializations. The experimental results show that our method outperforms all other compared methods.
same-paper 2 0.94312024 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
3 0.93887591 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
4 0.93775094 190 iccv-2013-Handling Occlusions with Franken-Classifiers
Author: Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van_Gool
Abstract: Detecting partially occluded pedestrians is challenging. A common practice to maximize detection quality is to train a set of occlusion-specific classifiers, each for a certain amount and type of occlusion. Since training classifiers is expensive, only a handful are typically trained. We show that by using many occlusion-specific classifiers, we outperform previous approaches on three pedestrian datasets: INRIA, ETH, and Caltech USA. We present a new approach to train such classifiers. By reusing computations among different training stages, 16 occlusion-specific classifiers can be trained at only one tenth the cost of one full training. We show that test-time cost also grows sub-linearly.
5 0.93670022 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
Author: Ryan Tokola, Wongun Choi, Silvio Savarese
Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).
6 0.93665946 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
7 0.93633854 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
8 0.93620604 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
9 0.93619215 379 iccv-2013-Semantic Segmentation without Annotating Segments
10 0.93544811 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
11 0.93526655 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
12 0.93501878 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
13 0.93322778 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
14 0.93245792 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
15 0.93206507 338 iccv-2013-Randomized Ensemble Tracking
16 0.93179804 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
17 0.93169516 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
18 0.93114161 396 iccv-2013-Space-Time Robust Representation for Action Recognition
19 0.93090898 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
20 0.9308145 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking