iccv iccv2013 iccv2013-102 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David F. Fouhey, Abhinav Gupta, Martial Hebert
Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract What primitives should we use to infer the rich 3D world behind an image? [sent-4, score-0.865]
2 We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. [sent-5, score-1.068]
3 We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. [sent-6, score-1.176]
4 At the heart of the 3D inference problem is the question: What are the right primitives for inferring the 3D world from a 2D image? [sent-15, score-0.867]
5 It is not clear what kind of 3D primitives can be directly detected in images and be used for subsequent 3D reasoning. [sent-16, score-0.772]
6 There is a rich literature proposing a myriad of 3D primitives ranging from edges and surfaces to volumetric primitives such as generalized cylinders, geons and cuboids. [sent-17, score-1.662]
7 While these 3D primitives make sense intuitively, they are often hard to detect because they are not discriminative in appearance. [sent-18, score-0.824]
8 On the other hand, primitives based on appearance might be easy to detect but can be geometrically uninformative. [sent-19, score-0.919]
9 In this paper, we propose data-driven geometric primitives which are visually-discriminative, or easily recognized in a scene, and geometrically-informative, or conveying information about the 3D world when recognized. [sent-20, score-0.936]
10 Our primitives can correspond to geometric surfaces, corners of cuboids, intersections of planes, object parts, or even whole objects. [sent-21, score-0.829]
11 These can be recognized with high precision in RGB images (b) and convey the underlying 3D geometry (c). [sent-25, score-0.157]
12 We can use sparse detections of these primitives to find dense surface normals via simple label transfer (d). [sent-26, score-1.348]
13 We formulate an objective function which encodes these two criteria and learn these 3D primitives from indoor RGBD data (see Fig. [sent-27, score-0.795]
14 We then demonstrate that our primitives can be recognized with high precision in RGB images (Fig. [sent-29, score-0.808]
15 1(b)) and convey a great deal of information about the underlying 3D world (Fig. [sent-30, score-0.154]
16 We use these primitives to densely recover the surface normals of a scene from a single image via simple transfer (Fig. [sent-32, score-1.213]
17 Our 3D primitives significantly outperform the state-of-theart as well as a number of other credible baselines. [sent-34, score-0.772]
18 We also demonstrate that our primitives generalize well by showing improved cross-dataset performance. [sent-35, score-0.794]
19 Historical Background The problem of inferring the 3D layout of a scene from a single image is a long-studied problem in computer vision. [sent-38, score-0.116]
20 Other research focused on volumetric 3D primitives such as generalized cylinders and geons. [sent-42, score-0.819]
21 However, although these primitives produced impressive demos such as ACRONYM [5], they failed to generalize well and the field moved towards appearance-based approaches (e. [sent-52, score-0.794]
22 Recently, there has been a renewed push toward more geometric approaches where the appearance of primitives is learned using large amounts of labeled [14] or depth data [23]. [sent-55, score-0.916]
23 The most commonly used primitives include oriented 3D surfaces [14, 19, 23, 31] represented as segments in the image, or volumetric primitives such as blocks [11] and cuboids [18, 33]. [sent-56, score-1.663]
24 However, since these primitives are not discriminative, a global consistency must be enforced, e. [sent-57, score-0.772]
, by a learned model such as in [23], a hierarchical segmentation [14], physical and volumetric relationships [18], recognizing primitives as parts of semantic objects [31], or assuming a Manhattan world and low-parameter room layout model [13, 35, 26]. [sent-59, score-0.982]
26 While all of these constraint-based approaches have improved 3D scene understanding, accurate detection of primitives still remains a major challenge. [sent-61, score-0.8]
27 Specifically, instead of using manually defined and semantically meaningful primitives or parts, these approaches discover primitives in labeled [4, 9], weakly-labeled [8] or unlabeled data [29]. [sent-63, score-1.566]
28 While these primitives have high detection accuracy, they might not have consistent underlying geometry. [sent-64, score-0.822]
29 Building upon these advances, our work discovers primitives that are both discriminative and informative. [sent-65, score-0.824]
30 Overview Our goal is to discover a vocabulary of 3D primitives that are visually discriminative and geometrically informative; in other words, primitives need to be easily recognized in unseen images and convey information about 3D properties of the scene when recognized. [sent-73, score-1.894]
Instances can be used to obtain a detector and vice-versa. [sent-77, score-0.129]
32 We build our primitives using an iterative procedure detailed in Section 3. [sent-78, score-0.772]
33 After using an initialization that ensures both visual discriminativity and geometric informativity, we optimize the objective function by alternating between finding instances, learning the detector, and computing the canonical form. [sent-79, score-0.137]
34 Once the primitives have been discovered, we use them to interpret new images and demonstrate that our detectors can trade off between sparsity and accuracy of predictions. [sent-80, score-0.816]
35 Discovering 3D Primitives Given a set of training images and their corresponding surface normals, our goal is to discover geometric primitives that are both discriminative and informative. [sent-83, score-1.06]
36 The challenge is that the space of geometric primitives is enormous, and we must sift through all the data to find geometrically consistent and visually discriminative concepts. [sent-84, score-0.991]
37 Similar to object discovery approaches, we pose it as a clustering problem: given millions of image patches, we group them so each cluster is discriminative (we can learn an accurate detector) and geometrically consistent (all patches in the cluster have consistent surface normals). [sent-85, score-0.54]
38 Here w (an SVM) is learned in appearance space, N represents the underlying geometry of the primitive, and y ∈ {0, 1}^m is an instance indicator vector with y_i = 1 for instances of the primitive and zero otherwise. [sent-100, score-0.612]
39 In our case, we use an SVM-based detector and represent geometry with surface normals. [sent-106, score-0.25]
40 Therefore, we set R(w) to the regularizer ||w||^2, D to cosine distances between patches' surface normals, and L to the hinge loss on each x_i^A with respect to w and y. [sent-108, score-0.157]
41 We obtain a collection of primitives by finding many local minima, which we do via multiple initializations. [sent-111, score-0.797]
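As a rough illustration of these terms (a sketch, not the authors' code: the shapes, variable names, and trade-off weight lam are assumptions), a candidate cluster can be scored under the objective as follows:

import numpy as np

def cluster_objective(w, b, X_app, N_geo, y, canonical, lam=1.0):
    """Score one candidate primitive cluster.

    w, b      : linear SVM detector in appearance space
    X_app     : (m, d) appearance features x_i^A of all candidate patches
    N_geo     : (m, k, 3) per-patch unit surface normals x_i^G (k pixels each)
    y         : (m,) binary membership indicator
    canonical : (k, 3) canonical form N (per-pixel unit normals)
    """
    reg = np.dot(w, w)                                   # R(w) = ||w||^2
    scores = X_app @ w + b
    hinge = np.maximum(0.0, 1.0 - scores[y == 1]).sum()  # L: hinge loss on members
    cos = np.einsum('nkc,kc->nk', N_geo[y == 1], canonical)
    geo = (1.0 - cos).mean(axis=1).sum()                 # D: cosine distance to N
    return reg + hinge + lam * geo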
42 Iterative Optimization Our approach alternates between optimizing membership y and detector weights w while ensuring both visual discriminativity and geometric informativity. [sent-118, score-0.196]
2), we train a detector w to separate the elements of the cluster from geometrically dissimilar patches from negative examples V (found via the canonical form N). [sent-120, score-0.327]
44 From y to w: given the primitive instances, we want to train a detector that can distinguish the primitive instances from the rest of the world. [sent-124, score-0.686]
45 To help find dissimilar patches, we first compute the canonical form N of the current instances as the per-pixel average surface normal over the instances (i. [sent-125, score-0.425]
We then train a linear SVM w to separate the positive instances from geometrically dissimilar patches in the negative set V and from all geometrically dissimilar patches in W. [sent-128, score-0.312]
47 We align the training images via detections and predict the test image as a weighted linear sum of the training images. [sent-134, score-0.127]
geometrically consistent set among the top detections of w in I; in our experiments, we use the s-member subset that approximately minimizes the intra-set cosine distance. [sent-135, score-0.245]
If done directly, this sort of iterative technique (akin to discriminative clustering [34]) has been demonstrated to overfit and produce sub-optimal results. [sent-136, score-0.11]
50 We use two partitions; we initialize identities y and train the detector w on partition 1; then we update the identities y and train w on partition 2; following this, we return to partition 1, etc. [sent-138, score-0.121]
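A compressed sketch of this alternation, including the canonical-form update and the two-partition swap; the hard-negative threshold tau, the SVM settings, and the greedy selection (a stand-in for the s-member subset that approximately minimizes intra-set cosine distance) are assumptions:

import numpy as np
from sklearn.svm import LinearSVC

def canonical_form(member_normals):
    # Per-pixel average surface normal over the instances, renormalized to unit length
    mean = member_normals.mean(axis=0)
    return mean / np.linalg.norm(mean, axis=1, keepdims=True)

def cosine_dist(normals, N):
    # Mean per-pixel cosine distance of each patch's normals to the canonical form N
    return 1.0 - np.einsum('nkc,kc->nk', normals, N).mean(axis=1)

def alternate(app, geo, idx, rounds=4, s=10, tau=0.5):
    # app[p]: (m_p, d) appearance features; geo[p]: (m_p, k, 3) normals; p in {0, 1}
    # idx: indices of the initial instances within partition 0
    svm = None
    for r in range(rounds):
        src, dst = r % 2, (r + 1) % 2                     # train on one partition ...
        N = canonical_form(geo[src][idx])
        neg = np.where(cosine_dist(geo[src], N) > tau)[0] # geometrically dissimilar negatives
        X = np.vstack([app[src][idx], app[src][neg]])
        lab = np.r_[np.ones(len(idx)), np.zeros(len(neg))]
        svm = LinearSVC(C=0.1).fit(X, lab)                # y -> w
        scores = svm.decision_function(app[dst])          # ... update y on the other
        top = np.argsort(-scores)[:5 * s]
        d = cosine_dist(geo[dst][top], canonical_form(geo[dst][top]))
        idx = top[np.argsort(d)[:s]]                      # w -> y
    return svm, idx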
51 Implementation Details Initialization: We initialize our algorithm with a greedy approach to find independently visually and geometrically compact groups in randomly sampled patches. [sent-141, score-0.155]
52 We group the query patch with its neighbors to initialize the primitive instances. [sent-144, score-0.294]
53 For a training set of 800 images, we produce 3,000 primitive candidates. [sent-145, score-0.262]
54 Calibrating the Detectors: Our discovery procedure will produce a collection of geometric primitives with detectors trained in isolation. [sent-146, score-0.937]
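This extract does not spell out the calibration procedure; as one plausible stand-in (an assumption, not the paper's method), each detector's raw scores can be mapped onto a common probabilistic scale via a Platt-style sigmoid fit on held-out detections:

import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_detector(val_scores, val_correct):
    # Fit score -> P(detection is geometrically correct) on a held-out set; the
    # returned function makes scores comparable across independently trained detectors.
    lr = LogisticRegression().fit(val_scores.reshape(-1, 1), val_correct.astype(int))
    return lambda s: lr.predict_proba(np.asarray(s, dtype=float).reshape(-1, 1))[:, 1]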
55 Interpretation via Data-driven Primitives Our discovery algorithm extracts geometrically consistent and visually discriminative primitives from RGBD data. [sent-157, score-1.042]
56 These primitives can then be detected with high precision and accuracy in new RGB images to develop a 3D interpretation of the image. [sent-158, score-0.828]
57 However, not all surface normals are equally easy to infer. [sent-159, score-0.382]
58 While the primitives are detected in discriminative regions such as the cupboard and painting, other regions such as a patch on a textureless wall are hopelessly difficult to classify in isolation. [sent-162, score-0.884]
59 A dense interpretation would require propagating information from the confident regions to the uncertain regions and a variety of methods have been proposed to do this (e. [sent-165, score-0.121]
60 We warp the surface normals so the primitive detections and s training instances per detection are aligned, producing a collection of sT aligned surface normal images (M1,1, …, MT,s). [sent-173, score-1.065]
61 We infer the pixels of a test image as a linear combination of the surface normals of these aligned training images with weights determined by detections: M̂(p) = (1/Z) Σi,j ci(p) ki(p) Mi,j(p), writing the two weighting terms generically as ci and ki. [sent-180, score-0.448]
62 The first term gives high weight to confident detections and to detections that fire consistently at the same absolute location in training and test image (e. [sent-183, score-0.224]
63 , floor primitives should not be at the top of an image). [sent-185, score-0.772]
64 The second term is the spatial term: it gives high weight for transfer to pixels near the detection, and the weight decreases as a function of the distance from the location of the primitive detection. [sent-186, score-0.315]
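A minimal numpy sketch of this transfer; the Gaussian falloff stands in for the spatial term, and each detection's confidence/location weight is assumed precomputed:

import numpy as np

def transfer_normals(dets, H, W, sigma=40.0):
    # dets: list of (wt, (u, v), M): wt combines detection confidence and location
    # consistency, (u, v) is the detection location, M is an aligned (H, W, 3)
    # exemplar surface-normal image M_{i,j}
    ys, xs = np.mgrid[0:H, 0:W]
    acc = np.zeros((H, W, 3))
    Z = np.zeros((H, W, 1))
    for wt, (u, v), M in dets:
        spatial = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2.0 * sigma ** 2))
        w = (wt * spatial)[..., None]
        acc += w * M
        Z += w
    out = acc / np.maximum(Z, 1e-8)                      # the 1/Z normalization
    return out / np.maximum(np.linalg.norm(out, axis=2, keepdims=True), 1e-8)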
65 We ignore values for which we cannot obtain an accurate estimate of the surface normals due to missing depth data. [sent-193, score-0.445]
66 We compute the “ground truth” surface normals with respect to the camera axes from the depth data. (Figure 4 panels: Input; Top Primitives and Context; Detections With Context.) [sent-194, score-0.445]
67 This sparse understanding can then be used to produce an accurate dense 3D interpretation even using a simple transfer approach. [sent-196, score-0.174]
68 Fig. 2 shows some examples of the top primitives of one fold. [sent-200, score-0.772]
69 Baselines: We qualitatively and quantitatively compare against state-of-the-art methods for depth and surface normal prediction. [sent-201, score-0.338]
70 Specifically, we compare against eight baselines in sparse and dense prediction. [sent-202, score-0.11]
71 The first five are the state-of-the-art; the sixth tests the contribution of geometric supervision; the last two test against direct regression of surface normals. [sent-203, score-0.214]
72 [14]: The geometric context approach predicts quantized surface normals in five directions using multiple-segmentation based classifiers. [sent-208, score-0.439]
73 [13]: This baseline builds on geometric context classifiers (including one for clutter), which we retrain on NYU, and uses structured prediction to predict a vanishing-point-aligned room cuboid. [sent-211, score-0.162]
74 [17]: One can also produce surface normals by predicting depth and computing normals on the results; we do this with the depth prediction method of Karsch et al. [sent-213, score-0.777]
75 [23]: We also compare with surface normals computed from depth predicted by Make3D using the pre-trained model. [sent-216, score-0.473]
76 [29]: We compare against this appearance-based primitive discovery approach. [sent-218, score-0.325]
77 We replace our geometric primitives with mid-level patches discovered by [29], and use the same inference pipeline. [sent-219, score-0.92]
78 (7) RF + SIFT: We train a random forest (RF) regressor to predict surface normals using a histogram of dense-SIFT [20] features (codebook size 1K) over SLIC [1] superpixels (S = 20, M = 100) as well as location features. [sent-220, score-0.444]
(8) SVR + SIFT: A Support Vector Regressor (SVR) using a Hellinger kernel to predict surface normal orientations, using the same input features as above. [sent-222, score-0.231]
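A hedged scikit-learn sketch of baseline (7); the dense-SIFT/SLIC feature extraction is abstracted into a precomputed per-superpixel feature matrix, and the hyperparameters are illustrative only:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_sift_baseline(F_train, N_train, F_test):
    # F_*: (n, d) SIFT-histogram + location features per superpixel
    # N_train: (n, 3) ground-truth surface normals per superpixel
    rf = RandomForestRegressor(n_estimators=100, n_jobs=-1).fit(F_train, N_train)
    N = rf.predict(F_test)
    return N / np.linalg.norm(N, axis=1, keepdims=True)  # project back to unit sphere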
80 Evaluation Criteria: Characterizing surface normal predictor performance is difficult because different metrics encode different objectives, not all of which are desirable. [sent-223, score-0.225]
81 To characterize the noise in our data, we annotated a randomly chosen subset of planar surfaces (ideally with the same surface normal) in 100 images and evaluated the angular error between pairs of pixels; the median error was 5. [sent-231, score-0.193]
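This noise estimate can be computed as below (the pair-sampling scheme and counts are assumptions; the annotated region supplies the unit normals):

import numpy as np

def median_pairwise_angle(normals, n_pairs=10000, seed=0):
    # Median angular error (degrees) between random pixel pairs drawn from one
    # annotated planar region; normals: (n, 3) unit vectors
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(normals), n_pairs)
    j = rng.integers(0, len(normals), n_pairs)
    cos = np.clip((normals[i] * normals[j]).sum(axis=1), -1.0, 1.0)
    return np.degrees(np.median(np.arccos(cos)))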
82 4 shows qualitative examples of the top few primitive detections in two images. [sent-234, score-0.382]
83 The detections are accurate and convey 3D information about the scene despite their sparsity. [sent-235, score-0.183]
84 We also quantitatively evaluate our primitives by producing a precision-vs-coverage curve that trades off between precision (the fraction of pixels correctly predicted) and coverage (the fraction of pixels predicted). [sent-238, score-1.043]
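The curve can be traced by sweeping the calibrated confidence; a sketch assuming per-pixel confidences and angular errors, with correctness defined by an angular threshold (the 30° default is an assumption):

import numpy as np

def precision_coverage(conf, ang_err, thresh_deg=30.0):
    # Rank pixels by confidence; at each cutoff, coverage = fraction predicted,
    # precision = fraction of those within thresh_deg of the ground truth
    order = np.argsort(-conf)
    correct = (ang_err[order] < thresh_deg).astype(float)
    n = np.arange(1, len(conf) + 1)
    return n / len(conf), np.cumsum(correct) / n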
85 We compare with Geometric Context [14] (sweeping over classifier confidence), and the appearance-only primitives of Singh et al. [sent-242, score-0.794]
86 Finally, we report results using only a single round of the iterative procedure to test whether the primitives improve with iterations. [sent-245, score-0.772]
87 Our approach works considerably better than all baselines and the initialization at every coverage level. [sent-246, score-0.147]
The appearance-only vocabulary does not contain crucial 3D primitives, which are difficult to cluster using appearance alone (e. [sent-278, score-0.83]
89 Note that our primitive method does not reach 100% coverage in Fig. [sent-283, score-0.364]
90 The remaining unpredicted pixels correspond to textureless surfaces and cannot be predicted accurately from local evidence, thus requiring our context-transfer technique. [sent-285, score-0.113]
91 As seen in Table 2, the most confident primitives perform much better, showing that our technique can identify which predictions are accurate; in addition to enabling high performance for applications using sparse normals, this confidence is a crucial cue for any subsequent reasoning. [sent-286, score-0.888]
Fig. 4 shows how a few detections can be used to align training normals and test images. [sent-288, score-0.323]
93 The primitives create an accurate dense interpretation from sparse detections. [sent-289, score-0.893]
94 We qualitatively compare the results of our technique with several baselines in Fig. [sent-290, score-0.109]
95 Predictions are qualitatively and quantitatively different if one assumes there are three orthogonal surface normal directions (the Manhattan-world assumption). [sent-297, score-0.275]
(Table residue: angular-error thresholds (…5◦, 30◦); results reported at 25%, 50%, 75%, and full coverage.)
97 Cross-Dataset Prediction: We also want to demonstrate that our primitives generalize well and do not overfit to the NYU data. [sent-405, score-0.821]
98 Therefore, using identical parameters, we use models learned on one split of the NYU dataset to predict dense surface normals on the Berkeley 3D Object Dataset (B3DO) [16] and the subset of the SUNS dataset [32] used in [22]. [sent-406, score-0.448]
99 Blocks world revisited: Image understanding using qualitative geometry and mechanics. [sent-546, score-0.153]
100 Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. [sent-593, score-0.138]
wordName wordTfidf (topN-words)
[('primitives', 0.772), ('primitive', 0.262), ('normals', 0.225), ('surface', 0.157), ('geometrically', 0.123), ('coverage', 0.102), ('detections', 0.098), ('nyu', 0.086), ('karsch', 0.081), ('instances', 0.074), ('world', 0.071), ('hedau', 0.065), ('layout', 0.064), ('manhattan', 0.064), ('depth', 0.063), ('rmse', 0.059), ('singh', 0.058), ('convey', 0.057), ('geometric', 0.057), ('interpretation', 0.056), ('detector', 0.055), ('saxena', 0.055), ('svr', 0.055), ('patches', 0.053), ('discriminative', 0.052), ('hoiem', 0.05), ('membership', 0.05), ('rf', 0.049), ('volumetric', 0.047), ('canonical', 0.046), ('baselines', 0.045), ('normal', 0.045), ('geometricallyconsistent', 0.044), ('suns', 0.044), ('detectors', 0.044), ('quantitatively', 0.04), ('discovery', 0.039), ('acronym', 0.039), ('geometry', 0.038), ('discovered', 0.038), ('cuboids', 0.037), ('xia', 0.037), ('dense', 0.037), ('geons', 0.036), ('median', 0.036), ('recognized', 0.036), ('surfaces', 0.035), ('sift', 0.034), ('cluster', 0.034), ('informative', 0.034), ('discriminativity', 0.034), ('train', 0.033), ('hebert', 0.033), ('qualitatively', 0.033), ('gupta', 0.032), ('patch', 0.032), ('visually', 0.032), ('rgbd', 0.032), ('lee', 0.031), ('cylinders', 0.031), ('trades', 0.031), ('transfer', 0.031), ('technique', 0.031), ('scheirer', 0.03), ('dissimilar', 0.029), ('predictions', 0.029), ('predict', 0.029), ('calibration', 0.029), ('room', 0.028), ('confident', 0.028), ('textureless', 0.028), ('predicted', 0.028), ('sparse', 0.028), ('scene', 0.028), ('fraction', 0.027), ('rooms', 0.027), ('overfit', 0.027), ('rgb', 0.026), ('satkin', 0.026), ('retrain', 0.026), ('underlying', 0.026), ('slic', 0.025), ('collection', 0.025), ('appearance', 0.024), ('inferring', 0.024), ('discovering', 0.024), ('consistent', 0.024), ('metrics', 0.023), ('akin', 0.023), ('indoor', 0.023), ('pixels', 0.022), ('supplementary', 0.022), ('prediction', 0.022), ('generalize', 0.022), ('discover', 0.022), ('infer', 0.022), ('et', 0.022), ('understanding', 0.022), ('aligned', 0.022), ('qualitative', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
Author: David F. Fouhey, Abhinav Gupta, Martial Hebert
Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
2 0.27000734 57 iccv-2013-BOLD Features to Detect Texture-less Objects
Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.
3 0.15481763 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
Author: Scott Satkin, Martial Hebert
Abstract: We present a new algorithm 3DNN (3D NearestNeighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-beforeseen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
4 0.12845081 396 iccv-2013-Space-Time Robust Representation for Action Recognition
Author: Nicolas Ballas, Yi Yang, Zhen-Zhong Lan, Bertrand Delezoide, Françoise Prêteux, Alexander Hauptmann
Abstract: We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos where the action localizations can drastically shift between frames. Our pooling identifies regions of interest using video structural cues estimated by different saliency functions. To combine the different structural information, we introduce an iterative structure learning algorithm, WSVM (weighted SVM), that determines the optimal saliency layout ofan action model through a sparse regularizer. A new optimization method isproposed to solve the WSVM’ highly non-smooth objective function. We evaluate our approach on standard action datasets (KTH, UCF50 and HMDB). Most noticeably, the accuracy of our algorithm reaches 51.8% on the challenging HMDB dataset which outperforms the state-of-the-art of 7.3% relatively.
5 0.12737975 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
Author: Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
Abstract: In this paper, we present a novel, robust multi-view normal field integration technique for reconstructing the full 3D shape of mirroring objects. We employ a turntablebased setup with several cameras and displays. These are used to display illumination patterns which are reflected by the object surface. The pattern information observed in the cameras enables the calculation of individual volumetric normal fields for each combination of camera, display and turntable angle. As the pattern information might be blurred depending on the surface curvature or due to nonperfect mirroring surface characteristics, we locally adapt the decoding to the finest still resolvable pattern resolution. In complex real-world scenarios, the normal fields contain regions without observations due to occlusions and outliers due to interreflections and noise. Therefore, a robust reconstruction using only normal information is challenging. Via a non-parametric clustering of normal hypotheses derived for each point in the scene, we obtain both the most likely local surface normal and a local surface consistency estimate. This information is utilized in an iterative mincut based variational approach to reconstruct the surface geometry.
6 0.12462585 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
7 0.1244153 410 iccv-2013-Support Surface Prediction in Indoor Scenes
8 0.12392273 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
9 0.11723447 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
10 0.10518275 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
11 0.10243662 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
12 0.10024863 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
13 0.096219979 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
14 0.095891722 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
15 0.09516532 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
16 0.092285596 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
17 0.09009099 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
18 0.088320427 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
19 0.086489424 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines
20 0.08269272 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
topicId topicWeight
[(0, 0.176), (1, -0.092), (2, -0.013), (3, -0.013), (4, 0.051), (5, -0.026), (6, -0.017), (7, -0.113), (8, -0.057), (9, -0.064), (10, 0.037), (11, 0.064), (12, -0.082), (13, 0.01), (14, 0.024), (15, -0.117), (16, -0.014), (17, 0.041), (18, -0.006), (19, -0.046), (20, -0.108), (21, 0.041), (22, 0.097), (23, -0.053), (24, 0.016), (25, 0.012), (26, -0.038), (27, 0.025), (28, -0.037), (29, -0.016), (30, -0.031), (31, 0.018), (32, -0.023), (33, -0.006), (34, -0.029), (35, -0.032), (36, 0.05), (37, -0.092), (38, 0.01), (39, -0.137), (40, -0.003), (41, 0.049), (42, -0.063), (43, -0.026), (44, -0.039), (45, 0.084), (46, -0.053), (47, 0.04), (48, -0.015), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.93807393 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
Author: David F. Fouhey, Abhinav Gupta, Martial Hebert
Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
2 0.75972641 410 iccv-2013-Support Surface Prediction in Indoor Scenes
Author: Ruiqi Guo, Derek Hoiem
Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given a RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally we match the predicted extent with annotated scenes in training scenes and transfer the the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.
3 0.71604091 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
Author: Scott Satkin, Martial Hebert
Abstract: We present a new algorithm 3DNN (3D NearestNeighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-beforeseen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
4 0.67694998 57 iccv-2013-BOLD Features to Detect Texture-less Objects
Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.
5 0.6589368 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypothesesfrom local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.
6 0.62768549 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
7 0.61855417 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
8 0.61098021 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
9 0.60500485 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
10 0.60119045 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image
11 0.58975774 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
12 0.58576018 2 iccv-2013-3D Scene Understanding by Voxel-CRF
13 0.58007932 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
14 0.57770503 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
15 0.56637502 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
16 0.56052232 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
17 0.55572081 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization
18 0.54846984 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
19 0.53902924 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings
20 0.53648126 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
topicId topicWeight
[(2, 0.05), (7, 0.017), (12, 0.019), (26, 0.381), (31, 0.046), (42, 0.069), (55, 0.016), (64, 0.043), (73, 0.023), (89, 0.206), (98, 0.023)]
simIndex simValue paperId paperTitle
1 0.98806214 405 iccv-2013-Structured Light in Sunlight
Author: Mohit Gupta, Qi Yin, Shree K. Nayar
Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.
2 0.9737252 395 iccv-2013-Slice Sampling Particle Belief Propagation
Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn
Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods which involves sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.
3 0.96619749 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
Author: Radu Timofte, Vincent De_Smet, Luc Van_Gool
Abstract: Recently there have been significant advances in image upscaling or image super-resolution based on a dictionary of low and high resolution exemplars. The running time of the methods is often ignored despite the fact that it is a critical factor for real applications. This paper proposes fast super-resolution methods while making no compromise on quality. First, we support the use of sparse learned dictionaries in combination with neighbor embedding methods. In this case, the nearest neighbors are computed using the correlation with the dictionary atoms rather than the Euclidean distance. Moreover, we show that most of the current approaches reach top performance for the right parameters. Second, we show that using global collaborative coding has considerable speed advantages, reducing the super-resolution mapping to a precomputed projective matrix. Third, we propose the anchored neighborhood regression. That is to anchor the neighborhood embedding of a low resolution patch to the nearest atom in the dictionary and to precompute the corresponding embedding matrix. These proposals are contrasted with current state-of- the-art methods on standard images. We obtain similar or improved quality and one or two orders of magnitude speed improvements.
4 0.96329576 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human per- formance.
5 0.95625943 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images
Author: Anne Jordt-Sedlazeck, Reinhard Koch
Abstract: In underwater environments, cameras need to be confined in an underwater housing, viewing the scene through a piece of glass. In case of flat port underwater housings, light rays entering the camera housing are refracted twice, due to different medium densities of water, glass, and air. This causes the usually linear rays of light to bend and the commonly used pinhole camera model to be invalid. When using the pinhole camera model without explicitly modeling refraction in Structure-from-Motion (SfM) methods, a systematic model error occurs. Therefore, in this paper, we propose a system for computing camera path and 3D points with explicit incorporation of refraction using new methods for pose estimation. Additionally, a new error function is introduced for non-linear optimization, especially bundle adjustment. The proposed method allows to increase reconstruction accuracy and is evaluated in a set of experiments, where the proposed method’s performance is compared to SfM with the perspective camera model.
6 0.95517045 282 iccv-2013-Multi-view Object Segmentation in Space and Time
7 0.95340371 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
8 0.90907437 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
same-paper 9 0.90336633 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
10 0.89839083 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
11 0.82726884 414 iccv-2013-Temporally Consistent Superpixels
12 0.82391852 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
13 0.7980119 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
14 0.79112339 150 iccv-2013-Exemplar Cut
15 0.7907449 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
17 0.78904939 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
18 0.78232735 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
19 0.77576762 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
20 0.77335775 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos