iccv iccv2013 iccv2013-1 knowledge-graph by maker-knowledge-mining

1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding


Source: pdf

Author: Scott Satkin, Martial Hebert

Abstract: We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. [sent-3, score-0.514]

2 In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. [sent-8, score-0.376]

3 By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data. [sent-9, score-1.438]

4 Introduction Data-driven scene matching is at the forefront of the computer vision field. [sent-11, score-0.29]

5 Traditional appearance based image matching approaches fail to generalize across such extreme viewpoint differences; however, our approach is able to match the geometry of these two scenes, and transfer object labels. [sent-19, score-1.079]

6 Recently, we demonstrated a proof-of-concept method for matching images with 3D models to estimate the geometry of a scene [28]. [sent-27, score-0.604]

7 Building upon this work, we present a viewpoint invariant approach to match images based solely on each scene’s geometry. [sent-28, score-0.429]

8 A traditional appearance-based image matching approach such as [20, 24] would fail to generalize across such extreme viewpoint differences. [sent-31, score-0.632]

9 In this work, we show that we are able to automatically match these images by comparing the appearance of one image with the geometry of another. [sent-33, score-0.379]

10 By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant. [sent-34, score-1.438]

11 The common goal of this research is to estimate the full geometry of a scene from a single viewpoint. [sent-39, score-0.497]

12 The ability to infer the geometry of a scene has enabled a variety of applications in both the vision and graphics fields. [sent-40, score-0.497]

13 [8] use coarse geometry estimates to predict what locations in an environment afford various actions. [sent-42, score-0.453]

14 [16] use scene geometry to realistically render additional objects into a scene. [sent-44, score-0.637]

15 [39] utilize knowledge of scene geometry to create an interactive 3D image editing tool. [sent-46, score-0.497]

16 It is important to note that these graphics applications require precise geometry estimates, which traditionally have involved manual annotation. [sent-47, score-0.369]

17 Current approaches for monocular geometry estimation typically produce coarse results, modeling each object with bounding cuboids. [sent-48, score-0.512]

18 Additionally, many scene understanding approaches such as [7, 8, 28] make limiting assumptions regarding the robustness of existing monocular autocalibration algorithms. [sent-53, score-0.322]

19 In addition, we present a refinement algorithm which takes a rough initial estimate of the structure of a scene, and adjusts the locations of objects in 3D, such that their projections align with predicted object locations in the image plane. [sent-56, score-0.865]

20 In this paper, we describe the 3DNN algorithm and evaluate its performance for the tasks of object detection/segmentation, as well as monocular geometry reconstruction. [sent-57, score-0.429]

21 We show that 3DNN is capable of not only producing state-of-the-art geometry estimation results, but is also capable of precisely localizing and segmenting objects in an image. [sent-58, score-0.593]

22 Our experiments compare 3DNN with traditional 2D nearest-neighbor approaches to demonstrate the benefits of viewpoint invariant scene matching. [sent-59, score-0.598]

23 Approach We estimate the viewpoint from which an image was captured, and search for a 3D model that best matches the input image when rendered from this viewpoint. [sent-61, score-0.461]

24 This type of fine-grained geometry refinement is challenging, and requires a set of features which are sufficiently discriminative to identify when rendered objects are precisely aligned in the image plane. [sent-64, score-0.808]

25 Thus, we present a new set of features which improve the overall accuracy of our scene matching algorithm, enabling this geometry refinement stage. [sent-65, score-0.916]

26 In addition, we introduce a viewpoint selection process which does not commit to a single viewpoint estimate. [sent-66, score-0.786]

27 We consider many camera pose hypotheses and use a learned cost function to select the camera parameters which enable the best scene geometry match. [sent-67, score-0.689]

28 Researchers are now working on automated methods for inferring the full 3D geometry of a scene given a 2D image. [sent-72, score-0.534]

29 Our work shows how these emerging new sources of data can be used by quantifying their effectiveness in terms of matching efficiency (dataset size), generalization to unseen viewpoints, geometry estimation, and object segmentation. [sent-75, score-0.483]

30 Given the novelty of 3D scene matching approaches, there still remains substantial room for improvement via feature engineering. [sent-81, score-0.535]

31 To accurately predict the locations of objects in an image, we train a probabilistic classifier using the algorithm of Munoz et al. [sent-83, score-0.304]

32 This p(object) descriptor is compared to hypothesized object locations via rendering to compute a similarity feature indicating how well hypothesized objects align with predicted object locations. [sent-86, score-0.621]
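
As a concrete illustration, this feature can be sketched as a pixel-wise agreement between the p(object) map and the rendered mask of hypothesized objects. The function below is a minimal sketch under that assumption; the name, the 50/50 weighting, and the exact scoring rule are illustrative choices, not the paper's definition.

    import numpy as np

    def object_alignment_feature(p_object, rendered_mask):
        """Score how well hypothesized objects align with predicted object
        locations: p_object is an HxW map from a pixel-wise classifier,
        rendered_mask is an HxW bool mask of where hypothesized objects
        project. Higher is better."""
        assert p_object.shape == rendered_mask.shape
        # Reward high p(object) under the rendered objects...
        inside = p_object[rendered_mask].mean() if rendered_mask.any() else 0.0
        # ...and low p(object) where no object is hypothesized.
        outside = (1.0 - p_object[~rendered_mask]).mean() if (~rendered_mask).any() else 0.0
        return 0.5 * (inside + outside)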

33 Figure 2 shows an example of a relatively simple scene for which [11] is unable to accurately estimate the locations of objects; however, our approach succeeds. [sent-88, score-0.445]

34 For each hypothesized 3D model, we first analyze its surface normals to identify edges (which we define as discontinuities greater than 20°). [sent-91, score-0.311]
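
This edge definition is easy to make concrete: two neighboring pixels are an edge when their unit normals differ by more than 20 degrees, i.e., when their dot product falls below cos 20°. The sketch below assumes normals are given as an HxWx3 array rendered from the hypothesized model; the 4-neighbor comparison is an illustrative choice.

    import numpy as np

    def normal_discontinuity_edges(normals, angle_deg=20.0):
        """Boolean HxW mask of surface-normal discontinuities greater than
        angle_deg degrees, computed against right and down neighbors."""
        cos_thresh = np.cos(np.deg2rad(angle_deg))
        edges = np.zeros(normals.shape[:2], dtype=bool)
        dot_right = np.sum(normals[:, :-1] * normals[:, 1:], axis=-1)
        dot_down = np.sum(normals[:-1, :] * normals[1:, :], axis=-1)
        edges[:, :-1] |= dot_right < cos_thresh   # > 20 deg from right neighbor
        edges[:-1, :] |= dot_down < cos_thresh    # > 20 deg from down neighbor
        return edges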

35 Viewpoint Selection The problem of viewpoint estimation is very challenging. [sent-104, score-0.364]

36 Estimating the layout of a room, especially in situations where objects such as furniture occlude the boundaries between the walls and the floor, remains unsolved. [sent-105, score-0.446]

37 Recently, researchers such as [12, 18, 25] proposed mechanisms for adjusting the estimated locations of walls and floors to ensure that objects (represented by cuboids) are fully contained within the boundaries of the scene. [sent-106, score-0.402]

38 Inspired by these approaches, we aim to intelligently search over viewpoint hypotheses. [sent-107, score-0.416]

39 Intuitively, if we can fit an object configuration using a particular viewpoint hypothesis with high confidence, then that room layout is likely correct. [sent-108, score-0.926]

40 By searching over possible viewpoints, we aim to alleviate the brittleness of algorithms such as [7, 8, 28], which rely on hard decisions for the estimated viewpoint of an image. [sent-111, score-0.413]

41 These types of geometry estimation algorithms are unable to recover when the room layout estimation process fails. [sent-112, score-0.82]

42 Thus, in this work, we do not assume any individual viewpoint hypothesis is correct. [sent-113, score-0.411]

43 Rather, we use our learned cost function to re-rank a set of room layout hypotheses, by jointly selecting a combination of furniture and camera parameters, which together best match the image. [sent-114, score-0.642]

44 We search over the top N room layout hypotheses, returned by the algorithm of [11]. [sent-115, score-0.453]

45 For each individual room layout, we use the estimated camera parameters corresponding to that room layout to render every 3D model from [1]. [sent-116, score-0.796]

46 This approach scales linearly with the number of viewpoint hypotheses explored, and is trivially parallelizable. [sent-117, score-0.45]

47 In all our experiments, we consider the top 20 results from [11]’s room layout algorithm. [sent-118, score-0.453]

48 However, our approach is agnostic to the source of these viewpoint hypotheses, and additional hypotheses from [19, 27, 30] or any other algorithm could easily be incorporated to improve robustness. [sent-119, score-0.45]
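
Structurally, this viewpoint selection is a joint search over camera hypotheses and 3D models. The sketch below shows that search loop only; render and score are placeholders standing in for the renderer and the learned cost function, not the paper's implementation.

    import numpy as np

    def select_viewpoint_and_model(image_features, layout_hypotheses, models,
                                   render, score):
        """Jointly pick the (camera, 3D model) pair whose rendering best
        matches the image, instead of committing to one viewpoint up front."""
        best_cam, best_model, best_score = None, None, -np.inf
        for cam in layout_hypotheses:      # e.g., top-20 room layouts of [11]
            for model in models:           # render every 3D model per viewpoint
                s = score(image_features, render(model, cam))
                if s > best_score:
                    best_cam, best_model, best_score = cam, model, s
        return best_cam, best_model, best_score  # outer loop parallelizes trivially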

49 The top row shows the result of 3DNN using only the top-ranking room layout from [11]. [sent-121, score-0.453]

50 However, by not limiting ourselves to a single camera parameter hypothesis, we can automatically select a better room layout estimate, enabling a higher-scoring geometry match to be found. [sent-123, score-0.986]

51 Example results highlighting the benefit of searching over viewpoint hypotheses. [sent-128, score-0.413]

52 The top row shows the best matching scene geometry using the top-ranking room layout hypothesis of [11] (note the incorrect camera height estimate, causing objects to be rendered at the wrong scale). [sent-129, score-1.247]

53 The bottom row shows the best matching scene geometry after intelligently selecting the best room layout. [sent-130, score-0.901]

54 For each result, matching 3D model surface normals are shown on the right next to the input image with overlayed object masks. [sent-131, score-0.362]

55 Thus, we propose a geometry refinement algorithm which is inherently 3D. [sent-140, score-0.609]

56 Our method begins with a top-ranking 3D model for an image and searches for the best location of each object in 3D, such that the projections of these objects best align in the image plane, producing a more precise result. [sent-141, score-0.319]

57 We search for local refinements of the objects’ locations which improve the overall geometric scene matching score, using a stochastic algorithm. [sent-142, score-0.467]

58 If the adjusted objects’ locations match the image better than the previous locations, the new locations are saved. [sent-147, score-0.343]
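
A minimal sketch of that loop is below, assuming a placeholder score_fn that renders the scene with the proposed object locations and returns the geometric matching score; the Gaussian jitter of one object at a time is an illustrative proposal distribution.

    import numpy as np

    def refine_object_locations(objects, score_fn, iters=1000, sigma=0.05, seed=0):
        """Stochastic local search over objects' 3D positions: perturb one
        object at a time and keep the move only if the score improves."""
        rng = np.random.default_rng(seed)
        locs = np.asarray(objects, dtype=float)   # N x 3 object centroids
        best_score = score_fn(locs)
        for _ in range(iters):
            proposal = locs.copy()
            i = rng.integers(len(locs))
            proposal[i] += rng.normal(scale=sigma, size=3)  # jitter one object in 3D
            s = score_fn(proposal)
            if s > best_score:                    # adjusted locations match better
                locs, best_score = proposal, s
        return locs, best_score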

59 Figure 4 highlights the effects of our geometry refinement process. [sent-150, score-0.571]

60 Note that the initial object locations in 4(b), when projected into the image plane, do not align with the actual object boundaries. [sent-151, score-0.397]

61 The projected objects produce an excellent segmentation mask, and because the scene interpretation is inherently 3D, we can properly reason about occlusions and depth ordering. [sent-153, score-0.445]

62 Additionally, we analyze the added benefit of each component of the 3DNN system: improved similarity features, geometry refinement and viewpoint selection. [sent-157, score-0.935]

63 Lastly, we explore how the viewpoint invariance of 3DNN enables scene matching and the transfer of object labels using limited amounts of data. [sent-158, score-0.773]

64 Note that we are able to produce accurate 3D models shown in the surface normal renderings and overlaid object segmentation masks. [sent-162, score-0.341]

65 In addition, each object’s boundaries are well-delineated due to our geometry refinement stage, as indicated in the overlaid object segmentation masks. [sent-164, score-0.668]

66 Comparison of 3DNN with state-of-the-art 2D nearest-neighbor approaches and the geometry matching algorithm of [28]. [sent-169, score-0.492]

67 We measure performance using the two “Pixelwise Surface Normal Accuracy” metrics from [28], one measuring how accurately the surface normals of all pixels are predicted, the second evaluating only those pixels which correspond to objects in the ground-truth annotations. [sent-170, score-0.411]
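
As a sketch, a metric of this flavor reduces to the mean agreement between predicted and ground-truth unit normals, optionally restricted to object pixels; the exact normalization used in [28] may differ.

    import numpy as np

    def pixelwise_normal_accuracy(pred, gt, object_mask=None):
        """Mean per-pixel dot product (clipped to [0, 1]) between predicted
        and ground-truth HxWx3 unit normals; passing object_mask gives the
        objects-only variant of the metric."""
        dots = np.clip(np.sum(pred * gt, axis=-1), 0.0, 1.0)
        if object_mask is not None:
            dots = dots[object_mask]   # evaluate only ground-truth object pixels
        return float(dots.mean())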

68 Although these metrics are informative for the task of surface normal prediction, they are unable to capture how accurately objects in an image are localized. [sent-171, score-0.461]

69 For example, a horizontal surface corresponding to a bed in an image may be scored as “correct” even if the predicted scene contains no objects. [sent-172, score-0.44]

70 These metrics require rectifying the predicted scene geometry, and are ill-posed when the estimated viewpoint deviates substantially from the ground-truth camera parameters. [sent-180, score-0.703]

71 A simple pixel-wise overlap score (intersection/union) of the object footprints can now be used to compare the ground-truth floorplan of a scene with our estimated scene geometry. [sent-183, score-0.72]
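
Assuming both footprints are rasterized onto the same top-down occupancy grid, this overlap score is a one-liner:

    import numpy as np

    def floorplan_overlap(gt_footprint, pred_footprint):
        """Intersection-over-union of two HxW boolean object-footprint maps
        on the floorplan grid."""
        inter = np.logical_and(gt_footprint, pred_footprint).sum()
        union = np.logical_or(gt_footprint, pred_footprint).sum()
        return float(inter) / union if union > 0 else 1.0  # both empty: trivial match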

72 We compare 3DNN with our previous geometry matching approach [28] as well as two popular 2D nearest-neighbor approaches: GIST [24] and HoG [5] matching. [sent-184, score-0.492]

73 Figure 6 reports the results for 3DNN compared to each baseline, for the task of geometry estimation. [sent-185, score-0.314]

74 Note that the geometry matching algorithm from [28] does not offer substantial improvements over the 2D nearest-neighbor approaches on the more challenging metrics (matched object surface normals and floorplan overlap score); however, 3DNN exhibits dramatic improvement on each of these metrics. [sent-186, score-0.999]

75 Object Detection and Segmentation Our mechanism for inferring the structure of a scene in 3D provides us with rich information about the depth ordering and the occlusions of objects when projected onto the image plane. [sent-189, score-0.35]

76 Naturally, 3DNN’s ability to precisely segment objects is due in part to the geometry refinement stage. [sent-198, score-0.764]

77 For fair comparison, we run the SIFT flow algorithm (the state-of-the-art 2D refinement process) as a baseline. [sent-201, score-0.324]

78 SIFT flow warps images to adjust the locations of objects in the image plane, akin to our geometry refinement process. [sent-208, score-0.319]

79 We apply the SIFT flow algorithm using code from [20]; this process takes the top-10 scene matches (using either GIST or HoG), warps each matched image, and computes the energy of each warping. [sent-209, score-0.449]

80 We then re-rank the top-10 scene matches according to their SIFT flow energy, and score the top-ranking warped recall image. [sent-210, score-0.433]
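
The baseline re-ranking step can be sketched as follows; sift_flow stands in for the code of [20] and is assumed to return the warped image together with its flow energy (lower is better).

    def rerank_by_sift_flow(query, top10_matches, sift_flow):
        """Warp each appearance match (GIST or HoG) toward the query with
        SIFT flow, then return the warped image with the lowest energy."""
        scored = []
        for match in top10_matches:
            warped, energy = sift_flow(query, match)  # dense warp + its energy
            scored.append((energy, warped))
        scored.sort(key=lambda t: t[0])               # smallest energy first
        return scored[0][1]                           # top-ranking warped image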

81 In Section 2, we described our approach to automatically identify the viewpoint from which an image was captured. [sent-215, score-0.364]

82 Performance is measured using the matched object surface normal scores. [sent-221, score-0.363]

83 The figure plots the change in performance seen across all images in the CMU 3D-Annotated Scene Database as a result of the viewpoint selection and geometry refinement stages. [sent-223, score-0.993]

84 The y-axis indicates how much the matched object surface normal score was affected via refinement or viewpoint selection. [sent-224, score-1.041]

85 Note that for approximately two-thirds of the images, both the viewpoint selection and the refinement processes result in an improved scene geometry (indicated in green). [sent-225, score-1.176]

86 Not only does viewpoint selection result in more accurate object geometries, it also improves the accuracy of room box estimation by reranking viewpoint hypotheses based on which room layout affords the best 3D model matching. [sent-226, score-1.739]

87 Dataset Size It is well known that for appearance-based image matching to be effective, there must be a large recall corpus of images to match with [9, 33]. [sent-231, score-0.299]

88 This is because the data set needs to include recall images captured from a similar viewpoint as the query image. [sent-232, score-0.475]

89 On the contrary for 3DNN, the viewpoint and the geometry of the recall images are decoupled. [sent-233, score-0.751]

90 Thus, each scene provides an exemplar which can be matched to images from any viewpoint. [sent-234, score-0.294]

91 Solid lines indicate “matched objects surface normal score,” dotted lines indicate “floorplan overlap score.” [sent-238, score-0.352]

92 We report results using two of the more challenging metrics: “matched object surface normal scores” (solid lines) and “floorplan overlap scores” (dashed lines). [sent-241, score-0.319]

93 This is because our algorithm starts by estimating the room layout of each image, identifying the locations of floors and walls. [sent-247, score-0.644]

94 On the contrary, GIST and HoG matching do not incorporate this knowledge directly, and must infer the viewpoint of the scene by finding a similar image from the recall corpus. [sent-248, score-0.727]

95 This indicates that performance is increasing more quickly as a function of the dataset size, and that fewer training examples are required to achieve the same level of performance using 3DNN compared to a traditional appearance-based 2D nearest-neighbor scene matching approach. [sent-251, score-0.341]

96 This approach differs from traditional 2D nearest-neighbor methods by decoupling the pose of the camera capturing an image and the underlying scene geometry, enabling the transfer of information across extreme viewpoint differences. [sent-255, score-0.883]

97 In addition, we presented an algorithm for refining the locations of objects in 3D to produce precise results, and the features necessary to achieve this level of fine-grained alignment. [sent-257, score-0.334]

98 Thus, 3DNN achieves dramatic improvement over state-of-the-art approaches for the tasks of object detection, segmentation and geometry estimation. [sent-259, score-0.411]

99 In addition, we demonstrated the ability of 3DNN to generalize to never-before-seen viewpoints, enabling non-parametric scene matching to be effective using orders of magnitude less data than traditional approaches. [sent-260, score-0.453]

100 Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. [sent-376, score-0.345]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('viewpoint', 0.364), ('geometry', 0.314), ('refinement', 0.257), ('room', 0.245), ('layout', 0.208), ('scene', 0.183), ('floorplan', 0.168), ('locations', 0.139), ('surface', 0.121), ('gist', 0.113), ('matched', 0.111), ('matching', 0.107), ('precisely', 0.098), ('objects', 0.095), ('cmu', 0.091), ('bed', 0.086), ('hypotheses', 0.086), ('hedau', 0.08), ('karsch', 0.074), ('pero', 0.074), ('recall', 0.073), ('normals', 0.072), ('couch', 0.071), ('furniture', 0.071), ('nearestneighbor', 0.071), ('align', 0.071), ('hypothesized', 0.071), ('accurately', 0.07), ('normal', 0.069), ('flow', 0.067), ('decoupling', 0.067), ('overlap', 0.067), ('sift', 0.066), ('match', 0.065), ('viewpoints', 0.064), ('plane', 0.063), ('object', 0.062), ('satkin', 0.06), ('freespace', 0.06), ('selection', 0.058), ('indoor', 0.058), ('score', 0.057), ('transfer', 0.057), ('generalize', 0.057), ('precise', 0.055), ('enabling', 0.055), ('corpus', 0.054), ('monocular', 0.053), ('unable', 0.053), ('metrics', 0.053), ('extreme', 0.053), ('matches', 0.053), ('camera', 0.053), ('adjusts', 0.052), ('bowdish', 0.052), ('floors', 0.052), ('intelligently', 0.052), ('kermgard', 0.052), ('traditional', 0.051), ('hoiem', 0.051), ('predicted', 0.05), ('hebert', 0.05), ('munoz', 0.05), ('searching', 0.049), ('properly', 0.049), ('hypothesis', 0.047), ('edges', 0.047), ('height', 0.046), ('hays', 0.046), ('limiting', 0.046), ('render', 0.045), ('produce', 0.045), ('researchers', 0.044), ('geometries', 0.044), ('renderings', 0.044), ('rendered', 0.044), ('hog', 0.044), ('capable', 0.043), ('kin', 0.042), ('rooms', 0.042), ('massive', 0.04), ('understanding', 0.04), ('oliva', 0.04), ('gupta', 0.04), ('truly', 0.039), ('efros', 0.038), ('inherently', 0.038), ('cuboids', 0.038), ('pixelwise', 0.038), ('geometric', 0.038), ('captured', 0.038), ('walls', 0.037), ('inferring', 0.037), ('scenes', 0.036), ('begins', 0.036), ('akin', 0.035), ('warps', 0.035), ('dramatic', 0.035), ('boundaries', 0.035), ('occlusions', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

Author: Scott Satkin, Martial Hebert

Abstract: We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.

2 0.2688742 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images

Author: Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun

Abstract: In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. Towards this goal, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking into account occlusion in order to not over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.

3 0.2519536 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors

Author: Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun

Abstract: In this paper we propose an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation by 13%.

4 0.19612083 410 iccv-2013-Support Surface Prediction in Indoor Scenes

Author: Ruiqi Guo, Derek Hoiem

Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given a RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in the NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally, we match the predicted extent with annotated scenes in training scenes and transfer the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.

5 0.1888863 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

Author: Jiyan Pan, Takeo Kanade

Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.

6 0.18830425 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

7 0.16321303 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

8 0.15481763 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

9 0.14384393 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions

10 0.14230481 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

11 0.14145768 317 iccv-2013-Piecewise Rigid Scene Flow

12 0.13865562 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects

13 0.13379987 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

14 0.13255036 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees

15 0.1320883 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

16 0.12759781 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

17 0.12474429 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization

18 0.11912008 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera

19 0.11656833 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

20 0.11486564 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.268), (1, -0.15), (2, -0.015), (3, -0.003), (4, 0.103), (5, 0.001), (6, -0.034), (7, -0.118), (8, -0.063), (9, -0.1), (10, 0.07), (11, 0.095), (12, -0.059), (13, 0.028), (14, 0.009), (15, -0.122), (16, -0.055), (17, 0.045), (18, 0.027), (19, -0.089), (20, -0.163), (21, -0.044), (22, 0.188), (23, -0.068), (24, 0.155), (25, -0.11), (26, -0.004), (27, 0.09), (28, -0.028), (29, -0.032), (30, -0.011), (31, 0.04), (32, 0.032), (33, 0.043), (34, 0.028), (35, 0.032), (36, -0.03), (37, -0.064), (38, -0.016), (39, 0.008), (40, -0.041), (41, 0.052), (42, -0.025), (43, -0.017), (44, 0.081), (45, 0.026), (46, -0.055), (47, -0.01), (48, 0.041), (49, 0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96918732 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

Author: Scott Satkin, Martial Hebert

Abstract: We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.

2 0.88349771 410 iccv-2013-Support Surface Prediction in Indoor Scenes

Author: Ruiqi Guo, Derek Hoiem

Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given a RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in the NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally, we match the predicted extent with annotated scenes in training scenes and transfer the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.

3 0.8313176 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images

Author: Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun

Abstract: In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. Towards this goal, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking into account occlusion in order to not over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.

4 0.8277452 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

Author: Jiyan Pan, Takeo Kanade

Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.

5 0.78930956 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

Author: Dahua Lin, Sanja Fidler, Raquel Urtasun

Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.

6 0.77112699 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

7 0.7693373 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors

8 0.73837179 2 iccv-2013-3D Scene Understanding by Voxel-CRF

9 0.71066731 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees

10 0.6386553 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions

11 0.62855047 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers

12 0.61475796 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns

13 0.61188143 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

14 0.60217845 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera

15 0.59368455 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

16 0.58081335 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image

17 0.56513751 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

18 0.56023508 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

19 0.55595481 46 iccv-2013-Allocentric Pose Estimation

20 0.55066168 57 iccv-2013-BOLD Features to Detect Texture-less Objects


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.058), (12, 0.015), (26, 0.075), (27, 0.012), (31, 0.07), (42, 0.111), (55, 0.013), (64, 0.049), (73, 0.045), (89, 0.287), (98, 0.144)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9843725 402 iccv-2013-Street View Motion-from-Structure-from-Motion

Author: Bryan Klingner, David Martin, James Roseborough

Abstract: We describe a structure-from-motion framework that handles “generalized” cameras, such as moving rollingshutter cameras, and works at an unprecedented scale— billions of images covering millions of linear kilometers of roads—by exploiting a good relative pose prior along vehicle paths. We exhibit a planet-scale, appearanceaugmented point cloud constructed with our framework and demonstrate its practical use in correcting the pose of a street-level image collection.

2 0.98310065 434 iccv-2013-Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition

Author: Ricardo Cabral, Fernando De_La_Torre, João P. Costeira, Alexandre Bernardino

Abstract: Low rank models have been widely used for the representation of shape, appearance or motion in computer vision problems. Traditional approaches to fit low rank models make use of an explicit bilinear factorization. These approaches benefit from fast numerical methods for optimization and easy kernelization. However, they suffer from serious local minima problems depending on the loss function and the amount/type of missing data. Recently, these low-rank models have alternatively been formulated as convex problems using the nuclear norm regularizer; unlike factorization methods, their numerical solvers are slow and it is unclear how to kernelize them or to impose a rank a priori. This paper proposes a unified approach to bilinear factorization and nuclear norm regularization, that inherits the benefits of both. We analyze the conditions under which these approaches are equivalent. Moreover, based on this analysis, we propose a new optimization algorithm and a “rank continuation” strategy that outperform state-of-the-art approaches for Robust PCA, Structure from Motion and Photometric Stereo with outliers and missing data.

3 0.97909534 271 iccv-2013-Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction

Author: Donghyeon Cho, Minhaeng Lee, Sunyeong Kim, Yu-Wing Tai

Abstract: Light-field imaging systems have got much attention recently as the next generation camera model. A light-field imaging system consists of three parts: data acquisition, manipulation, and application. Given an acquisition system, it is important to understand how a light-field camera converts from its raw image to its resulting refocused image. In this paper, using the Lytro camera as an example, we describe step-by-step procedures to calibrate a raw light-field image. In particular, we are interested in knowing the spatial and angular coordinates of the micro lens array and the resampling process for image reconstruction. Since Lytro uses a hexagonal arrangement of a micro lens image, additional treatments in calibration are required. After calibration, we analyze and compare the performances of several resampling methods for image reconstruction with and without calibration. Finally, a learning based interpolation method is proposed which demonstrates a higher quality image reconstruction than previous interpolation methods including a method used in Lytro software.

4 0.97487217 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting

Author: Inchang Choi, Sunyeong Kim, Michael S. Brown, Yu-Wing Tai

Abstract: Single image matting techniques assume high-quality input images. The vast majority of images on the web and in personal photo collections are encoded using JPEG compression. JPEG images exhibit quantization artifacts that adversely affect the performance of matting algorithms. To address this situation, we propose a learning-based post-processing method to improve the alpha mattes extracted from JPEG images. Our approach learns a set of sparse dictionaries from training examples that are used to transfer details from high-quality alpha mattes to alpha mattes corrupted by JPEG compression. Three different dictionaries are defined to accommodate different object structure (long hair, short hair, and sharp boundaries). A back-projection criterion combined within an MRF framework is used to automatically select the best dictionary to apply on the object’s local boundary. We demonstrate that our method can produce superior results over existing state-of-the-art matting algorithms on a variety of inputs and compression levels.

5 0.97218275 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis

Author: Fabio Galasso, Naveen Shankar Nagaraja, Tatiana Jiménez Cárdenas, Thomas Brox, Bernt Schiele

Abstract: Video segmentation research is currently limited by the lack of a benchmark dataset that covers the large variety of subproblems appearing in video segmentation and that is large enough to avoid overfitting. Consequently, there is little analysis of video segmentation which generalizes across subtasks, and it is not yet clear which and how video segmentation should leverage the information from the still-frames, as previously studied in image segmentation, alongside video specific information, such as temporal volume, motion and occlusion. In this work we provide such an analysis based on annotations of a large video dataset, where each video is manually segmented by multiple persons. Moreover, we introduce a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.

6 0.96508497 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection

same-paper 7 0.94846964 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

8 0.93359154 431 iccv-2013-Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias

9 0.92869508 183 iccv-2013-Geometric Registration Based on Distortion Estimation

10 0.92481107 181 iccv-2013-Frustratingly Easy NBNN Domain Adaptation

11 0.92420715 438 iccv-2013-Unsupervised Visual Domain Adaptation Using Subspace Alignment

12 0.92210734 363 iccv-2013-Rolling Shutter Stereo

13 0.92200595 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation

14 0.9214474 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing

15 0.92117113 35 iccv-2013-Accurate Blur Models vs. Image Priors in Single Image Super-resolution

16 0.92090285 108 iccv-2013-Depth from Combining Defocus and Correspondence Using Light-Field Cameras

17 0.91960603 190 iccv-2013-Handling Occlusions with Franken-Classifiers

18 0.91935861 280 iccv-2013-Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras

19 0.91886902 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

20 0.91843015 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras