iccv iccv2013 iccv2013-79 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real-world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction are achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experimental results show that our approach compares favorably with the state of the art.
Reference: text
sentIndex sentText sentNum sentScore
1 Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. [sent-7, score-0.321]
2 Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction are achieved simultaneously. [sent-9, score-0.328]
3 In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. [sent-10, score-1.179]
4 etry is recovered together with object detection, the joint optimal solution would enforce 3D geometric coherence among detected objects and therefore improve object detection performance. [sent-26, score-0.351]
5 represents the scene geometry with a ground plane parameterized by its pitch angle (i.e. [sent-33, score-0.783]
6 horizon position) and height with respect to the camera [14]. [sent-35, score-0.367]
7 The ground plane is more flexible in [3] and [18], where it is allowed to have a non-zero roll angle (i.e. a slanted horizon). [sent-36, score-0.58]
8 In all those works, ground plane parameters are quantized into a small number of bins for tractability. [sent-39, score-0.332]
9 Different from existing works, we also model the gravity direction in addition to the ground plane (see Figure 3), so that scenes like sloped streets can be represented as well. [sent-40, score-0.704]
10 derives an approximate relationship between ground plane parameters and the position and height of object bounding box in the image [14]. [sent-44, score-0.626]
11 As only bounding box information is used, the simplified geometric relationship is effective only when ground plane roll is zero and ground plane pitch is small. [sent-45, score-1.316]
12 The methods proposed in [3] and [18] use object 2D appearance to estimate its pitch angle with respect to the camera, and compute the ground plane from at least 3 objects. [sent-46, score-0.688]
13 build a Bayes net in which every object candidate is attached to the common global 3D geometry (i.e. the ground plane). [sent-50, score-0.49]
14 Inference over the Bayes net gives the optimal ground plane parameters and the validity of each candidate. [sent-53, score-0.434]
15 To make the inference tractable, the ground plane is assumed to have zero roll angle, and the quantization of the ground plane parameters is relatively coarse. [sent-54, score-0.847]
16 For each enumerated ground plane hypothesis, the validity of the object candidates is checked against it, and the ground plane with the highest compatibility is chosen. [sent-57, score-1.219]
17 An improved method is presented in [18], where all object candidates cast votes for the ground plane parameters in a Hough voting space, and the peak in the voting space is regarded as the optimal ground plane. [sent-59, score-0.631]
18 Another novelty of our approach is that surface regions are also involved in generating and evaluating global 3D geometry hypotheses. [sent-62, score-0.441]
19 After an overview of our algorithm in Section 2, we describe in Section 3 how we generate the hypotheses of global 3D geometry in a way that suppresses outliers and reduces noise simultaneously. [sent-66, score-0.461]
20 Evaluation of those hypotheses that incorporates both global and local 3D geometric constraints is detailed in Section 4. [sent-67, score-0.442]
21 camera is represented by the orange horizon line, and the ground plane is represented by the blue mesh. [sent-71, score-0.616]
22 We start by generating object/surface candidates (including cars, pedestrians, and vertical and horizontal surface regions) using state-of-the-art object detectors (e.g. DPM). [sent-75, score-0.621]
23 Each object/surface candidate gives an estimate of the global 3D geometry (i.e. [sent-80, score-0.426]
24 gravity direction and ground plane parameters) based on its 2D appearance. [sent-82, score-0.704]
25 Those noisy estimates are then pooled together using a generalized RANSAC algorithm to generate a set of global 3D geometry hypotheses. [sent-83, score-0.328]
26 Given each hypothesis, we compute the compatibility of each object/surface candidate and infer their validity according to global and local 3D geometric context. [sent-84, score-0.767]
27 Finally the hypothesis with the highest quality is selected as the optimal estimate of the global 3D geometry, and the inference result of object candidate validity associated with the best hypothesis gives the final object detection result. [sent-86, score-0.784]
28 The first group contains global variables depicting the global 3D geometry: (inverse) gravity direction ng, ground plane orientation np, and ground plane height hp. [sent-92, score-1.442]
29 The orange and purple lines in the image plane are ground horizon and gravity horizon, respectively. [sent-94, score-0.915]
30 Now we list all the geometric relationships that exist among those variables: dt = H sin θ / sin α; (1) db = H (sin θ / tan α + cos θ); (2) nv = m{dt r{xt, f} − db r{xb, f}}; (3) nv = g{−r{xb, f}, θ, γ}; (4) α = arccos{⟨r{xt, f}, r{xb, f}⟩}; (5) hp = −⟨np, db r{xb, f}⟩; (6) nv = np for cars and nv = ng for pedestrians. (7) [sent-96, score-0.634]
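Constraints (1)–(3) and (5) can be sketched numerically as below. This is a rough illustration, not the paper's implementation: `ray` stands in for the back-projection operator r{x, f} (assumed to return the unit viewing ray of image point x under focal length f), and m{·} is taken to be vector normalization.

```python
import numpy as np

def ray(x, f):
    """Unit back-projection ray of image point x = (u, v) with focal length f."""
    v = np.array([x[0], x[1], f], dtype=float)
    return v / np.linalg.norm(v)

def landmark_geometry(xt, xb, f, H, theta):
    """Apply constraints (1)-(3) and (5): recover the top/bottom landmark depths
    and the object's vertical orientation nv from the two image landmarks,
    object height H, and object pitch theta."""
    rt, rb = ray(xt, f), ray(xb, f)
    alpha = np.arccos(np.clip(np.dot(rt, rb), -1.0, 1.0))       # (5)
    dt = H * np.sin(theta) / np.sin(alpha)                      # (1)
    db = H * (np.sin(theta) / np.tan(alpha) + np.cos(theta))    # (2)
    nv = dt * rt - db * rb                                      # (3), up to normalization
    return dt, db, nv / np.linalg.norm(nv)
```

For an upright object (θ = π/2), dt and db reduce to H/sin α and H/tan α, the familiar similar-triangle depths.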
31 In addition, we could also estimate the distributions of object pitch and roll angles given object appearance I and category c: θ ∼ p1(I, c); (8) γ ∼ p2(I, c). (9) [sent-104, score-0.624]
32 Among all the geometric variables, only xt and xb are directly observable. [sent-106, score-0.353]
33 Therefore, instead of attempting to directly solve them, we use those equations to propose hypotheses of the global 3D geometry (ng, np, hp), and to evaluate those hypotheses. [sent-109, score-0.429]
34 The pitch and roll angles, together with landmark locations, produce a non-parametric distribution of object vertical orientation nv, according to the algorithm summarized in Figure 4. [sent-118, score-0.858]
35 Following constraint 7, the vertical orientation distributions obtained from cars are regarded as the estimations of the ground plane orientation np, and those from pedestrians are for the gravity direction ng. [sent-119, score-1.241]
36 In addition to estimating ng and np, given the vertical orientation, each object candidate also provides cues for the ground plane height hp according to its size and location. [sent-120, score-1.0]
37 The algorithm for using an object candidate to generate a non-parametric distribution of hp is summarized in Figure 5. [sent-121, score-0.368]
38 For each surface candidate: Given a vertical surface region like a building facade, we extract long edges within it and compute vertical and horizontal vanishing points (VP) using Gaussian sphere [1]. [sent-122, score-0.778]
39 The vertical direction of the surface nv can be estimated directly from the vertical VP. [sent-124, score-1.142]
40 Therefore, for the vertical VP, it directly yields a set of nv samples from its constituent circle-intersection points. [sent-127, score-0.406]
41 For each pair of horizontal VPs, we compute the cross product of their respective constituent circle-intersection points and generate a set of nv samples. [sent-128, score-0.354]
42 The nv samples from the vertical VP and from all pairs of horizontal VPs are pooled together to generate a non-parametric distribution of nv over the dense grid Gn using Kernel Density Estimation (KDE). [sent-129, score-0.737]
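A minimal sketch of this KDE pooling step, assuming the dense grid Gn is a set of unit direction vectors and using a Gaussian kernel on angular distance (the kernel choice and bandwidth value are our assumptions, not the paper's):

```python
import numpy as np

def kde_on_sphere(samples, grid, bandwidth=0.1):
    """Pool unit-vector nv samples into a non-parametric distribution over a
    dense grid of unit directions via a Gaussian kernel on angular distance."""
    samples = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    grid = grid / np.linalg.norm(grid, axis=1, keepdims=True)
    cos = np.clip(grid @ samples.T, -1.0, 1.0)       # (n_grid, n_samples)
    ang = np.arccos(cos)                             # angular distances
    dens = np.exp(-0.5 * (ang / bandwidth) ** 2).sum(axis=1)
    return dens / dens.sum()                         # normalized distribution
```

The returned vector assigns one probability mass to each grid direction; its mode serves as a point estimate of nv.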
43 Estimating the vertical direction of a horizontal surface region (e.g. a road) is done in a similar way. [sent-130, score-0.489]
44 In outdoor street scenes, vertical surfaces usually correspond to building facades which typically agree with the gravity direction, while horizontal surfaces usually fall on roads which relate to the ground plane orientation. [sent-133, score-1.04]
45 Therefore, we have nv = ng for vertical surfaces and nv = np for horizontal surfaces. [sent-134, score-0.872]
46 An example of each object/surface candidate giving an estimate of the global 3D geometry is shown in Figure 6a and b. [sent-135, score-0.426]
47 Generating hypotheses of global 3D geometry with generalized RANSAC: One of the keys to RANSAC’s success is that at least one hypothesis should be close to the ground truth. [sent-140, score-0.72]
48 Generate hypotheses of global 3D geometry from object/surface candidates. [sent-149, score-0.429]
49 Here, red and green shades indicate vertical and horizontal surface candidates, respectively. [sent-151, score-0.416]
50 Here, magenta lines represent gravity horizons, and yellow grids indicate ground planes, where the grid size is 1m. [sent-153, score-0.469]
51 After each object/surface candidate has estimated a distribution of the global 3D geometry, we generate a set of mixed distributions by mixing individual distributions together with randomly generated weights. [sent-159, score-0.515]
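The mixing step can be sketched as follows, assuming each candidate's estimate is a discrete distribution over a shared grid of geometry values; Dirichlet-sampled weights are one plausible reading of "randomly generated weights", and each mixture's mode is taken as one hypothesis (function and parameter names are ours):

```python
import numpy as np

def propose_hypotheses(cand_dists, grid, n_mix=50, rng=None):
    """Generalized-RANSAC sketch: mix the candidates' distributions with random
    weights and take each mixture's mode as a global-geometry hypothesis."""
    rng = np.random.default_rng(rng)
    cand_dists = np.asarray(cand_dists)          # (n_candidates, n_grid)
    hyps = []
    for _ in range(n_mix):
        w = rng.dirichlet(np.ones(len(cand_dists)))
        mixed = w @ cand_dists                   # random mixture of distributions
        hyps.append(grid[np.argmax(mixed)])      # mode of the mixture
    return hyps
```

Because the weights vary per mixture, outlier candidates dominate only some mixtures, while inlier candidates reinforce each other, which is how outlier suppression and noise reduction happen in one step.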
52 To verify this claim, we perform experiments on the 100 images that are provided with the ground truth horizon in Hoiem’s dataset [14], and the results are plotted in Figure 7. [sent-163, score-0.367]
53 The error of hypothesis generation is defined as the difference between the ground truth and the best hypothesis among all the hypotheses generated. [sent-165, score-0.424]
54 Evaluating global 3D geometry hypotheses: Given a global 3D geometry hypothesis (ñg, ñp, h̃p), we evaluate its quality by measuring how well it is supported by object/surface candidates after excluding the influence of outliers. [sent-176, score-0.909]
55 For this purpose, we evaluate the global and local geometric compatibilities of each object/surface candidate, and employ a CRF to infer the validity of each candidate. [sent-177, score-0.342]
56 Global geometric compatibility: Global geometric compatibility refers to the compatibility of an individual object/surface candidate w.r.t. [sent-181, score-1.269]
57 the global 3D geometry, such as the ground plane and gravity. [sent-184, score-0.588]
58 Individual object candidate: We use two sources of geometric constraints to compute the global compatibility of an individual object candidate. [sent-186, score-0.7]
59 In both sources, the landmark locations xt and xb take the mean value produced by the landmark regressor. [sent-187, score-0.427]
60 The first source compares the pitch and roll angles predicted by the pose regressor (using constraints 8 and 9) with those directly computed from the current hypothesis of the global 3D geometry (using constraint 4, where nv = ñg for pedestrians and nv = ñp for cars). [sent-188, score-1.352]
61 The resulting compatibility score is sg1 = exp{−(θ̃ − θ0)² / (2σθ²)} · exp{−(γ̃ − γ0)² / (2σγ²)} − 0.5, (11) [sent-189, score-0.351]
62 where θ0, γ0 and σθ², σγ² are the mean and variance of the pitch and roll regressor outputs, respectively. [sent-190, score-0.548]
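Score (11) is a direct product of two Gaussian agreement terms, shifted so that a perfect match scores 0.5 and a gross mismatch approaches −0.5. A one-function transcription (variable names are ours):

```python
import numpy as np

def sg1(theta_hyp, gamma_hyp, theta0, gamma0, var_theta, var_gamma):
    """Global compatibility score (11): agreement between the pitch/roll implied
    by the geometry hypothesis and the pose regressor's Gaussian prediction.
    Returns a value in (-0.5, 0.5]."""
    t = np.exp(-(theta_hyp - theta0) ** 2 / (2.0 * var_theta))
    g = np.exp(-(gamma_hyp - gamma0) ** 2 / (2.0 * var_gamma))
    return t * g - 0.5
```

The −0.5 shift makes incompatible candidates contribute negatively to a hypothesis's support rather than merely contributing nothing.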
63 Given the ground plane hypothesis (ñp, h̃p), we compute the bottom landmark depth db using constraint 6. [sent-197, score-0.575]
64 Denote the angle between nv and the direction of Xt − Xb as δ; then the resulting compatibility score is sg2 = exp{−δ² / (2σδ²)} · exp{−(hp − h̃p)² / (2σh²)} − 0.5, (12) [sent-201, score-1.097]
65 The final compatibility score sg for an individual object candidate is the average of sg1 and sg2. [sent-205, score-0.614]
66 Individual surface candidate: The compatibility of a surface candidate also comes from two sources. [sent-206, score-0.721]
67 Firstly, we check how well the vertical orientation distribution produced by the surface candidate agrees with the current hypothesis ñg (for a vertical surface) or ñp (for a horizontal surface). [sent-207, score-1.032]
68 Secondly, we check the plausibility of the location of the surface candidate with respect to the ground horizon in the image. [sent-211, score-0.685]
69 For the horizontal surface candidate, denote the proportion of the surface region above the ground horizon as rh, then the compatibility score sg2 is −rh with range between -1 and 0. [sent-212, score-1.044]
70 For the vertical surface candidate, this type of compatibility does not apply, as it usually straddles the horizon. [sent-213, score-0.579]
71 The final compatibility score sg is the average of sg1 and sg2 for the horizontal surface candidate, and is sg1 for the vertical surface candidate. [sent-214, score-0.896]
72 Local geometric compatibility: Local geometric compatibility refers to the compatibility between nearby object candidates. [sent-217, score-1.115]
73 Therefore, we define the pairwise compatibility score related to depth ordering as sij(dep) = −(|Rij| / |Rj|)^λ, (13) where |Rij| is the overlapping area of candidates i and j, |Rj| is the area of the overlapped candidate j, and λ is a parameter set as 5. [sent-220, score-0.393]
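Score (13) is a one-liner given the two areas; the default λ = 5 follows the text, and the function name is ours:

```python
def depth_order_score(area_overlap, area_occluded, lam=5.0):
    """Pairwise depth-ordering compatibility (13): penalize candidate pairs in
    which a nearer candidate overlaps a farther one without a visible occlusion
    boundary. Ranges from -1 (full overlap) to 0 (no overlap)."""
    return -((area_overlap / area_occluded) ** lam)
```

The large exponent keeps mild overlaps nearly free while sharply penalizing near-total overlap of the farther candidate.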
74 on the ground plane significantly overlap, they are unlikely to co-exist due to space occupancy conflict, as is illustrated by the upper pair of orange cubes in Figure 8. [sent-222, score-0.415]
75 The pairwise compatibility score related to space occupancy is therefore defined as sij(ocp) = −|Ri ∩ Rj| / |Ri ∪ Rj|, (14) where |Ri ∩ Rj| and |Ri ∪ Rj| are the intersection and union areas of the footprints of candidates i and j, respectively. [sent-223, score-0.34]
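Score (14) is minus the intersection-over-union of the two ground-plane footprints. A sketch for the simplified case of axis-aligned footprint rectangles (the paper's footprints come from mapped ground-touching landmarks and need not be axis-aligned):

```python
def occupancy_score(fp_i, fp_j):
    """Pairwise space-occupancy compatibility (14) for footprints given as
    axis-aligned rectangles (x0, y0, x1, y1) on the ground plane: minus the
    intersection-over-union of the two footprints."""
    ix0, iy0 = max(fp_i[0], fp_j[0]), max(fp_i[1], fp_j[1])
    ix1, iy1 = min(fp_i[2], fp_j[2]), min(fp_i[3], fp_j[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(fp_i) + area(fp_j) - inter
    return -inter / union if union > 0 else 0.0
```

Two candidates claiming the same patch of ground score −1 and are thus strongly discouraged from co-existing.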
76 The footprint of an object candidate is obtained by mapping several ground-touching landmarks in the image to the ground plane. [sent-224, score-0.427]
77 Inferring candidate validity with CRF: We construct a CRF over the object/surface candidates to infer their validity. [sent-228, score-0.376]
78 Each candidate forms a node, and two object candidates have an edge between them if their bounding boxes and/or footprints overlap. [sent-229, score-0.423]
79 Analogous unary potentials are defined for the vertical surface and horizontal surface candidates, and ωi(oi) is the unary potential for object candidate i. [sent-239, score-0.798]
80 We set ω(o = 1) = sg and ω(o = 0) = 0, where sg is the compatibility score defined above. [sent-241, score-0.332]
81 After the inference is complete, the quality of the current global 3D geometry hypothesis is the maximum value V∗ of the objective function. [sent-247, score-0.376]
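The outer loop over hypotheses can be sketched with brute-force CRF inference, which is feasible only for a handful of candidates; the paper's actual inference procedure is not specified in this excerpt, so this is purely illustrative (all names are ours):

```python
from itertools import product

def best_hypothesis(hypotheses, unary_fn, pairwise_fn, edges, n_cand):
    """For each global-geometry hypothesis h, brute-force the binary validity
    labels o in {0,1}^n maximizing the CRF objective (sum of unaries plus sum
    of pairwise terms over edges); return the hypothesis with the highest
    optimum V* together with that V*."""
    best = None
    for h in hypotheses:
        v_star = max(
            sum(unary_fn(h, i, o[i]) for i in range(n_cand)) +
            sum(pairwise_fn(h, i, j, o[i], o[j]) for i, j in edges)
            for o in product([0, 1], repeat=n_cand)
        )
        if best is None or v_star > best[1]:
            best = (h, v_star)
    return best
```

The labeling achieving V* under the winning hypothesis is then the final detection result, as the text describes.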
82 When proposing hypotheses from distributions, 50 random mixtures are usually enough, and the total number of hypotheses to evaluate is in the hundreds. [sent-263, score-0.379]
83 This is probably because Hoiem’s algorithm takes the bounding box height as the object height in the image. [sent-275, score-0.425]
84 The dataset provides the ground-truth horizon in the form of the row index where the horizon is located. [sent-279, score-0.603]
85 It does not distinguish between gravity and ground horizons, since the two are almost the same for most of the images in the dataset. [sent-280, score-0.43]
86 After converting the row index of a horizon to the corresponding orientation vector, we compute the error of an estimated gravity direction (or ground plane orientation) by measuring the angle between it and the ground truth orientation vector. [sent-281, score-1.298]
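This conversion and error measure can be sketched as below, assuming a level (horizontal) horizon and the principal point at the image center; rays through horizon points span {(1, 0, 0), (0, row − cy, f)}, so the implied orientation vector is their cross product (both helper names are ours):

```python
import numpy as np

def horizon_row_to_normal(row, img_height, f, cy=None):
    """Map a horizontal horizon at image row `row` to the orientation vector
    it implies (ground plane normal through the camera center, up to sign)."""
    cy = img_height / 2.0 if cy is None else cy
    n = np.array([0.0, -f, row - cy])   # cross((1,0,0), (0, row-cy, f))
    return n / np.linalg.norm(n)

def orientation_error_deg(n_est, n_gt):
    """Angle in degrees between two orientation vectors, ignoring sign."""
    c = np.clip(np.dot(n_est, n_gt), -1.0, 1.0)
    return np.degrees(np.arccos(abs(c)))
```

Ignoring the sign makes the measure insensitive to whether a normal points up or down, which is irrelevant to the horizon it defines.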
87 Here, ”Det” shows the result of the DPM baseline detector; ”Det+GlbGeo” shows the result of including global geometric context alone; ”Det+LocGeo” shows the result of including local geometric context alone; ”Det+FullGeo” shows the result of our full system using both types of context. [sent-289, score-0.513]
88 The estimated ground plane height by our method is centered around 1. [sent-292, score-0.463]
89 It is worth noting that, unlike Hoiem’s algorithm, we do not assume the ground plane is perpendicular to the gravity direction and has zero roll and small pitch. [sent-294, score-0.887]
90 Even with greater flexibility, our approach still outperforms Hoiem’s algorithm in both object detection and global geometry estimation, on a test dataset that largely satisfies those assumptions. [sent-295, score-0.367]
91 Global and local 3D geometric context: Different from existing works, we use both global and local 3D geometric context when inferring the validity of object candidates. [sent-303, score-0.614]
92 Both the global and local 3D geometric context enhance detection performance, and the highest gain is achieved when they are applied simultaneously. [sent-305, score-0.352]
93 Benefit is mutual: Not only does 3D geometric context enhance object detection performance, but coherent object detection in turn improves the estimation of gravity and ground horizons. [sent-306, score-0.909]
94 To verify this argument, we estimate the gravity direction and ground plane orientation from vertical and horizontal surface candidates alone (Figure 10). [sent-307, score-0.961]
95 The first row shows the distributions of gravity direction error, ground orientation error, and ground height from our algorithm. [sent-309, score-0.892]
96 We are not able to compute the error of the ground plane height estimation due to the lack of ground truth. [sent-312, score-0.594]
97 Yellow grid: ground plane where the grid spacing is 1m. [sent-352, score-0.332]
98 We can see that some false detections from DPM are rejected due to inconsistency with global geometry (e.g. [sent-355, score-0.43]
99 the huge “pedestrian” in (a)); some false detections from DPM are rejected due to inconsistency with local geometry (e.g. [sent-357, score-0.333]
100 The case when gravity direction and ground orientation do not agree is shown in (f), where the ground horizon is illustrated by the thick yellow line. [sent-363, score-0.983]
wordName wordTfidf (topN-words)
[('gravity', 0.299), ('compatibility', 0.255), ('horizon', 0.236), ('pitch', 0.227), ('plane', 0.201), ('nv', 0.2), ('hoiem', 0.191), ('roll', 0.183), ('vertical', 0.176), ('hypotheses', 0.173), ('candidate', 0.17), ('geometry', 0.159), ('surface', 0.148), ('geometric', 0.143), ('xb', 0.133), ('height', 0.131), ('ground', 0.131), ('dpm', 0.122), ('hypothesis', 0.12), ('np', 0.104), ('candidates', 0.104), ('validity', 0.102), ('global', 0.097), ('horizontal', 0.092), ('ransac', 0.092), ('cars', 0.09), ('landmark', 0.09), ('crf', 0.082), ('orientation', 0.081), ('xt', 0.077), ('direction', 0.073), ('vp', 0.072), ('idjep', 0.071), ('iojcp', 0.071), ('rj', 0.068), ('box', 0.065), ('hp', 0.065), ('context', 0.065), ('angle', 0.065), ('detections', 0.064), ('object', 0.064), ('pedestrians', 0.063), ('ng', 0.062), ('landmarks', 0.062), ('rejected', 0.06), ('hdr', 0.055), ('modes', 0.054), ('detector', 0.051), ('footprints', 0.051), ('false', 0.05), ('coherent', 0.049), ('individual', 0.048), ('orange', 0.048), ('oi', 0.047), ('det', 0.047), ('detection', 0.047), ('distributions', 0.046), ('efros', 0.045), ('pedestrian', 0.045), ('layout', 0.043), ('sg', 0.043), ('sij', 0.042), ('angles', 0.04), ('generalized', 0.04), ('mixed', 0.039), ('oj', 0.039), ('horizons', 0.039), ('messy', 0.039), ('magenta', 0.039), ('please', 0.039), ('surfaces', 0.038), ('degrees', 0.038), ('bao', 0.038), ('vanishing', 0.038), ('car', 0.038), ('locations', 0.037), ('distribution', 0.037), ('vps', 0.037), ('generating', 0.037), ('labelme', 0.036), ('cubes', 0.035), ('conflict', 0.035), ('regressor', 0.034), ('score', 0.034), ('bounding', 0.034), ('mixtures', 0.033), ('recovered', 0.033), ('outdoor', 0.033), ('db', 0.033), ('corrupt', 0.032), ('agree', 0.032), ('generate', 0.032), ('wheel', 0.031), ('constituent', 0.03), ('checked', 0.03), ('pittsburgh', 0.03), ('constraints', 0.029), ('sin', 0.029), ('ri', 0.029), ('rij', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypothesesfrom local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.
2 0.20232503 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes
Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.
3 0.1888863 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
Author: Scott Satkin, Martial Hebert
Abstract: We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
4 0.16039595 410 iccv-2013-Support Surface Prediction in Indoor Scenes
Author: Ruiqi Guo, Derek Hoiem
Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given an RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in the NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally we match the predicted extent with annotated scenes in training scenes and transfer the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.
5 0.15050389 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
Author: Dahua Lin, Sanja Fidler, Raquel Urtasun
Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
6 0.13702956 46 iccv-2013-Allocentric Pose Estimation
7 0.1308765 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
8 0.12968102 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
9 0.12732942 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
10 0.12635852 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
11 0.12572289 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
12 0.11554371 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
13 0.11530515 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
14 0.11059384 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
15 0.10915612 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
16 0.10830726 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
17 0.10689704 379 iccv-2013-Semantic Segmentation without Annotating Segments
18 0.10682852 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes
19 0.10478946 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
20 0.099345595 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
topicId topicWeight
[(0, 0.23), (1, -0.113), (2, -0.022), (3, -0.022), (4, 0.112), (5, -0.043), (6, -0.006), (7, -0.048), (8, -0.07), (9, -0.076), (10, 0.065), (11, 0.05), (12, -0.096), (13, 0.005), (14, 0.013), (15, -0.037), (16, 0.017), (17, 0.112), (18, -0.014), (19, -0.068), (20, -0.131), (21, 0.02), (22, 0.1), (23, -0.004), (24, 0.096), (25, -0.057), (26, -0.053), (27, -0.02), (28, -0.033), (29, -0.102), (30, -0.105), (31, 0.008), (32, 0.019), (33, 0.004), (34, 0.067), (35, 0.053), (36, 0.034), (37, -0.024), (38, 0.025), (39, 0.021), (40, -0.069), (41, 0.026), (42, 0.038), (43, -0.023), (44, 0.063), (45, 0.011), (46, 0.024), (47, -0.058), (48, 0.071), (49, 0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.96818101 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypothesesfrom local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.
2 0.8071484 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
Author: Scott Satkin, Martial Hebert
Abstract: We present a new algorithm 3DNN (3D NearestNeighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-beforeseen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
3 0.78146064 410 iccv-2013-Support Surface Prediction in Indoor Scenes
Author: Ruiqi Guo, Derek Hoiem
Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given a RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in NYU dataset. We extract ground truth from the annotated dataset and developed a pipeline for predicting floor space, walls, the height and full extent of support surfaces. Finally we match the predicted extent with annotated scenes in training scenes and transfer the the support surface configuration from training scenes. We evaluate the proposed approach in our dataset and demonstrate its effectiveness in understanding scenes in 3D space.
4 0.7528497 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
Author: Dahua Lin, Sanja Fidler, Raquel Urtasun
Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
5 0.7385444 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
Author: Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. Towards this goal, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking into account occlusion in order to not over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.
6 0.69501334 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
7 0.68495393 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
8 0.66563034 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
9 0.6182723 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
10 0.61069971 2 iccv-2013-3D Scene Understanding by Voxel-CRF
11 0.60718894 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
12 0.60481256 46 iccv-2013-Allocentric Pose Estimation
13 0.59596503 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
14 0.58355665 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
15 0.58024287 349 iccv-2013-Regionlets for Generic Object Detection
16 0.56942511 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
17 0.56626004 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
18 0.56594253 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
19 0.56338197 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
20 0.55788076 189 iccv-2013-HOGgles: Visualizing Object Detection Features
topicId topicWeight
[(2, 0.051), (7, 0.023), (26, 0.076), (31, 0.093), (34, 0.026), (35, 0.01), (42, 0.175), (48, 0.011), (64, 0.048), (68, 0.1), (73, 0.043), (89, 0.188), (95, 0.058), (98, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.9317916 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction are achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experimental results show that our approach compares favorably with the state of the art.
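The hypothesize-and-verify idea behind the RANSAC step can be illustrated with a plain RANSAC fit to a 2D line (standing in for a ground-plane/horizon model). This is a minimal sketch with hypothetical function names, not the paper's generalized RANSAC, which generates hypotheses from heterogeneous local entities:

```python
import random

def ransac_line(points, n_iters=200, thresh=0.5, seed=0):
    """Plain RANSAC line fit (y = a*x + b): repeatedly sample two
    points, count inliers within `thresh` of the candidate line,
    and keep the model with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:  # degenerate pair, no unique line
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(a * x + b - y) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

On points drawn from y = 2x + 1 plus a few gross outliers, the outliers never attract a large consensus set, so the recovered model comes from an all-inlier sample.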
2 0.92470229 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
Author: Kaiye Wang, Ran He, Wei Wang, Liang Wang, Tieniu Tan
Abstract: Cross-modal matching has recently drawn much attention due to the widespread existence of multimodal data. It aims to match data from different modalities, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous works mainly focus on solving the first problem. In this paper, we propose a novel coupled linear regression framework to deal with both problems. Our method learns two projection matrices to map multimodal data into a common feature space, in which cross-modal data matching can be performed. And in the learning procedure, the ℓ2,1-norm penalties are imposed on the two projection matrices separately, which leads to select relevant and discriminative features from coupled feature spaces simultaneously. A trace norm is further imposed on the projected data as a low-rank constraint, which enhances the relevance of different modal data with connections. We also present an iterative algorithm based on halfquadratic minimization to solve the proposed regularized linear regression problem. The experimental results on two challenging cross-modal datasets demonstrate that the proposed method outperforms the state-of-the-art approaches.
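The feature-selection effect of the ℓ2,1 penalty comes from how the norm is computed: it sums the Euclidean norms of the rows of a matrix, so minimizing it drives whole rows of a projection matrix to zero at once. A minimal sketch of the norm itself (the function name is hypothetical, and this is only the penalty term, not the paper's full solver):

```python
import math

def l21_norm(W):
    """ℓ2,1 norm of a matrix given as a list of rows: the sum of
    the Euclidean (ℓ2) norms of the rows. Row-wise sparsity under
    this penalty corresponds to discarding entire input features."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)
```

For example, a matrix whose second row is all zeros contributes nothing from that row, which is exactly the regime the penalty encourages.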
3 0.91874158 182 iccv-2013-GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity
Author: Jia Xu, Vamsi K. Ithapu, Lopamudra Mukherjee, James M. Rehg, Vikas Singh
Abstract: We study the problem of online subspace learning in the context of sequential observations involving structured perturbations. In online subspace learning, the observations are an unknown mixture of two components presented to the model sequentially: the main effect, which pertains to the subspace, and a residual/error term. If no additional requirement is imposed on the residual, it often corresponds to noise terms in the signal which were unaccounted for by the main effect. To remedy this, one may impose 'structural' contiguity, which has the intended effect of leveraging the secondary terms as a covariate that helps the estimation of the subspace itself, instead of merely serving as a noise residual. We show that the corresponding online estimation procedure can be written as an approximate optimization process on a Grassmannian. We propose an efficient numerical solution, GOSUS, Grassmannian Online Subspace Updates with Structured-sparsity, for this problem. GOSUS is expressive enough in modeling both homogeneous perturbations of the subspace and structural contiguities of outliers, and, after certain manipulations, solvable via an alternating direction method of multipliers (ADMM). We evaluate the empirical performance of this algorithm on two problems of interest: online background subtraction and online multiple face tracking, and demonstrate that it achieves competitive performance with the state-of-the-art in near real time.
4 0.913324 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
Author: Fan Wang, Qixing Huang, Leonidas J. Guibas
Abstract: Joint segmentation of image sets has great importance for object recognition, image classification, and image retrieval. In this paper, we aim to jointly segment a set of images starting from a small number of labeled images or none at all. To allow the images to share segmentation information with each other, we build a network that contains segmented as well as unsegmented images, and extract functional maps between connected image pairs based on image appearance features. These functional maps act as general property transporters between the images and, in particular, are used to transfer segmentations. We define and operate in a reduced functional space optimized so that the functional maps approximately satisfy cycle-consistency under composition in the network. A joint optimization framework is proposed to simultaneously generate all segmentation functions over the images so that they both align with local segmentation cues in each particular image, and agree with each other under network transportation. This formulation allows us to extract segmentations even with no training data, but can also exploit such data when available. The collective effect of the joint processing using functional maps leads to accurate information sharing among images and yields superior segmentation results, as shown on the iCoseg, MSRC, and PASCAL data sets.
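Cycle-consistency of functional maps can be checked by composing the maps around a cycle (plain matrix products) and measuring how far the composite is from the identity. The sketch below illustrates only that check, under the assumption that maps are given as small dense matrices; it is not the paper's reduced-space optimization:

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def cycle_error(maps):
    """Compose functional maps around a cycle and return the
    Frobenius-norm distance of the composite from the identity.
    Zero means the cycle is perfectly consistent."""
    n = len(maps[0])
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for M in maps:
        P = matmul(M, P)
    return sum((P[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n)) ** 0.5
```

Two successive swaps of a 2D basis compose to the identity (error 0), while a single swap leaves a nonzero inconsistency, which is the quantity such a network objective would penalize.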
5 0.91005516 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
Author: Hongteng Xu, Hongyuan Zha
Abstract: Data sparsity has been a thorny issue for manifold-based image synthesis, and in this paper we address this critical problem by leveraging ideas from transfer learning. Specifically, we propose methods based on generating auxiliary data in the form of synthetic samples using transformations of the original sparse samples. To incorporate the auxiliary data, we propose a weighted data synthesis method, which adaptively selects from the generated samples for inclusion during the manifold learning process via a weighted iterative algorithm. To demonstrate the feasibility of the proposed method, we apply it to the problem of face image synthesis from sparse samples. Compared with existing methods, the proposed method shows encouraging results with good performance improvements.
6 0.90704542 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
7 0.90085626 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
8 0.90065587 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
9 0.89904153 277 iccv-2013-Multi-channel Correlation Filters
10 0.89699608 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
11 0.8965292 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
12 0.89629138 106 iccv-2013-Deep Learning Identity-Preserving Face Space
13 0.89560091 362 iccv-2013-Robust Tucker Tensor Decomposition for Effective Image Representation
14 0.89558691 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
15 0.89544958 52 iccv-2013-Attribute Adaptation for Personalized Image Search
16 0.89534426 257 iccv-2013-Log-Euclidean Kernels for Sparse Representation and Dictionary Learning
17 0.89517927 180 iccv-2013-From Where and How to What We See
18 0.89481002 25 iccv-2013-A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models
19 0.89407247 392 iccv-2013-Similarity Metric Learning for Face Recognition
20 0.89397871 349 iccv-2013-Regionlets for Generic Object Detection