iccv iccv2013 iccv2013-308 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
Abstract: We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. [sent-4, score-0.108]
2 Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. [sent-5, score-0.571]
3 Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. [sent-6, score-0.101]
4 We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms. [sent-7, score-0.453]
5 Introduction: Just as a thought experiment, imagine that we want to detect and fit 3D models of IKEA furniture in images, as shown in Figure 1. [sent-9, score-0.166]
6 We can find surprisingly accurate 3D models of IKEA furniture, such as billy bookcase and ektorp sofa, created by IKEA fans from Google 3D Warehouse and other publicly available databases. [sent-10, score-0.236]
7 Therefore, detecting those models in images could seem to be a very similar task to an instance detection problem in which we have training images of the exact instance that we want to detect. [sent-11, score-0.228]
8 In the case of typical 3D models (including IKEA models from Google Warehouse), the exact appearance of each piece is not available, only the 3D shape is. [sent-13, score-0.175]
9 For instance, IKEA furniture might appear with different colors and textures, and with geometric deformations (as people building them might not do it perfectly) and occlusions (e. [sent-15, score-0.226]
10 The problem that we introduce in this paper is detecting and accurately fitting exact 3D models of objects to real images, as shown in Figure 1. [sent-18, score-0.246]
11 Detecting 3D objects in images and estimating their pose was a popular topic in the early days of computer vision [14] and has gained a renewed interest in the last few years. [sent-19, score-0.347]
12 (Figure 1: 3D Model / Original Image / Fine-pose Estimation: the pose of an object in the image given an exact 3D model.) The traditional approaches [sent-20, score-0.341]
13 [12, 14] were dominated by using accurate geometric representations of 3D objects with an emphasis on viewpoint invariance. [sent-21, score-0.083]
14 Objects would appear in almost any pose and orientation in the image. [sent-22, score-0.237]
15 Instance level detection regained interest with the introduction of new invariant local descriptors that dramatically improved the detection of interest points [17]. [sent-24, score-0.296]
16 Having access to accurate knowledge about those two aspects allowed precise detections and pose estimations of the object on images even in the presence of clutter and occlusions. [sent-26, score-0.32]
17 We compute HOG on edgemaps to ensure that a real image and our model share the same modality. [sent-28, score-0.108]
18 Recent approaches to category level detection have extended 2D constellation models to include 3D information in the object representation. [sent-29, score-0.218]
19 Category level detection requires the models to be generic and flexible, as they have to deal with all the variations in shape and appearance of the instances that belong to the same category. [sent-31, score-0.162]
20 In this paper we introduce a detection task that is in the intersection of these two settings; it is more generic than detecting instances, but we assume richer models than the ones typically used in category level detection. [sent-35, score-0.202]
21 In particular, we assume that accurate CAD models of the objects are available. [sent-36, score-0.101]
22 However, there might be large variation in the appearance, as the CAD models do not completely constrain the appearance of the objects placed in the real world. [sent-38, score-0.179]
23 Although assuming that CAD models are available might seem to be a restrictive assumption, there are available 3D CAD models for most man-made artifacts, used for manufacturing or virtual reality. [sent-39, score-0.151]
24 Hence, we focus on detection and pose estimation of objects in the wild given their 3D CAD models. [sent-41, score-0.395]
25 Our goal is to provide an accurate localization of the object, as in the instance level detection problem, but dealing with some of the variability that one finds in category level detection. [sent-42, score-0.103]
26 Our contributions are threefold: (1) Proposing a detection problem that has some of the challenges of category level detection while allowing for an accurate representation of the object pose in the image. [sent-43, score-0.447]
27 This problem will motivate the development of better 3D object models and the algorithms needed to find them in images. [sent-44, score-0.097]
28 (3) We introduce a new dataset of 3D IKEA models obtained from Google Warehouse, and real images containing instances of IKEA furniture annotated with ground truth pose. [sent-46, score-0.245]
29 Methods: We now propose a framework that detects objects and estimates their poses simultaneously by matching against one of the 3D models in our database. [sent-48, score-0.157]
30 Local correspondence error: The goal here is to measure the quality of local correspondences. [sent-58, score-0.169]
31 Given a projection P, we find the local shape-based matching score between the rendered image of the CAD model and the 2D image. [sent-59, score-0.085]
32 In order to overcome this modality difference, we compute HOG on the edgemap of both images since it is more robust to appearance change but still sensitive to the change in shape. [sent-62, score-0.215]
33 Geometric distance between correspondences: Given a proposed set of correspondences c between the CAD model and the image, it is also necessary to measure whether c yields a geometrically acceptable pose based on the 3D model. [sent-70, score-0.81]
34 For the error measure, we use the Euclidean distance between the projection of Xi and its corresponding 2D point xci, as well as the line distance defined in [12] between a 3D line l and its corresponding 2D line cl. [sent-71, score-0.567]
35 The first term measures a pairwise distance between a 3D interest point Xi and its 2D corresponding point xci . [sent-88, score-0.299]
36 The second term measures the alignment error between a 3D interest line l and one of its corresponding 2D lines cl [12]. [sent-89, score-0.372]
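Written out, the geometric distance plausibly combines these two terms as below; this is a hedged reconstruction from the prose (the extraction lost the paper's numbered equation, so the squaring and any per-term weighting are assumptions), with proj_P denoting projection under pose P and d_line the 3D-line-to-2D-line distance of [12]:

```latex
D(P, c) \;=\; \sum_{i} \left\| \operatorname{proj}_P(X_i) - x_{c_i} \right\|_2^2
\;+\; \sum_{l} d_{\mathrm{line}}\!\left( \operatorname{proj}_P(L_l),\, c_l \right)
```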
37 Global dissimilarity: One key improvement of our work compared to traditional works on pose estimation using a 3D model is how we measure the global alignment. [sent-95, score-0.359]
38 In order to capture edge alignment per orientation, we compute a fine HOG descriptor (2x2 per cell) of edgemaps of I and the rendered image of pose P. [sent-97, score-0.615]
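A minimal sketch of this edge-HOG comparison in Python, assuming scikit-image; the Canny edge detector, the cosine score, and every parameter other than the 2x2 cells are illustrative assumptions rather than the paper's exact pipeline:

```python
import numpy as np
from skimage.feature import canny, hog

def edgemap_hog(gray, sigma=1.0):
    # HOG computed on an edge map rather than raw pixels: robust to
    # appearance (color/texture) change, still sensitive to shape.
    edges = canny(gray, sigma=sigma).astype(float)
    return hog(edges, orientations=9, pixels_per_cell=(2, 2),
               cells_per_block=(1, 1), feature_vector=True)

def edge_hog_similarity(real_patch, rendered_patch):
    # Cosine similarity between the edge-HOG of a real image patch and of
    # the rendered CAD view (both grayscale arrays of the same size).
    a, b = edgemap_hog(real_patch), edgemap_hog(rendered_patch)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```

In practice such a descriptor would be computed over the projected object region for each candidate pose P, comparing the rendered CAD edgemap against the real image's edgemap.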
39 Regions: We add another alignment feature based on superpixels. [sent-103, score-0.115]
40 Region feature: One feature to measure fine alignment is the ratio between the area of a proposed pose and the regions overlapping the proposed pose (Eq. 7). [sent-108, score-0.498]
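A minimal sketch of such a ratio-of-areas region feature, assuming scikit-image; SLIC superpixels and the mean aggregation are illustrative stand-ins, since the extraction does not specify the region source or the exact form of Eq. 7:

```python
import numpy as np
from skimage.segmentation import slic

def region_feature(rgb_image, pose_mask, n_segments=200):
    # For every superpixel touching the proposed pose mask (boolean array),
    # measure the fraction of its area lying inside the mask. Superpixels
    # straddling the pose boundary pull the score down.
    labels = slic(rgb_image, n_segments=n_segments)
    scores = []
    for r in np.unique(labels):
        region = labels == r
        inter = np.logical_and(region, pose_mask).sum()
        if inter == 0:
            continue  # this superpixel does not overlap the pose at all
        scores.append(inter / region.sum())
    return float(np.mean(scores)) if scores else 0.0
```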
41 Texture boundary: The goal of this feature is to capture appearance by measuring how well our proposed pose separates the object boundary. [sent-111, score-0.278]
42 In other words, we would like to measure the alignment between the boundary of our proposed pose P and the texture boundary of I. [sent-112, score-0.605]
43 We compute histograms of LBP on the inner and outer boundaries of the proposed pose P. [sent-114, score-0.393]
44 We define an inner boundary region by eroding the proposed object mask and subtracting the result from the mask, and an outer boundary by dilating the proposed object mask and subtracting the mask. [sent-115, score-0.279]
45 Essentially, the histograms will encode the texture patterns of near-inner/outer pixels along the proposed pose P’s boundary. [sent-116, score-0.297]
46 Hence, a large change in these two histograms indicates a large texture pattern change, and it is ideal if the LBP histogram difference along the proposed pose’s boundary is large. [sent-117, score-0.173]
47 This feature will discourage the object boundary from aligning with contours that have small texture change (such as contours within an object or contours due to illumination). [sent-118, score-0.381]
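A rough sketch of this texture boundary feature under stated assumptions (scikit-image LBP and scipy morphology; the band width, LBP parameters, and the chi-square-style distance are illustrative choices):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion
from skimage.feature import local_binary_pattern

def texture_boundary_feature(gray, pose_mask, width=5, n_bins=10):
    # Compare LBP histograms of the inner and outer boundary bands of the
    # proposed pose mask (boolean array); a large difference suggests the
    # pose boundary sits on a genuine texture boundary.
    lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
    inner = pose_mask & ~binary_erosion(pose_mask, iterations=width)
    outer = binary_dilation(pose_mask, iterations=width) & ~pose_mask
    if not inner.any() or not outer.any():
        return 0.0
    h_in, _ = np.histogram(lbp[inner], bins=n_bins, range=(0, n_bins),
                           density=True)
    h_out, _ = np.histogram(lbp[outer], bins=n_bins, range=(0, n_bins),
                            density=True)
    # Chi-square-style distance between the two normalized histograms.
    return float(np.sum((h_in - h_out) ** 2 / (h_in + h_out + 1e-8)))
```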
48 Edges: We extract edges [11] from the image to measure their alignment with the edgemap of the estimated pose. [sent-119, score-0.255]
49 ∈ (8) {10, 25, 50, ∞} to control the influence Number of correspondences: fcorr is a binary vector, where the i’th dimension indicates if there are more than i good correspondences between the 3D model and the 2D image under pose P. [sent-126, score-0.559]
50 Good correspondences are the ones with local correspondence error (in Eq 2) below a threshold. [sent-127, score-0.377]
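A small sketch of the fcorr feature (numpy assumed; the vector length `dims` is an illustrative choice not specified by the extraction):

```python
import numpy as np

def fcorr(local_errors, tau, dims=10):
    # Dimension i (1-indexed) is 1 iff more than i correspondences have
    # local correspondence error (Eq. 2) below the threshold tau.
    n_good = int(np.sum(np.asarray(local_errors) < tau))
    return (np.arange(1, dims + 1) < n_good).astype(float)
```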
51 Because our L(P, c) is ∞ if any local correspondence score is below the threshold, we first find all sets of correspondences for which all local correspondence scores are above the threshold. [sent-132, score-0.581]
52 Then, we find the pose P that minimizes L(P, c) + w·D(P, c). [sent-133, score-0.237]
53 We use RANSAC to populate a set of candidates by optimizing L(P, c). [sent-135, score-0.119]
54 We then minimize L(P, c) + w·D(P, c) by estimating pose P for each found correspondence set c. [sent-137, score-0.328]
55 Given a set of correspondences c, we estimate pose P using the Levenberg-Marquardt algorithm minimizing D(P, c). [sent-138, score-0.482]
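A condensed sketch of this candidate-generation loop under stated assumptions: scipy's Levenberg-Marquardt solver, a hypothetical `project` function, and a 6-dof pose vector stand in for the paper's camera model and exact objective.

```python
import numpy as np
from scipy.optimize import least_squares

def generate_pose_candidates(corrs, project, n_iters=2000, min_set=5, seed=0):
    # corrs: list of (X, x) pairs of 3D interest points and matched 2D points.
    # project(params, X): maps Nx3 points to Nx2 under a 6-dof pose vector;
    # its exact form (and the zero initialization) are assumptions here.
    rng = np.random.default_rng(seed)
    X = np.array([c[0] for c in corrs], dtype=float)
    x = np.array([c[1] for c in corrs], dtype=float)

    def residuals(params, X_s, x_s):
        return (project(params, X_s) - x_s).ravel()

    candidates = []
    for _ in range(n_iters):
        idx = rng.choice(len(corrs), size=min_set, replace=False)
        # Levenberg-Marquardt fit of the pose to a minimal correspondence set.
        fit = least_squares(residuals, x0=np.zeros(6), method='lm',
                            args=(X[idx], x[idx]))
        geo_err = np.linalg.norm(residuals(fit.x, X, x))
        candidates.append((geo_err, fit.x))
    # Ranked here by geometric residual as a stand-in; the paper re-scores
    # the top candidates with its full objective.
    candidates.sort(key=lambda t: t[0])
    return candidates
```

The top-ranked candidates would then be re-scored with the full objective, including the global dissimilarity features, which are too expensive to evaluate for every RANSAC sample.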
56 Dataset: Our labeling tool lets a user browse through 3D models and label point correspondences to an image. [sent-153, score-0.335]
57 The tool provides feedback by rendering the estimated pose on the image, and a user can edit more correspondences. [sent-154, score-0.237]
58 In order to develop and evaluate fine pose estimation based on 3D models, we created a new dataset of images and 3D models representing typical indoor scenes. [sent-155, score-0.489]
59 The key difference of this dataset from previous works [18, 20] is that we align exact 3D models with each image, whereas others provided coarse pose information without using exact 3D models. [sent-157, score-0.419]
60 All 800 images are fully annotated with 90 different 3D models (see Figure 5). [sent-159, score-0.097]
61 Dataset: (a) examples of 3D models we collected from Google Warehouse, and (b) ground truth images where objects are aligned with 3D models using our labeling tool. [sent-160, score-0.157]
62 IKEAobject is the split where 300 images are queried by individual object name (e. [sent-164, score-0.096]
63 IKEAroom is the split where 500 images are queried by "ikea room" and "ikea home", and contains more complex scenes where multiple objects appear at a smaller scale. [sent-168, score-1.108]
64 For alignment, we created an online tool that allows a user to browse through models and label point correspondences (usually 5 are sufficient), and check the model’s estimated pose as the user labels. [sent-170, score-0.572]
65 When the distance is small, this is close to the average error in viewing angle for all points. [sent-178, score-0.087]
66 Formally, given an estimated pose Pe and a ground truth pose Pgt of image I, we obtain corresponding 3D points in the camera space. [sent-179, score-0.474]
67 Then, we compute the average pair-wise distance between all corresponding points, divided by their distance to the camera (Eq. 9). [sent-180, score-0.092]
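In symbols, the measure just described (Eq. 9) is plausibly of the following form, where T_P(X) places a 3D point X in camera coordinates under pose P and 𝒳 is the set of evaluation points; a hedged reconstruction from the prose, not the paper's verbatim equation:

```latex
\operatorname{err}(P_e, P_{gt}) \;=\; \frac{1}{|\mathcal{X}|} \sum_{X \in \mathcal{X}}
\frac{\left\| T_{P_e}(X) - T_{P_{gt}}(X) \right\|_2}{\left\| T_{P_{gt}}(X) \right\|_2}
```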
68 Correspondences: First, we evaluate our algorithm on finding good correspondences between a 3D model and an image. [sent-185, score-0.245]
69 This is crucial for the rest of our system as each additional poor correspondence grows the search space of RANSAC exponentially. [sent-186, score-0.091]
70 Correspondence evaluation: we compare correspondences from our interest point detector against those from the Harris detector. [sent-188, score-0.344]
71 The minimum number of interest points we need for reliable pose estimation is 5. [sent-189, score-0.349]
72 Ours can recall 5 correct correspondences by considering only the top 10 detections per 3D interest point, whereas the Harris detector requires 100 per point. [sent-190, score-0.351]
73 On average, to capture 5 correct correspondences with our method, each interest point has to consider only its top 10 matched candidates. [sent-194, score-0.385]
74 This helps find good candidates in the RANSAC algorithm. [sent-196, score-0.085]
75 RANSAC evaluation: we evaluated how many pose candidates per image we need in order to obtain a certain recall. [sent-200, score-0.322]
76 We need only about 2000 candidates in order to have 0. [sent-201, score-0.085]
77 In order for us to find the pose P minimizing S(P, c), we need to ensure that our set of poses {(P, c)} minimizing L(P, c) + w·D(P, c) contains at least one correct pose. [sent-204, score-0.293]
78 Figure 7 shows a semi-log plot of recall vs minimum number of top candidates required per image. [sent-206, score-0.085]
79 Considering the top 2000 candidate poses (shown with a red line) from RANSAC, we can obtain 0. [sent-207, score-0.098]
80 In other words, the later optimization, where feature extraction is computationally heavy, can run with only the top 2000 candidate poses and still have a recall 0. [sent-209, score-0.098]
81 Final Pose Estimation: Table 1 shows the evaluation of our method with various sets of features, as well as two state-of-the-art object detectors, Deformable part models [4] and Exemplar LDA [8], on the IKEAobject database. [sent-213, score-0.097]
82 The training set contains rendered images (examples are shown in Figure 5b) in order to cover various poses (which are not possible to cover with real images). [sent-215, score-0.18]
83 If there are some mixtures with high false positive rates, then the whole system can break down, and (2) they are trained with rendered images due to their requirement of images for each different mixture. [sent-218, score-0.085]
84 We score each pose estimation P as a true detection if its normalized 3D space distance to the ground truth is within the threshold; and for the rest, we precisely follow a standard bounding box criterion [2]. [sent-222, score-0.484]
85 Here, we would like to emphasize that this is a much harder task than a typical bounding box detection task. [sent-223, score-0.154]
86 Though it is true that the top candidates generally contain a correct pose (as shown in Figure 7), it is clear that scores based only on local correspondences (D and L) are not robust to false positives, despite high recall. [sent-227, score-0.608]
87 Corner case: these two images illustrate cases which are incorrect under our pose estimation criteria, but are still correct under the standard bounding box detection criterion. [sent-233, score-0.479]
88 For the detection task, we trained [4, 8] with real images, because there exist enough real images to train a small number of mixtures. [sent-236, score-0.144]
89 Because our method is designed for capturing a fine alignment, we measured at two different thresholds on bounding box intersection over union. [sent-238, score-0.197]
90 As the table shows, our method does not fluctuate much with a threshold change, whereas both [8] and [4] suffer significant performance drops. [sent-242, score-0.111]
91 Figure 8 shows several detection examples where pose estimation is incorrect, but still bounding box estimation is correct with a threshold of 0. [sent-243, score-0.563]
92 We first show our pose predictions by drawing blue outlines, and predicted normal directions. [sent-246, score-0.237]
93 To show that our algorithm obtained an accurate pose alignment, we also render different novel views (Figure 9: (a) Image, (b) Our result, (c) Normal map, (d) Novel view). [sent-247, score-0.237]
94 The first 4 rows are correct estimations and the last 2 rows are incorrect. [sent-250, score-0.083]
95 Note that the error on the 5th row is misaligning a bookcase by one shelf. [sent-251, score-0.11]
96 AP Performance on Pose Estimation: we evaluate our pose estimation performance at a fine scale on the IKEAobject database. [sent-289, score-0.393]
97 Note that DPM and ELDA are trained using rendered images. [sent-291, score-0.085]
98 The gap between our method and [4] becomes significantly larger as we increase the threshold, which suggests that our method is better at fine detection. [sent-330, score-0.109]
99 Conclusion: We have introduced a novel problem and model for estimating the fine pose of objects in an image with exact 3D models, combining traditionally used and recently developed techniques. [sent-333, score-0.108]
100 3D object detection and viewpoint estimation with a deformable 3D cuboid model. [sent-369, score-0.195]
wordName wordTfidf (topN-words)
[('ikea', 0.504), ('correspondences', 0.245), ('pose', 0.237), ('cad', 0.205), ('ikeaobject', 0.194), ('warehouse', 0.159), ('xci', 0.12), ('ransac', 0.116), ('alignment', 0.115), ('eq', 0.115), ('furniture', 0.11), ('fine', 0.109), ('cl', 0.106), ('edgemap', 0.103), ('harris', 0.092), ('correspondence', 0.091), ('lda', 0.089), ('rendered', 0.085), ('candidates', 0.085), ('boundary', 0.078), ('candi', 0.077), ('ektorp', 0.077), ('fcorr', 0.077), ('fedge', 0.077), ('fhog', 0.077), ('fregion', 0.077), ('wdd', 0.077), ('rp', 0.074), ('bookcase', 0.069), ('edgemaps', 0.069), ('detection', 0.066), ('interest', 0.065), ('exact', 0.063), ('alg', 0.06), ('texture', 0.06), ('google', 0.06), ('hog', 0.059), ('agreeing', 0.057), ('poses', 0.056), ('models', 0.056), ('constellation', 0.055), ('queried', 0.055), ('sofa', 0.048), ('estimation', 0.047), ('bounding', 0.047), ('lbp', 0.046), ('satkin', 0.046), ('distance', 0.046), ('lim', 0.045), ('objects', 0.045), ('line', 0.045), ('wd', 0.043), ('detecting', 0.043), ('modality', 0.042), ('estimations', 0.042), ('contours', 0.042), ('candidate', 0.042), ('mask', 0.041), ('object', 0.041), ('correct', 0.041), ('deformable', 0.041), ('box', 0.041), ('error', 0.041), ('performances', 0.04), ('hence', 0.04), ('instances', 0.04), ('indoor', 0.04), ('real', 0.039), ('might', 0.039), ('xi', 0.039), ('lastly', 0.038), ('dissimilarity', 0.038), ('keypoint', 0.038), ('geometric', 0.038), ('threshold', 0.037), ('measure', 0.037), ('category', 0.037), ('change', 0.035), ('ate', 0.035), ('proposal', 0.035), ('ohen', 0.034), ('wtriex', 0.034), ('singleview', 0.034), ('populating', 0.034), ('categorylevel', 0.034), ('rmin', 0.034), ('fans', 0.034), ('difi', 0.034), ('endw', 0.034), ('epd', 0.034), ('itso', 0.034), ('ambitious', 0.034), ('cpe', 0.034), ('fluctuate', 0.034), ('orignal', 0.034), ('pgt', 0.034), ('regained', 0.034), ('screenshot', 0.034), ('tsx', 0.034), ('point', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
Author: Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
Abstract: We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms.
2 0.13682453 46 iccv-2013-Allocentric Pose Estimation
Author: M. José Antonio, Luc De_Raedt, Tinne Tuytelaars
Abstract: The task of object pose estimation has been a challenge since the early days of computer vision. To estimate the pose (or viewpoint) of an object, people have mostly looked at object intrinsic features, such as shape or appearance. Surprisingly, informative features provided by other, external elements in the scene, have so far mostly been ignored. At the same time, contextual cues have been shown to be of great benefit for related tasks such as object detection or action recognition. In this paper, we explore how information from other objects in the scene can be exploited for pose estimation. In particular, we look at object configurations. We show that, starting from noisy object detections and pose estimates, exploiting the estimated pose and location of other objects in the scene can help to estimate the objects’ poses more accurately. We explore both a camera-centered as well as an object-centered representation for relations. Experiments on the challenging KITTI dataset show that object configurations can indeed be used as a complementary cue to appearance-based pose estimation. In addition, object-centered relational representations can also assist object detection.
3 0.13163964 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
4 0.1191422 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
Author: Jianxiong Xiao, Andrew Owens, Antonio Torralba
Abstract: Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation: hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available at http://sun3d.cs.princeton.edu.
5 0.11611858 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes
Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.
6 0.11486564 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
7 0.11376177 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
8 0.1136585 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
10 0.10974196 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
11 0.10925518 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
12 0.10672771 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
13 0.10113045 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.098060399 379 iccv-2013-Semantic Segmentation without Annotating Segments
15 0.097715087 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
16 0.096830331 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines
18 0.094483599 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
19 0.094048932 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
20 0.09385483 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
topicId topicWeight
[(0, 0.245), (1, -0.073), (2, -0.028), (3, -0.025), (4, 0.09), (5, -0.057), (6, 0.031), (7, -0.017), (8, -0.071), (9, 0.003), (10, 0.073), (11, 0.039), (12, -0.107), (13, -0.077), (14, -0.019), (15, 0.052), (16, 0.043), (17, 0.001), (18, 0.043), (19, 0.026), (20, 0.03), (21, -0.029), (22, 0.048), (23, 0.0), (24, 0.099), (25, -0.009), (26, -0.041), (27, 0.082), (28, -0.023), (29, -0.048), (30, -0.009), (31, -0.028), (32, 0.012), (33, 0.01), (34, 0.009), (35, -0.024), (36, -0.006), (37, -0.08), (38, 0.031), (39, 0.041), (40, 0.009), (41, -0.019), (42, -0.067), (43, 0.001), (44, 0.03), (45, -0.04), (46, 0.034), (47, -0.045), (48, -0.04), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.95908058 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
Author: Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
Abstract: We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms.
2 0.84725398 46 iccv-2013-Allocentric Pose Estimation
Author: M. José Antonio, Luc De Raedt, Tinne Tuytelaars
Abstract: The task of object pose estimation has been a challenge since the early days of computer vision. To estimate the pose (or viewpoint) of an object, people have mostly looked at object intrinsic features, such as shape or appearance. Surprisingly, informative features provided by other, external elements in the scene, have so far mostly been ignored. At the same time, contextual cues have been shown to be of great benefit for related tasks such as object detection or action recognition. In this paper, we explore how information from other objects in the scene can be exploited for pose estimation. In particular, we look at object configurations. We show that, starting from noisy object detections and pose estimates, exploiting the estimated pose and location of other objects in the scene can help to estimate the objects’ poses more accurately. We explore both a camera-centered as well as an object-centered representation for relations. Experiments on the challenging KITTI dataset show that object configurations can indeed be used as a complementary cue to appearance-based pose estimation. In addition, object-centered relational representations can also assist object detection.
3 0.80791998 118 iccv-2013-Discovering Object Functionality
Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.
4 0.76589036 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
5 0.74514908 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang
Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework, consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters, is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.
6 0.70108426 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
7 0.68636346 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
9 0.67858833 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
10 0.67853451 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
11 0.67396939 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
12 0.66801661 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
13 0.66660619 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
14 0.64538336 313 iccv-2013-Person Re-identification by Salience Matching
15 0.64041591 189 iccv-2013-HOGgles: Visualizing Object Detection Features
16 0.6399194 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
17 0.63947284 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
18 0.63943124 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
19 0.63865817 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
20 0.63346374 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
topicId topicWeight
[(2, 0.059), (4, 0.017), (7, 0.023), (12, 0.022), (26, 0.076), (31, 0.059), (35, 0.01), (40, 0.011), (42, 0.109), (48, 0.011), (55, 0.012), (64, 0.035), (73, 0.033), (81, 0.18), (89, 0.239), (98, 0.03)]
simIndex simValue paperId paperTitle
1 0.89773196 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
Author: Phillip Isola, Ce Liu
Abstract: To quickly synthesize complex scenes, digital artists often collage together visual elements from multiple sources: for example, mountains from New Zealand behind a Scottish castle with wisps of Saharan sand in front. In this paper, we propose to use a similar process in order to parse a scene. We model a scene as a collage of warped, layered objects sampled from labeled, reference images. Each object is related to the rest by a set of support constraints. Scene parsing is achieved through analysis-by-synthesis. Starting with a dataset of labeled exemplar scenes, we retrieve a dictionary of candidate object segments that match a query image. We then combine elements of this set into a “scene collage” that explains the query image. Beyond just assigning object labels to pixels, scene collaging produces a lot more information such as the number of each type of object in the scene, how they support one another, the ordinal depth of each object, and, to some degree, occluded content. We exploit this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph.
same-paper 2 0.88044333 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
Author: Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
Abstract: We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: use local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image. Moreover, we also provide a new dataset containing fine-aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We also evaluate our algorithm both on object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms.
3 0.86738837 174 iccv-2013-Forward Motion Deblurring
Author: Shicheng Zheng, Li Xu, Jiaya Jia
Abstract: We handle a special type of motion blur considering that cameras move primarily forward or backward. Solving this type of blur is of unique practical importance since nearly all car, traffic and bike-mounted cameras follow out-of-plane translational motion. We start with the study of geometric models and analyze the difficulty of existing methods to deal with them. We also propose a solution accounting for depth variation. Homographies associated with different 3D planes are considered and solved for in an optimization framework. Our method is verified on several natural image examples that cannot be satisfyingly dealt with by previous methods.
4 0.83957487 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
Author: Abhinav Shrivastava, Abhinav Gupta
Abstract: This paper proposes a novel part-based representation for modeling object categories. Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. We propose to learn this geometry-driven deformable part-based model (gDPM) from a set of labeled RGBD images. We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images.
5 0.83857346 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
Author: Yuandong Tian, Srinivasa G. Narasimhan
Abstract: Real-world surfaces such as clothing, water and human body deform in complex ways. The image distortions observed are high-dimensional and non-linear, making it hard to estimate these deformations accurately. The recent data-driven descent approach [17] applies Nearest Neighbor estimators iteratively on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure for the Nearest Neighbor estimators, each of which can have only a local image support. We demonstrate in both theory and practice that this algorithm has several advantages over the non-hierarchical version: it guarantees global optimality with significantly fewer training samples, is several orders of magnitude faster, provides a metric to decide whether a given image is “hard” (or “easy”) requiring more (or less) samples, and can handle more complex scenes that include both global motion and local deformation. The proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
6 0.83763093 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
7 0.8372125 35 iccv-2013-Accurate Blur Models vs. Image Priors in Single Image Super-resolution
8 0.83670533 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
9 0.83613265 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
10 0.83537275 349 iccv-2013-Regionlets for Generic Object Detection
11 0.83531588 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
12 0.83521074 436 iccv-2013-Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach
13 0.83521032 128 iccv-2013-Dynamic Probabilistic Volumetric Models
14 0.8350597 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
15 0.83493501 189 iccv-2013-HOGgles: Visualizing Object Detection Features
16 0.83484936 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
17 0.83458364 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
18 0.83454049 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
19 0.83445698 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
20 0.83382601 190 iccv-2013-Handling Occlusions with Franken-Classifiers