201 nips-2012-Localizing 3D cuboids in single-view images


Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Localizing 3D cuboids in single-view images Jianxiong Xiao Bryan C. [sent-1, score-0.485]

2 Russell∗ Massachusetts Institute of Technology ∗ Antonio Torralba University of Washington Abstract In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. [sent-2, score-0.782]

3 In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. [sent-3, score-1.699]

4 Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. [sent-4, score-0.893]

5 We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. [sent-5, score-0.85]

6 Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. [sent-6, score-0.606]

7 The detection and recovery of shape parameters yield at least a partial geometric description of the depicted scene, which allows a system to reason about the affordances of a scene in an object-agnostic fashion [9, 15]. [sent-15, score-0.379]

8 They detect line segments via Canny edges and recover surface orientations [13] to form 3D cuboid hypotheses using bottom-up grouping of line and region segments. [sent-20, score-0.75]

9 As shown in Figure 1, we aim to build a 3D cuboid detector to detect individual boxy volumetric structures. [sent-23, score-0.747]

10 We build a discriminative parts-based detector that models the appearance of the corners and internal edges of cuboids while enforcing spatial consistency of the corners and edges to a 3D cuboid model. [sent-24, score-1.71]

11 Given a single-view input image, our goal is to detect the 2D corner locations of the cuboids depicted in the image. [sent-27, score-1.03]

12 With the output part locations we can subsequently recover information about the camera and 3D shape via camera resectioning. [sent-28, score-0.42]

13 Our cuboid detector is trained across different 3D viewpoints and aspect ratios. [sent-29, score-0.903]

14 Moreover, instead of relying on edge detection and grouping to form an initial hypothesis of a cuboid [9, 17, 26, 29], we use a 2D sliding window approach to exhaustively evaluate all possible detection windows. [sent-33, score-0.907]

15 2 Model for 3D cuboid localization We represent the appearance of cuboids by a set of parts located at the corners of the cuboid and a set of internal edges. [sent-40, score-1.918]

16 Let $I$ be the image and $p_i = (x_i, y_i)$ be the 2D image location of the $i$-th corner on the cuboid. [sent-42, score-0.647]

17 We define an undirected loopy graph G = (V, E) over the corners of the cuboid, with vertices V and edges E connecting the corners of the cuboid. [sent-43, score-0.421]

18 By reasoning about the 3D shape, our model handles different 3D viewpoints and aspect ratios, as illustrated in Figure 2. [sent-46, score-0.272]

19 Edge($I$, $p_i$, $p_j$): The internal edge information on cuboids is informative and provides a salient feature for the locations of the corners. [sent-50, score-0.82]

20 For adjacent corners on the cuboid, we identify the edge between the two corners and calculate the image evidence to support the existence of such an edge. [sent-52, score-0.538]

21 Given the corner locations $p_i$ and $p_j$, we use Chamfer matching to align the straight line between the two corners to edges extracted from the image. [sent-53, score-0.908]

22 We find image edges using Canny edge detection [3] and efficiently compute the distance of each pixel along the line segment to the nearest edge via the truncated distance transform. [sent-54, score-0.471]

23 We use Bresenham’s line algorithm [2] to efficiently find the 2D image locations on the line between the two points. [sent-55, score-0.272]
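The edge evidence described in the sentences above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `bresenham` and `chamfer_edge_cost`, the `truncation` parameter and its default value are hypothetical, the edge pixels are assumed to be given as a set of integer coordinates (for instance from a Canny detector), and a brute-force nearest-edge search stands in for a precomputed truncated distance transform.

```python
import math


def bresenham(x0, y0, x1, y1):
    """Integer pixels on the segment (x0, y0)-(x1, y1), endpoints included."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points


def chamfer_edge_cost(edge_pixels, p_i, p_j, truncation=10.0):
    """Mean truncated distance from the pixels on segment p_i-p_j to the nearest edge pixel.

    Lower values mean stronger image support for an edge between the two corners.
    The truncation default is an arbitrary value chosen for this sketch.
    """
    line = bresenham(p_i[0], p_i[1], p_j[0], p_j[1])
    total = 0.0
    for x, y in line:
        # Brute-force nearest edge pixel; a real system would look this up
        # in a distance transform computed once per image.
        nearest = min(
            (math.hypot(x - ex, y - ey) for ex, ey in edge_pixels),
            default=truncation,
        )
        total += min(nearest, truncation)
    return total / len(line)
```

A cost of zero means every pixel on the hypothesized cuboid edge coincides with a detected image edge; the truncation keeps a few far-away pixels from dominating the average.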

24 In (a) and (b) the displayed corner locations are the average 2D locations across all viewpoints and aspect ratios in our database. [sent-63, score-0.95]

25 Shape3D($p$): The 3D shape of a cuboid constrains the layout of the parts and edges in the image. [sent-66, score-0.75]

26 We propose to define a shape term that measures how well the configuration of corner locations respects the 3D shape. [sent-67, score-0.59]

27 In other words, given the 2D locations p of the corners, we define a term that tells us how likely this configuration of corner locations p can be interpreted as the reprojection of a valid cuboid in 3D. [sent-68, score-1.22]

28 For each corner, we use the other six 2D corner locations to estimate the product PL using camera resectioning [10]. [sent-72, score-0.652]

29 The estimated matrix is used to predict the corner location. [sent-73, score-0.372]

30 We use the negative L2 distance to the predicted corner location as a feature for the corner in our model. [sent-74, score-0.787]
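The leave-one-out shape feature described above can be illustrated with a short NumPy sketch. It rests on several assumptions that are not in the source: resectioning is done with a plain direct linear transform (DLT) on conditioned image points, the 3D model is a canonical unit cube with seven corners (which corner is treated as hidden is arbitrary here), and the names `CUBE_CORNERS`, `resection_dlt`, `project` and `shape3d_feature` are hypothetical.

```python
import numpy as np

# Canonical unit-cube coordinates of seven corners. Omitting (0, 0, 0) as the
# hidden corner is an arbitrary choice made for this sketch.
CUBE_CORNERS = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
], dtype=float)


def resection_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix (up to scale) from 3D-2D correspondences."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    # Condition the image points (zero mean, average distance sqrt(2)).
    mean = points_2d.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(points_2d - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    rows = []
    for X, (u, v) in zip(points_3d, scale * (points_2d - mean)):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return np.linalg.inv(T) @ vt[-1].reshape(3, 4)


def project(P, X):
    """Project a 3D point with a 3x4 matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def shape3d_feature(corners_2d, i, model_3d=CUBE_CORNERS):
    """Negative L2 distance between corner i and its prediction from the other corners."""
    corners_2d = np.asarray(corners_2d, dtype=float)
    others = [k for k in range(len(model_3d)) if k != i]
    P = resection_dlt(model_3d[others], corners_2d[others])
    return -float(np.linalg.norm(project(P, model_3d[i]) - corners_2d[i]))
```

Because the estimated 3x4 matrix absorbs any per-axis scaling of the cube, the same canonical model can be used for cuboids of different aspect ratios: a corner configuration that is an exact projection of some cuboid yields a feature close to zero, and a displaced corner yields a negative value equal to its displacement.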

31 1 Inference Our goal is to find the 2D corner locations p over the HOG grid of I that maximizes the score given in Equation (1). [sent-78, score-0.505]

32 Therefore, we propose a spanning tree approximation to the graph to obtain multiple initial solutions for possible corner locations. [sent-80, score-0.401]

33 Then we adjust the corner locations using randomized simple hill climbing. [sent-81, score-0.534]

34 We use the following scoring function for the initialization: $S_T(I, p) = \sum_{i \in V} w_i^H \cdot \mathrm{HOG}(I, p_i) + \sum_{ij \in T} w_{ij}^D \cdot \mathrm{Displacement2D}(p_i, p_j)$ (2). Note that the model used for obtaining initial solutions is similar to [7, 27], which is only able to handle a fixed viewpoint and 2D aspect ratio. [sent-85, score-0.32]

35 With the tree approximation, we pick the top 1000 possible configurations of corner locations from each image and optimize our scoring function by adjusting the corner locations using randomized simple hill climbing. [sent-87, score-1.188]

36 Given the initial corner locations for a single configuration, we iteratively choose a random corner $i$ with the goal of finding a new pixel location $\hat{p}_i$ that increases the scoring function given in Equation (1) while holding the other corner locations fixed. [sent-88, score-1.558]

37 We also consider the pixel location that the 3D rigid model predicts when estimated from the other corner locations. [sent-90, score-0.457]

38 The algorithm terminates when no corner can reach a location that improves the score, which indicates that we have reached a local maximum. [sent-93, score-0.415]

39 Also, since only one corner can change locations at each iteration, we can reuse the computed scoring function from previous iterations during hill climbing. [sent-97, score-0.581]
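The refinement step described above is an instance of coordinate-wise simple hill climbing, which can be sketched as below. This is a hedged illustration rather than the paper's procedure: `hill_climb`, `score_fn` and `neighbors_fn` are hypothetical names, the sketch sweeps over the corners in a shuffled order instead of drawing one random corner per iteration, and it recomputes the full score for every trial instead of reusing cached terms.

```python
import random


def hill_climb(corners, score_fn, neighbors_fn, rng=None):
    """Move one corner at a time to a better location until no move improves the score.

    corners      -- list of (x, y) tuples, the initial configuration
    score_fn     -- maps a list of corners to a number (higher is better)
    neighbors_fn -- maps one corner to an iterable of candidate locations
    """
    rng = rng or random.Random(0)
    corners = list(corners)
    best = score_fn(corners)
    improved = True
    while improved:
        improved = False
        order = list(range(len(corners)))
        rng.shuffle(order)  # visit the corners in random order
        for i in order:
            for candidate in neighbors_fn(corners[i]):
                # Only corner i changes; all other corners stay fixed.
                trial = corners[:i] + [candidate] + corners[i + 1:]
                trial_score = score_fn(trial)
                if trial_score > best:  # accept the first improving move
                    corners, best, improved = trial, trial_score, True
                    break
    # A full sweep without any improvement means a local maximum was reached.
    return corners, best
```

In a setting like the one described, `neighbors_fn` could also return the location predicted by the rigid 3D model from the other corners; that choice is left to the caller here.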

40 Finally, we perform non-maximal suppression among the parts and then perform non-maximal suppression over the entire object to get the final detection result. [sent-98, score-0.285]

41 We tried mining negatives from the wrong corner locations in the positive examples but found that it did not improve the performance. [sent-106, score-0.505]

42 Since the latent positive mining helped, we also tried an offset compensation as post-processing to obtain the offset of corner locations introduced during latent positive mining. [sent-108, score-0.505]

43 3 Discussion Sliding window object detectors typically use a root filter that covers the entire object [4] or a combination of root filter and part filters [7]. [sent-112, score-0.309]

44 The use of a root filter is sufficient to capture the appearance for many object categories since they have canonical 3D viewpoints and aspect ratios. [sent-113, score-0.493]

45 However, cuboids in general span a large number of object categories and do not have a consistent 3D viewpoint or aspect ratio. [sent-114, score-0.672]

46 The diversity of 3D viewpoints and aspect ratios causes dramatic changes in the root filter response. [sent-115, score-0.36]

47 Moreover, we argue that a purely view-based approach that trains separate models for the different viewpoints and aspect ratios may not capture this diversity well. [sent-117, score-0.312]

48 As our model handles different viewpoints and aspect ratios, we are able to make use of the entire database during training. [sent-121, score-0.341]

49 Due to the diversity of cuboid appearance, our model is designed to capture the most salient features, namely the corners and edges. [sent-122, score-0.729]

50 A projection of the cuboid model is overlaid on the image and the user must select and drag anchor points to their corresponding location in the image. [sent-127, score-0.691]

51 (b) Scatter plot of 3D azimuth and elevation angles for annotated cuboids with zenith angle close to zero. [sent-128, score-0.694]

52 (c) Crops of cuboids at different azimuth angles for a fixed elevation, with the shown examples marked as red points in the scatter plot of (b). [sent-130, score-0.612]

53 Furthermore, we do not make use of other appearance cues, such as the appearance within the cuboid faces, since they have a larger variation across the object categories (e. [sent-132, score-0.834]

54 Compared with recent approaches that detect cuboids by reasoning about the shape of the entire scene [9, 11, 12, 17, 19, 29], one of the key differences is that we detect cuboids directly without consideration of the global scene geometry. [sent-136, score-1.231]

55 These prior approaches rely heavily on the assumption that the camera is located inside a cuboid-like room and held at human height, with the parameters of the room cuboid inferred through vanishing points based on a Manhattan world assumption. [sent-137, score-0.72]

56 As our detector is agnostic to the scene geometry, we are able to detect cuboids even when these assumptions are violated. [sent-141, score-0.693]

57 We observe that not all cuboid-like objects are perfect cuboids in practice. [sent-143, score-0.492]

58 With our approach, if a rigid cuboid is needed, we can recover the 3D shape parameters via camera resectioning, as shown in Figure 9. [sent-154, score-0.773]

59 3 Database of 3D cuboids To develop and evaluate any models for 3D cuboid detection in real-world environments, it is necessary to have a large database of images depicting everyday scenes with 3D cuboids labeled. [sent-155, score-1.766]

60 We have built a labeling tool that allows a user to select and drag key points on a projected 3D cuboid model to its corresponding location in the image. [sent-157, score-0.618]

61 Given the corner correspondences, the parameters for the 3D cuboids and camera are estimated. [sent-161, score-0.903]

62 The cuboid and camera parameters are estimated up to a similarity transformation via camera resectioning using Levenberg-Marquardt optimization [10]. [sent-162, score-0.793]

63 5 Figure 5: Single top 3D cuboid detection in each image. [sent-163, score-0.663]

64 The false positives tend to occur when a part fires on a “cuboid-like” corner region (e. [sent-166, score-0.372]

65 Figure 6: All 3D cuboid detections above a fixed threshold in each image. [sent-171, score-0.545]

66 Notice that our model is able to detect the presence of multiple cuboids in an image (e. [sent-172, score-0.562]

67 For our database, we have 785 images with 1269 cuboids annotated. [sent-182, score-0.485]

68 We have also collected a negative set containing 2746 images that do not contain any cuboid-like objects. [sent-183, score-0.6]

69 In Figure 4(b) we show a scatter plot of the azimuth and elevation angles for all of the labeled cuboids with zenith angle close to zero. [sent-186, score-0.735]

70 Notice that the cuboids cover a large range of azimuth angles for elevation angles between 0 (frontal view) and 45 degrees. [sent-187, score-0.702]

71 Figure 8(c) shows the distribution of objects from the SUN database [25] that overlap with our cuboids (there are 326 objects total from 114 unique classes). [sent-189, score-0.623]

72 Compared with [12], our database covers a larger set of object and scene categories, with images focusing on both objects and scenes (all images in [12] are indoor scene images). [sent-190, score-0.623]

73 Moreover, we annotate objects closely resembling a 3D cuboid (in [12] there are many non-cuboids that are annotated with a bounding cuboid) and overall our cuboids are more accurately labeled. [sent-191, score-1.069]

74 4 Evaluation In this section we show qualitative results of our model on the 3D cuboids database and report quantitative results on two tasks: (i) 3D cuboid detection and (ii) corner localization accuracy. [sent-192, score-1.634]

75 Notice that our model is able to better localize cuboid corners over the baseline 2D tree-based model, which corresponds to 2D parts-based models used in object detection and articulated pose estimation [7, 27]. [sent-199, score-1.01]

76 The last column shows a failure case where a part fires on a “cuboid-like” corner region in the image. [sent-200, score-0.372]

77 to mirror left-right and orient the 3D cuboid to minimize the variation in rotational angle. [sent-201, score-0.57]

78 During testing, we run the detector on left-right mirrors of the image and select the output at each location with the highest detector response. [sent-202, score-0.356]

79 For the parts we extract HOG features [4] in a window centered at each corner with scale of 10% of the object bounding box size. [sent-203, score-0.57]

80 Figure 5 shows the single top cuboid detection in each image and Figure 6 shows all of the most confident detections in the image. [sent-204, score-0.736]

81 We note that our model fails when a corner fires on a “cuboid-like” corner region (e. [sent-209, score-0.744]

82 The first baseline is a root HOG template [4] trained over the appearance within a bounding box covering the entire object. [sent-213, score-0.271]

83 A single model using the root HOG template is trained for all viewpoints and aspect ratios. [sent-214, score-0.318]

84 During detection, output corner locations corresponding to the average training corner locations relative to the bounding boxes are returned. [sent-215, score-1.042]

85 The second baseline is the 2D tree-based approximation of Equation (2), which corresponds to existing 2D parts models used in object detection and articulated pose estimation [7, 27]. [sent-216, score-0.31]

86 We evaluate geometric primitive detection accuracy using the bounding box overlap criterion of Pascal VOC [6]. [sent-219, score-0.284]
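For readers unfamiliar with the Pascal VOC overlap criterion, the following sketch shows the usual intersection-over-union test. The function names are hypothetical, boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in continuous coordinates, and the 0.5 threshold is the standard VOC value rather than a number stated in the text above.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def is_correct_detection(predicted_box, ground_truth_box, threshold=0.5):
    """A detection counts as correct when the overlap exceeds the threshold."""
    return box_iou(predicted_box, ground_truth_box) > threshold
```

For a cuboid detector, the predicted box would typically be the tight bounding box around the detected corner locations.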

87 We have observed that all of the corner-based models achieve almost identical detection accuracy across all recall levels, and out-perform the root HOG template detector [4]. [sent-221, score-0.344]

88 This in effect does not allow us to detect additional cuboids but allows for better part localization. [sent-223, score-0.489]

89 In addition to detection accuracy, we also measure corner localization accuracy for correctly detected examples for a given model. [sent-224, score-0.602]

90 A corner is deemed correct if its predicted image location is within t pixels of the ground truth corner location. [sent-225, score-0.913]

91 The reported trends in the corner localization performance hold for nearby values of t. [sent-227, score-0.448]
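The corner localization measure described above can be written in a few lines. This is a sketch with a hypothetical function name; the value of `t` is left to the caller because the excerpt does not state it.

```python
import math


def corner_localization_accuracy(predicted, ground_truth, t):
    """Fraction of corners whose prediction lies within t pixels of the ground truth.

    predicted, ground_truth -- equally long sequences of (x, y) corner locations
    """
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= t
    )
    return correct / len(ground_truth)
```

Sweeping `t` over a range of values is one simple way to check that a reported trend does not depend on a single threshold.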

92 In Figure 8 we plot corner localization accuracy as a function of recall and compare our model against the two baselines. [sent-228, score-0.474]

93 Also, the additional edge and 3D shape terms provide a gain in performance over using the appearance and 2D spatial terms alone. [sent-231, score-0.296]

94 Notice that all of the corner-based models achieve almost identical detection accuracy across all recall levels and out-perform the root HOG template detector [4]. [sent-274, score-0.344]

95 For the task of corner localization, our full model out-performs both baseline detectors as well as variants of our model that omit either the Edge or the Shape3D term. [sent-275, score-0.433]

96 Figure 9: Detected cuboids and subsequent synthesized new views via camera resectioning. [sent-279, score-0.531]

97 5 Conclusion We have introduced a novel model that detects 3D cuboids and localizes their corners in single-view images. [sent-280, score-0.641]

98 Our 3D cuboid detector makes use of both corner and edge information. [sent-281, score-1.134]

99 Moreover, we have constructed a dataset with ground truth cuboid annotations. [sent-282, score-0.598]

100 Our detector handles different 3D viewpoints and aspect ratios and, in contrast to recent approaches for 3D cuboid detection, does not make any assumptions about the scene geometry and allows for deformation of the 3D cuboid shape. [sent-283, score-1.692]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cuboid', 0.545), ('cuboids', 0.43), ('corner', 0.372), ('corners', 0.184), ('viewpoints', 0.148), ('locations', 0.133), ('detector', 0.12), ('detection', 0.118), ('camera', 0.101), ('hog', 0.099), ('edge', 0.097), ('object', 0.092), ('aspect', 0.09), ('azimuth', 0.087), ('pi', 0.086), ('shape', 0.085), ('scene', 0.084), ('appearance', 0.082), ('elevation', 0.077), ('localization', 0.076), ('ratios', 0.074), ('image', 0.073), ('database', 0.069), ('scenes', 0.065), ('objects', 0.062), ('xiao', 0.059), ('detect', 0.059), ('indoor', 0.057), ('geometric', 0.056), ('images', 0.055), ('angles', 0.054), ('depicting', 0.054), ('edges', 0.053), ('ws', 0.049), ('root', 0.048), ('pj', 0.047), ('scoring', 0.047), ('chamfer', 0.046), ('resectioning', 0.046), ('zenith', 0.046), ('box', 0.045), ('location', 0.043), ('rigid', 0.042), ('scatter', 0.041), ('primitives', 0.041), ('articulated', 0.039), ('layout', 0.038), ('hoiem', 0.037), ('reprojection', 0.037), ('detected', 0.036), ('depicted', 0.036), ('outdoor', 0.035), ('canny', 0.035), ('handles', 0.034), ('notice', 0.033), ('categories', 0.033), ('primitive', 0.033), ('asia', 0.033), ('line', 0.033), ('spatial', 0.032), ('baseline', 0.032), ('bounding', 0.032), ('siggraph', 0.032), ('res', 0.032), ('template', 0.032), ('cvpr', 0.031), ('deformations', 0.031), ('cube', 0.031), ('drag', 0.03), ('night', 0.03), ('detectors', 0.029), ('parts', 0.029), ('sliding', 0.029), ('occlusions', 0.029), ('hill', 0.029), ('tree', 0.029), ('ground', 0.028), ('deformation', 0.028), ('russell', 0.028), ('viewpoint', 0.027), ('orientations', 0.027), ('sun', 0.027), ('hedau', 0.027), ('jianxiong', 0.027), ('localizes', 0.027), ('internal', 0.027), ('recall', 0.026), ('vanishing', 0.026), ('bryan', 0.025), ('rotational', 0.025), ('truth', 0.025), ('room', 0.024), ('zhao', 0.024), ('geometry', 0.024), ('qualitative', 0.024), ('wij', 0.023), ('efros', 0.023), ('hartley', 0.023), ('volumetric', 0.023), ('suppression', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 201 nips-2012-Localizing 3D cuboids in single-view images

2 0.44578525 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

3 0.18149997 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.

4 0.16355281 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning.

5 0.13252604 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined with optimizing the multi-layer parameters during the iteration. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle well large intra-class variance and background clutters for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms the state-of-the-art approaches.

6 0.12468732 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

7 0.11895437 303 nips-2012-Searching for objects driven by context

8 0.11675823 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

9 0.11150084 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

10 0.10664184 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

11 0.096493483 8 nips-2012-A Generative Model for Parts-based Object Segmentation

12 0.086341202 185 nips-2012-Learning about Canonical Views from Internet Image Collections

13 0.08320196 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.082495652 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

15 0.078296833 168 nips-2012-Kernel Latent SVM for Visual Recognition

16 0.07409855 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

17 0.063901439 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

18 0.0630043 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

19 0.062820561 62 nips-2012-Burn-in, bias, and the rationality of anchoring

20 0.062820561 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.144), (1, 0.017), (2, -0.204), (3, -0.023), (4, 0.127), (5, -0.139), (6, -0.001), (7, -0.125), (8, -0.001), (9, -0.007), (10, -0.075), (11, 0.014), (12, 0.133), (13, -0.223), (14, 0.085), (15, 0.242), (16, 0.018), (17, -0.133), (18, -0.108), (19, -0.008), (20, 0.01), (21, -0.027), (22, -0.02), (23, 0.055), (24, -0.048), (25, 0.084), (26, 0.033), (27, 0.055), (28, -0.065), (29, -0.058), (30, -0.0), (31, -0.049), (32, -0.016), (33, 0.081), (34, -0.022), (35, -0.024), (36, -0.021), (37, -0.05), (38, -0.081), (39, -0.038), (40, -0.041), (41, 0.102), (42, -0.053), (43, -0.025), (44, -0.038), (45, -0.007), (46, -0.061), (47, 0.001), (48, 0.036), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95535225 201 nips-2012-Localizing 3D cuboids in single-view images

2 0.94632459 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

3 0.82255512 40 nips-2012-Analyzing 3D Objects in Cluttered Images

4 0.71076983 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared across different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the state-of-the-art algorithms.

5 0.67193025 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model as consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while optimizing the multi-layer parameters during the iteration. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle large intra-class variance and background clutter well for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms the state-of-the-art approaches.

6 0.61910015 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

7 0.61846906 8 nips-2012-A Generative Model for Parts-based Object Segmentation

8 0.61295044 344 nips-2012-Timely Object Recognition

9 0.59697449 303 nips-2012-Searching for objects driven by context

10 0.572442 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

11 0.54020548 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

12 0.51809973 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.47861949 185 nips-2012-Learning about Canonical Views from Internet Image Collections

14 0.46912843 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

15 0.46700302 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

16 0.45817199 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

17 0.45018661 210 nips-2012-Memorability of Image Regions

18 0.44052735 2 nips-2012-3D Social Saliency from Head-mounted Cameras

19 0.39868346 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

20 0.39593273 168 nips-2012-Kernel Latent SVM for Visual Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.031), (17, 0.016), (21, 0.084), (38, 0.075), (39, 0.02), (42, 0.016), (54, 0.031), (55, 0.027), (60, 0.218), (74, 0.154), (76, 0.139), (80, 0.053), (86, 0.013), (92, 0.039)]
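
The topic vector above is sparse: only topics with non-negligible weight are listed as (topicId, topicWeight) pairs. As a rough illustration of how two such vectors could be compared, the sketch below computes a cosine similarity over the pairs. This is an assumption made for illustration only: the page does not state which measure produces simValue, and the same-paper value listed below is less than 1, which suggests the actual computation differs. The names `paper_topics` and `cosine_similarity` are hypothetical.

```python
import math

# Sparse (topicId, topicWeight) pairs copied from the listing above.
paper_topics = [
    (0, 0.031), (17, 0.016), (21, 0.084), (38, 0.075), (39, 0.02),
    (42, 0.016), (54, 0.031), (55, 0.027), (60, 0.218), (74, 0.154),
    (76, 0.139), (80, 0.053), (86, 0.013), (92, 0.039),
]


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs.

    Hypothetical helper: cosine is an assumed measure, not necessarily
    the one used to produce the simValue column.
    """
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# The topic carrying the largest weight for this paper.
dominant_topic = max(paper_topics, key=lambda pair: pair[1])
```

With the pairs stored as dictionaries, topics missing from one vector simply contribute zero, so vectors that list different topic sets can still be compared directly.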

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84748733 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners.

2 0.73322213 118 nips-2012-Entangled Monte Carlo

Author: Seong-Hwan Jun, Liangliang Wang, Alexandre Bouchard-Côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles.

3 0.70443135 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.

4 0.69986963 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

Author: Angela Eigenstetter, Bjorn Ommer

Abstract: Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although improving discriminability, the high dimensionality becomes a critical issue due to a lack of generalization ability and the curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach.

5 0.69792265 202 nips-2012-Locally Uniform Comparison Image Descriptor

Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie

Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative, very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation.

6 0.69530499 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

7 0.69447929 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.69299734 40 nips-2012-Analyzing 3D Objects in Cluttered Images

9 0.68894523 210 nips-2012-Memorability of Image Regions

10 0.68773776 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

11 0.68726808 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

12 0.68279725 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

13 0.68085891 303 nips-2012-Searching for objects driven by context

14 0.68031526 185 nips-2012-Learning about Canonical Views from Internet Image Collections

15 0.67946225 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

16 0.67454731 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

17 0.6646325 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

18 0.66416788 8 nips-2012-A Generative Model for Parts-based Object Segmentation

19 0.6583398 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

20 0.65333652 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization