nips nips2012 nips2012-1 knowledge-graph by maker-knowledge-mining

1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model


Source: pdf

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. [sent-8, score-1.242]

2 We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. [sent-9, score-0.519]

3 Our model reasons about face visibility patterns called aspects. [sent-10, score-0.427]

4 We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. [sent-11, score-0.617]

5 Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. [sent-12, score-0.377]

6 We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2]. [sent-14, score-0.375]

7 While impressive performance has been achieved for instance-level 3D object recognition [3], category-level 3D object detection has proven to be a much harder task, due to intra-class variation as well as appearance variation due to viewpoint changes. [sent-19, score-0.622]

8 The most common approach to 3D detection is to discretize the viewing sphere into bins and train a 2D detector for each viewpoint [4, 5, 1, 6]. [sent-20, score-0.312]

9 However, these approaches output rather weak 3D information, where typically a 2D bounding box around the object is returned along with an estimated discretized viewpoint. [sent-21, score-0.348]

10 The main advantage of this line of work is that it enables a continuous pose representation [10, 11, 12, 8], 3D bounding box prediction [8], and potentially requires fewer training examples due to its more compact representation. (Figure 1, left: our deformable 3D cuboid model.) [sent-24, score-0.985]

11 However, since the model represents the object's appearance as a rigid template in 3D, its performance has been shown to be inferior to (2D) deformable part-based models (DPMs) [1]. [sent-30, score-0.339]

12 Our model represents an object class with a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box (see Fig 1). [sent-32, score-1.353]

13 Towards this goal, we introduce the notion of stitching point, which enables the deformation between the faces and the cuboid to be encoded efficiently. [sent-33, score-1.211]

14 We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation due to viewpoint. [sent-34, score-0.519]

15 We reason about different face visibility patterns called aspects [15]. [sent-35, score-0.541]

16 We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. [sent-36, score-0.617]

17 We demonstrate the effectiveness of our approach in indoor [2] and outdoor scenarios [16], and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2]. [sent-39, score-0.375]

18 2 Related work The most common way to tackle 3D detection is to represent a 3D object by a collection of independent 2D appearance models [4, 5, 1, 6, 13], one for each viewpoint. [sent-40, score-0.343]

19 Since these methods usually require a significant amount of training data, renderings of synthetic CAD models have been used to supplement under-represented views or provide supervision for training object parts or object geometry [22, 13, 8]. [sent-43, score-0.505]

20 While these types of models are attractive as they enable continuous viewpoint representations, their detection performance has typically been inferior to 2D deformable models. [sent-45, score-0.405]

21 Towards 3D, DPMs have been extended to reason about object viewpoint by training the mixture model with viewpoint supervision [6, 13]. [sent-47, score-0.497]

22 Consistency was enforced by forcing the parts for different 2D viewpoint models to belong to the same set of 3D parts in the physical space. [sent-50, score-0.306]

23 The closest work to ours is [2], which models an object with a rigid 3D cuboid, composed of independently trained faces without deformations or parts. [sent-52, score-0.59]

24 First, our model is hierarchical and deformable: we allow deformations of the faces, while the faces themselves are composed of deformable parts. [sent-54, score-0.592]

25 We also explicitly reason about the visibility patterns of the cuboid model and train the model accordingly. [sent-55, score-0.697]

26 Finally, in concurrent work, Xiang and Savarese [26] introduced a deformable 3D aspect model, where an object is represented as a set of planar parts in 3D. [sent-59, score-0.548]

27 Our 3D box is oriented, as we reason about the correspondences between the faces in the estimated bounding box and the faces of our model (i. [sent-63, score-0.906]

28 Towards this goal, we represent an object class as a deformable 3D cuboid, which is composed of six deformable faces. [sent-66, score-0.565]

29 The model for each cuboid's face is a 2D template that represents the appearance of the object in view-rectified coordinates. [sent-69, score-0.602]

30 Additionally, we augment each face with parts, and employ a deformation model between the locations of the parts and the anchor points on the face they belong to. [sent-72, score-0.963]

31 We assume that any viewpoint of an object in the image domain can be modeled by rotating our cuboid in 3D, followed by perspective projection onto the image plane. [sent-73, score-0.909]

32 Thus inference involves sliding and rotating the deformable cuboid in 3D and scoring the hypotheses. [sent-74, score-0.839]

33 A necessary component of any 3D model is to properly reason about the face visibility of the object (in our case, the cuboid). [sent-75, score-0.63]

34 Assuming a perspective camera, for any given viewpoint, at most 3 faces are visible in an image. [sent-76, score-0.352]

35 Note that a cuboid can have up to 26 aspects, however, not all necessarily occur for each object class. [sent-78, score-0.706]

36 For example, for objects supported by the floor, the bottom face will never be visible. [sent-79, score-0.355]

37 For cars, typically the top face is not visible either. [sent-80, score-0.386]
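The visibility reasoning above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes a unit cuboid, a fixed face layout, and that a face is visible exactly when its outward normal points toward the camera; at most three faces can pass this test at once.

```python
import numpy as np

# Outward normal and face-center offset for each face of a unit cuboid
# centered at the origin (layout is an assumption for illustration).
_FACES = {
    "front":  ((0.0, 0.0, -1.0), (0.0, 0.0, -0.5)),
    "back":   ((0.0, 0.0,  1.0), (0.0, 0.0,  0.5)),
    "left":   ((-1.0, 0.0, 0.0), (-0.5, 0.0, 0.0)),
    "right":  (( 1.0, 0.0, 0.0), ( 0.5, 0.0, 0.0)),
    "top":    ((0.0,  1.0, 0.0), (0.0,  0.5, 0.0)),
    "bottom": ((0.0, -1.0, 0.0), (0.0, -0.5, 0.0)),
}

def visible_faces(center, R=None, camera=(0.0, 0.0, 0.0)):
    """Names of the faces visible from `camera` for a unit cuboid at
    `center` with rotation matrix `R` (identity by default). A face is
    visible when its world-frame outward normal has positive dot product
    with the face-center-to-camera vector."""
    R = np.eye(3) if R is None else np.asarray(R, float)
    center = np.asarray(center, float)
    cam = np.asarray(camera, float)
    out = []
    for name, (n, c) in _FACES.items():
        n_w = R @ np.asarray(n)            # outward normal in world frame
        c_w = R @ np.asarray(c) + center   # face center in world frame
        if np.dot(n_w, cam - c_w) > 0:
            out.append(name)
    return out
```

A cuboid straight ahead shows only its front face; one offset to the side and below the optical axis shows three faces, matching the "at most 3 faces" observation.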

38 Note that the visibility, and thus the aspect, is a function of the 3D orientation and position of a cuboid hypothesis with respect to the camera. [sent-82, score-0.681]

39 We define θ to be the angle between the outer normal to the front face of the cuboid hypothesis, and the vector connecting the camera and the center of the 3D box. [sent-83, score-0.978]
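A direct transcription of this definition of θ might look like the following sketch; the helper name and coordinate conventions are assumptions, not the paper's code:

```python
import numpy as np

def cuboid_orientation_angle(front_normal, box_center, camera=(0.0, 0.0, 0.0)):
    """theta: the angle between the outer normal of the cuboid's front
    face and the vector connecting the camera and the 3D box center."""
    n = np.asarray(front_normal, float)
    v = np.asarray(box_center, float) - np.asarray(camera, float)
    cos_t = np.dot(n, v) / (np.linalg.norm(n) * np.linalg.norm(v))
    # Clip guards against tiny floating-point excursions outside [-1, 1].
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```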

40 Fig. 2 shows the range of the cuboid orientation angle on the viewing sphere for which each aspect occurs in the datasets of [2, 16], which we employ for our experiments. [sent-87, score-0.775]

41 In order to make the cuboid deformable, we introduce the notion of stitching point, which is a point on the box that is common to all visible faces for a particular aspect. [sent-89, score-1.211]

42 We incorporate a quadratic deformation cost between the locations of the faces and the stitching point to encourage the cuboid to be as rigid as possible. [sent-90, score-1.237]
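The quadratic face-to-stitching-point penalty can be illustrated with a small sketch; the function name, the anchor-offset parameterization, and the weight layout are assumptions for illustration only:

```python
import numpy as np

def stitch_deformation_cost(face_pos, stitch_pos, anchor_offset, w):
    """Quadratic penalty tying a face to the stitching point: the face is
    expected at stitch_pos + anchor_offset, and the displacement (du, dv)
    is penalized with learned weights w = (w_u, w_v, w_uu, w_vv). A cost
    of zero at the expected location encourages a rigid cuboid."""
    d = np.asarray(face_pos, float) - (np.asarray(stitch_pos, float)
                                       + np.asarray(anchor_offset, float))
    du, dv = d
    return float(np.dot(w, [du, dv, du * du, dv * dv]))
```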

43 We impose an additional deformation cost between the visible faces, ensuring that their sizes match when we stitch them into a cuboid hypothesis. [sent-91, score-0.83]

44 To reduce the computational complexity and impose regularization, we share the face and part templates across all aspects, as well as the deformations between them. [sent-93, score-0.476]

45 However, the deformations between the faces and the cuboid are aspect specific as they depend on the stitching point. [sent-94, score-1.229]

46 The model is parameterized as ({Pi, {Pi,j}j=1,…,n}i=1,…,6, b), where Pi models the i-th face, Pi,j is a model for the j-th part belonging to face i, and b is a real-valued bias term. [sent-100, score-0.346]

47 For ease of exposition, we assume each face to have the same number of parts, n; however, the framework is general and allows the numbers of parts to vary across faces. [sent-101, score-0.411]

48 Here, ba is a bias term that is aspect specific and allows us to calibrate the scores across aspects with different numbers of visible faces. [sent-104, score-0.333]

49 Note that the parts are defined relative to the face and are thus independent of the aspects. [sent-107, score-0.411]

50 The appearance templates as well as the deformation parameters in the model are defined for each face in a canonical view where that face is frontal. [sent-109, score-0.934]

51 We thus score a face hypothesis in the rectified view that makes the hypothesis frontal. [sent-110, score-0.451]

52 Each pair of parallel faces shares a homography, and thus at most three rectifications are needed for each viewpoint hypothesis θ. [sent-111, score-0.43]
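One standard way to obtain such a rectifying homography is the direct linear transform (DLT) from four point correspondences. The sketch below is illustrative, not the paper's implementation:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: the 3x3 homography H mapping four `src`
    points to four `dst` points (e.g. a projected cuboid face to a
    fronto-parallel template frame). Each correspondence contributes two
    linear constraints on the 9 entries of H; the solution is the null
    space of the stacked constraint matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)      # null-space vector holds the entries of H
    return H / H[2, 2]
```

Since parallel faces share a homography, at most three such rectifications are needed per viewpoint hypothesis.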

53 A sliding window approach is then used to score the cuboid hypotheses, by scoring the parts, faces and their deformations in their own rectified view, as well as the deformations of the faces with respect to the stitching point. [sent-118, score-1.654]

54 We define the compatibility score between the parts and the corresponding face, denoted as pi = {pi, {pi,j}j=1,…,n}. [sent-121, score-0.27]

55 where p = (p1 , · · · , p6 ) and V (i, a) is a binary variable encoding whether face i is visible under aspect a. [sent-125, score-0.49]

56 Note that a = a(θ, s) can be deterministically computed from the rotation angle θ and the position of the stitching point s (which we assume to always be visible), which in turn determines the face visibility V . [sent-126, score-0.722]

57 We use ref to index the first visible face in the aspect model, and φd(pi, pi,j, θ) = φd(du, dv) = (du, dv, du^2, dv^2) (3) are the part deformation features, computed in the rectified image of face i implied by the 3D angle θ. [sent-127, score-1.116]
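The part deformation feature is simple enough to write out directly; this one-liner is a sketch of the (du, dv, du^2, dv^2) feature of Eq. (3), with hypothetical argument names:

```python
def part_deformation_features(part_pos, anchor_pos):
    """phi_d(du, dv) = (du, dv, du^2, dv^2): the displacement of a part
    from its anchor, computed in the rectified view of its face."""
    du = part_pos[0] - anchor_pos[0]
    dv = part_pos[1] - anchor_pos[1]
    return (du, dv, du * du, dv * dv)
```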

58 The deformation features φstitch_d(pi, s, θ) between the face pi and the stitching point s are defined as (dui, dvi) = (ui, vi) − ((u(s, i), v(s, i)) + ra,i). [sent-129, score-0.826]

59 Here, (u(s, i), v(s, i)) is the position of the stitching point in the rectified coordinates corresponding to face i and level l. [sent-130, score-0.607]

60 We define the deformation cost between the faces to be a function of their relative dimensions: φface_d(pi, pk, θ) = 0 if max(ei, ek) < 1 + min(ei, ek), and ∞ otherwise (4), with ei and ek the lengths of the common edge between faces i and k. [sent-131, score-0.828]

61 We define the deformation of a face with respect to the stitching point to also be quadratic. [sent-132, score-0.7]

62 We additionally incorporate a bias term for each aspect, ba , to make the scores of multiple aspects comparable when we combine them into a full cuboid model. [sent-134, score-0.705]

63 Given an image x, the score of a hypothesized 3D cuboid can be obtained as the dot product between the model's parameters and a feature vector. [sent-135, score-0.649]
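A toy aggregation of such a hypothesis score is sketched below. The decomposition (face response plus part scores minus deformation penalties, plus an aspect bias b_a) is an assumption for illustration and not the paper's exact formula:

```python
def cuboid_score(face_scores, part_scores, deform_costs, aspect_bias, visible):
    """Sum, over the visible faces, the face-template response plus its
    part scores minus the deformation penalties, then add the
    aspect-specific bias that calibrates aspects with different numbers
    of visible faces."""
    s = aspect_bias
    for f in visible:
        s += face_scores[f] + sum(part_scores[f]) - deform_costs[f]
    return s
```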

64 To get the score for each θ, we first compute the feature responses for the part and face templates (Eq. [sent-144, score-0.458]

65 As in [1], distance transforms are used to compute the deformation scores of the parts efficiently, that is, Eq. [sent-146, score-0.322]
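What the distance transform computes can be shown with a brute-force O(n^2) one-dimensional sketch; DPM-style inference obtains the same quantity in linear time with the lower-envelope algorithm, so this is only an illustration of the computed values:

```python
import numpy as np

def distance_transform_1d(scores, w_lin, w_quad):
    """Brute-force 1D generalized distance transform:
        D[p] = max_q scores[q] - w_lin * |p - q| - w_quad * (p - q)**2
    i.e. the best part placement q for each anchor position p under a
    linear-plus-quadratic deformation penalty."""
    f = np.asarray(scores, float)
    q = np.arange(len(f))
    D = np.empty(len(f))
    for p in range(len(f)):
        d = np.abs(p - q)
        D[p] = np.max(f - w_lin * d - w_quad * d * d)
    return D
```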

66 The score for each face simply sums the response of the face template and the scores of the parts. [sent-148, score-0.798]

67 We again use distance transforms to compute the deformation scores for each face and the stitching point, which is carried out in the rectified coordinates for each face. [sent-149, score-0.785]

68 We then compute the deformation scores between the faces in Eq. [sent-150, score-0.511]

69 (4), which can be performed efficiently due to the fact that sides of the same length along one dimension (horizontal or vertical) in the coordinates of face i will also be constant along the corresponding line when projected to the coordinate system of face j. [sent-151, score-0.671]

70 Thus, computing the side length ratios of two faces is not quadratic in the number of pixels but only in the number of horizontal or vertical lines. [sent-152, score-0.311]

71 Table 2: 3D detection performance in AP (50% IOU overlap of convex hulls and faces). [sent-179, score-0.602]

72 Figure 5: Precision-recall curves for (left) 2D detection, (middle) convex hull, (right) face overlap. [sent-244, score-0.415]

73 Learning: Given a set of training samples D = {(x1, y1, bb1), …, (xN, yN, bbN)}, where x is an image, yi ∈ {−1, 1}, and bb ∈ R^(8×2) are the eight coordinates of the 3D bounding box in the image, our goal is to learn the weights w = [wa1, …, waP] for all P aspects in Eq. [sent-245, score-0.327]

74 To initialize the full model, we first learn a deformable face+parts model for each face independently, where the faces of the training examples are rectified to be frontal prior to training. [sent-248, score-0.82]

75 We estimate the different aspects of our 3D model from the statistics of the training data, and compute for each training cuboid the relative positions va,i of face i and the stitching point in the rectified view of each face. [sent-249, score-1.199]

76 We then perform joint training of the full model, treating the training cuboid and the stitching point as latent, while requiring that each face filter and the face annotation overlap more than 70%. [sent-250, score-1.524]

77 To enable a comparison with the DPM detector [1], we trained a model with 6 mixtures and 8 parts using the same training instances but employing 2D bounding boxes. [sent-257, score-0.27]

78 Fig. 3 shows the statistics of the dataset in terms of the number of training examples for each aspect (where L-R-T denotes an aspect for which the front, right, and top faces are visible), as well as per face. [sent-260, score-0.556]

79 Note that the fact that the dataset is unbalanced (fewer examples for aspects with two faces) does not affect our approach much, as only the face-stitching point deformation parameters are aspect specific. [sent-261, score-0.356]

80 As we share the weights among the aspects, the number of training instances for each face is significantly higher (Fig. 3). [sent-262, score-0.348]

81 The 2D bounding boxes for our model are computed by fitting a 2D box around the convex hull of the projection of the predicted 3D box. [sent-268, score-0.321]
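This fitting step can be sketched directly: for an axis-aligned 2D box, the bounding box of the convex hull of the projected corners equals the bounding box of the corners themselves. The helper name and the assumption of a plain 3x3 intrinsics matrix are illustrative, not from the paper:

```python
import numpy as np

def project_box_to_bbox(corners_3d, K):
    """Project the eight cuboid corners with intrinsics K and fit an
    axis-aligned 2D box (x0, y0, x1, y1) around the projections."""
    C = np.asarray(corners_3d, float)          # 8 x 3 corner coordinates
    P = C @ np.asarray(K, float).T             # apply K to each corner
    uv = P[:, :2] / P[:, 2:3]                  # perspective division
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return float(x0), float(y0), float(x1), float(y1)
```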

82 We compare our approach to the deformable part model (DPM) [1] and the cuboid model of Hedau et al. [sent-272, score-0.763]

83 As shown in Table 1 we outperform the cuboid model of [2] by 8. [sent-274, score-0.545]

84 We combine the 3D and 2D detectors using a two step process, where first the 2D detector is run inside the bounding boxes produced by our cuboid model. [sent-285, score-0.759]

85 1%), it seems that our cuboid model is already scoring the correct boxes well. [sent-288, score-0.641]

86 This is in contrast to the cuboid model of [2], where the increase in performance is more significant due to the poorer accuracy of their 3D approach. [sent-289, score-0.545]

87 Following [2], we use an estimate of the room layout to rescore the object hypotheses at the scene level. [sent-290, score-0.365]

88 Here, instead of computing the overlap between the predicted boxes, we require that the convex hulls of our 3D hypotheses projected to the image plane and groundtruth annotations overlap at least 50% in IOU measure. [sent-299, score-0.268]
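The convex-hull IOU criterion can be sketched in pure Python: clip one hull by the other (Sutherland-Hodgman, assuming convex counter-clockwise polygons), measure areas with the shoelace formula, and compare the ratio to 0.5. This is an illustration, not the paper's evaluation code:

```python
def _area(poly):
    # Shoelace formula for the area of a simple polygon.
    s = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def _clip(subject, clipper):
    # Sutherland-Hodgman: clip `subject` by each edge of a convex CCW `clipper`.
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment pq with the infinite line through a, b.
        x1, y1 = p; x2, y2 = q; x3, y3 = a; x4, y4 = b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break
        s = inp[-1]
        for e in inp:
            if inside(e, a, b):
                if not inside(s, a, b):
                    out.append(intersect(s, e, a, b))
                out.append(e)
            elif inside(s, a, b):
                out.append(intersect(s, e, a, b))
            s = e
    return out

def convex_iou(poly_a, poly_b):
    """IOU of two convex CCW polygons, e.g. projected cuboid hulls."""
    inter_poly = _clip(poly_a, poly_b)
    inter = _area(inter_poly) if inter_poly else 0.0
    union = _area(poly_a) + _area(poly_b) - inter
    return inter / union if union > 0 else 0.0
```

A hypothesis would then count as correct when `convex_iou(pred_hull, gt_hull) >= 0.5`.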

89 Since our model also predicts the locations of the dominant object faces (and thus the 3D object orientation), we would like to quantify its accuracy. [sent-305, score-0.605]

90 We introduce an even stricter measure where we require that the predicted cuboid faces also overlap with the faces of the ground-truth cuboids. [sent-306, score-1.217]

91 In particular, a hypothesis is correct if the average of the overlaps between top faces and vertical faces exceeds 50% IOU. [sent-307, score-0.623]

92 This can be done by sliding a cuboid (whose dimensions match our cuboid model) in 3D to best fit the 2D bounding box. [sent-312, score-1.215]

93 We attribute this to the fact that our cuboid is deformable and thus the faces localize more accurately on the faces of the object. [sent-315, score-1.3]

94 Note that the rectangular patches on the faces represent the parts, and color coding is used to depict the learned part and face deformation weights. [sent-320, score-0.809]

95 In particular, for each detection we automatically choose a CAD model out of a collection of 80 models whose 3D bounding box best matches the dimensions of the predicted box. [sent-326, score-0.311]

96 5 Conclusion We proposed a novel approach to 3D object detection, which extends the well-acclaimed DPM to reason in 3D by means of a deformable 3D cuboid. [sent-328, score-0.392]

97 Our cuboid allows for deformations at the face level via a stitching point as well as deformations between the faces and the parts. [sent-329, score-1.536]

98 (2000) A statistical method for 3d object detection applied to faces and cars. [sent-363, score-0.542]

99 (2009) Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. [sent-407, score-0.279]

100 (2007) Flexible object models for category-level 3d object recognition. [sent-438, score-0.322]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cuboid', 0.545), ('face', 0.317), ('faces', 0.283), ('stitching', 0.203), ('deformable', 0.189), ('dpm', 0.188), ('deformation', 0.18), ('object', 0.161), ('bed', 0.127), ('recti', 0.123), ('viewpoint', 0.118), ('box', 0.111), ('visibility', 0.11), ('aspect', 0.104), ('pi', 0.1), ('detection', 0.098), ('parts', 0.094), ('deformations', 0.094), ('ap', 0.09), ('appearance', 0.084), ('ace', 0.082), ('cad', 0.082), ('overlap', 0.08), ('bounding', 0.076), ('score', 0.076), ('savarese', 0.076), ('aspects', 0.072), ('boxes', 0.069), ('detector', 0.069), ('visible', 0.069), ('indoor', 0.068), ('layout', 0.068), ('dstitch', 0.062), ('iou', 0.062), ('orientation', 0.057), ('anchor', 0.055), ('hedau', 0.055), ('tpami', 0.05), ('position', 0.05), ('room', 0.049), ('sliding', 0.049), ('scores', 0.048), ('outdoor', 0.048), ('bbox', 0.047), ('kitti', 0.047), ('detections', 0.046), ('reason', 0.042), ('driving', 0.042), ('angle', 0.042), ('dpms', 0.041), ('autonomous', 0.04), ('ba', 0.04), ('template', 0.04), ('front', 0.04), ('hull', 0.039), ('objects', 0.038), ('anchors', 0.038), ('coordinates', 0.037), ('templates', 0.036), ('num', 0.036), ('schiele', 0.036), ('stitch', 0.036), ('ui', 0.035), ('camera', 0.034), ('factoring', 0.034), ('wa', 0.034), ('pose', 0.033), ('urtasun', 0.033), ('dickinson', 0.031), ('fidler', 0.031), ('oblect', 0.031), ('rescore', 0.031), ('scoreparts', 0.031), ('stich', 0.031), ('training', 0.031), ('dv', 0.03), ('hypothesis', 0.029), ('rotating', 0.029), ('combined', 0.029), ('part', 0.029), ('scene', 0.029), ('image', 0.028), ('vertical', 0.028), ('scoring', 0.027), ('hulls', 0.027), ('monocular', 0.027), ('oor', 0.027), ('pepik', 0.027), ('sven', 0.027), ('hypotheses', 0.027), ('supervision', 0.027), ('viewing', 0.027), ('oriented', 0.027), ('composed', 0.026), ('df', 0.026), ('vanishing', 0.026), ('vi', 0.026), ('predicted', 0.026), ('rigid', 0.026), ('specifying', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

2 0.44578525 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

3 0.22562881 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

4 0.19183123 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

5 0.13508891 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

6 0.13508891 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

7 0.13158651 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

8 0.11743027 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

9 0.11717013 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

10 0.11483393 303 nips-2012-Searching for objects driven by context

11 0.097727895 193 nips-2012-Learning to Align from Scratch

12 0.095275797 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.092448995 8 nips-2012-A Generative Model for Parts-based Object Segmentation

14 0.088905431 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

15 0.087453954 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

16 0.087238744 168 nips-2012-Kernel Latent SVM for Visual Recognition

17 0.075487979 185 nips-2012-Learning about Canonical Views from Internet Image Collections

18 0.072529651 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

19 0.06660413 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

20 0.061321992 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, 0.003), (2, -0.22), (3, -0.02), (4, 0.144), (5, -0.135), (6, 0.008), (7, -0.134), (8, 0.027), (9, -0.033), (10, -0.07), (11, 0.031), (12, 0.082), (13, -0.239), (14, 0.109), (15, 0.211), (16, 0.019), (17, -0.128), (18, -0.11), (19, 0.043), (20, 0.024), (21, 0.008), (22, -0.012), (23, 0.053), (24, -0.081), (25, 0.08), (26, 0.038), (27, 0.025), (28, -0.098), (29, -0.056), (30, 0.018), (31, -0.059), (32, -0.008), (33, 0.132), (34, -0.039), (35, -0.022), (36, -0.021), (37, -0.057), (38, -0.101), (39, -0.014), (40, -0.049), (41, 0.111), (42, -0.036), (43, -0.035), (44, -0.031), (45, -0.017), (46, -0.078), (47, 0.022), (48, 0.039), (49, -0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96318561 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patters called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the stateof-the-art in both 2D [1] and 3D object detection [2]. 1

2 0.91591346 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

3 0.78327203 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.

4 0.66499341 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared across different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the state-of-the-art algorithms.

5 0.63807517 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while optimizing the multi-layer parameters during the iterations. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle well large intra-class variance and background clutters for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms state-of-the-art approaches.

6 0.61236149 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

7 0.61118752 344 nips-2012-Timely Object Recognition

8 0.58040065 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

9 0.57458031 8 nips-2012-A Generative Model for Parts-based Object Segmentation

10 0.56705165 303 nips-2012-Searching for objects driven by context

11 0.51876801 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

12 0.50698042 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.44331181 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

14 0.42566043 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

15 0.42364129 185 nips-2012-Learning about Canonical Views from Internet Image Collections

16 0.42198792 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

17 0.39170673 2 nips-2012-3D Social Saliency from Head-mounted Cameras

18 0.37983516 168 nips-2012-Kernel Latent SVM for Visual Recognition

19 0.37802392 210 nips-2012-Memorability of Image Regions

20 0.37779778 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration


similar papers computed by LDA model

LDA topic distribution for this paper:

topicId topicWeight

[(0, 0.018), (17, 0.011), (21, 0.413), (38, 0.082), (42, 0.011), (54, 0.021), (55, 0.01), (74, 0.147), (76, 0.103), (80, 0.047), (86, 0.014), (92, 0.03)]
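The sparse topic-weight vector above is the representation from which the "similar papers" ranking below is computed; a minimal sketch of one plausible similarity measure (cosine similarity over densified topic vectors, with a hypothetical second paper and an assumed topic count) might look like:

```python
import math

def to_dense(sparse, num_topics):
    """Expand a sparse {topicId: topicWeight} mapping into a dense vector."""
    vec = [0.0] * num_topics
    for topic_id, weight in sparse.items():
        vec[topic_id] = weight
    return vec

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Topic weights for this paper, as listed above.
this_paper = {0: 0.018, 17: 0.011, 21: 0.413, 38: 0.082, 42: 0.011,
              54: 0.021, 55: 0.01, 74: 0.147, 76: 0.103, 80: 0.047,
              86: 0.014, 92: 0.03}
# A hypothetical second paper's topic weights (illustration only).
other_paper = {3: 0.05, 21: 0.35, 74: 0.2, 76: 0.1}

num_topics = 100  # assumed size of the topic vocabulary
sim = cosine_similarity(to_dense(this_paper, num_topics),
                        to_dense(other_paper, num_topics))
```

Papers sharing the dominant topics (here topic 21) score close to 1, matching the intuition behind the simValue column in the list that follows.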

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84546381 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

2 0.81910539 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a specific pattern of recurrent connections learned by the model.

3 0.70780402 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.
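The composite objectives the abstract refers to can be illustrated with the standard convex special case: proximal gradient descent on f(x) + λ‖x‖₁ with the soft-thresholding prox. This is a textbook sketch only; the paper's contribution concerns nonconvex objectives and inexact, incremental variants, which this toy does not attempt.

```python
def soft_threshold(v, t):
    """Elementwise prox of t * ||x||_1 (soft-thresholding)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def proximal_gradient(grad_f, lam, x0, step=0.5, iters=500):
    """Minimize f(x) + lam * ||x||_1 via x <- prox(x - step * grad_f(x))."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                           step * lam)
    return x

# Smooth part f(x) = 0.5 * ||x - b||^2, whose gradient is x - b; the
# composite minimizer then has the closed form soft_threshold(b, lam).
b = [3.0, -0.2, 1.5]
x_star = proximal_gradient(lambda x: [xi - bi for xi, bi in zip(x, b)],
                           lam=1.0, x0=[0.0, 0.0, 0.0])
```

Because the closed-form solution soft_threshold(b, 1.0) = [2.0, 0.0, 0.5] is known here, the iterate can be checked directly; in the nonconvex setting no such certificate exists, which is what makes the paper's error analysis nontrivial.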

4 0.68729562 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.
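For reference, the finite Plackett-Luce model being extended assigns a ranking probability as a sequence of weighted choices without replacement; a minimal sketch:

```python
def plackett_luce_prob(ranking, weights):
    """Probability of observing `ranking` (a list of item ids, best first)
    under Plackett-Luce with positive item weights: at each position the
    next item is chosen with probability proportional to its weight among
    the items not yet ranked."""
    remaining = list(weights.keys())
    prob = 1.0
    for item in ranking:
        denom = sum(weights[j] for j in remaining)
        prob *= weights[item] / denom
        remaining.remove(item)
    return prob

# Example: item 'a' is twice as preferred as 'b' or 'c'.
w = {'a': 2.0, 'b': 1.0, 'c': 1.0}
p_abc = plackett_luce_prob(['a', 'b', 'c'], w)  # (2/4) * (1/2) * (1/1)
```

The nonparametric extension in the paper replaces the finite weight table with an atomic measure drawn from a gamma process, so the set of items need not be fixed in advance.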

5 0.66880363 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point in time at which we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy.
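As a rough illustration of the adaptive-sampling idea described above, the following toy stops sampling a binary variable once the fraction of samples in its majority state exceeds a confidence threshold. This is a sketch only, not the paper's algorithm: the paper estimates a proper posterior over each decision (and prunes the factor graph itself), whereas this toy uses a raw sample fraction and independent variables.

```python
import random

def mmp_with_pruning(sample_fn, num_vars, threshold=0.95,
                     max_rounds=200, batch=10):
    """Toy adaptive-sampling MMP over binary variables: stop sampling a
    variable once its majority-state fraction exceeds `threshold`."""
    counts = [[0, 0] for _ in range(num_vars)]  # counts[i][state]
    active = set(range(num_vars))               # variables still uncertain
    decisions = [None] * num_vars
    for _ in range(max_rounds):
        if not active:
            break
        for _ in range(batch):
            sample = sample_fn()                # one joint sample, list of 0/1
            for i in active:
                counts[i][sample[i]] += 1
        for i in list(active):
            total = counts[i][0] + counts[i][1]
            if max(counts[i]) / total >= threshold:
                # Confident about this variable: decide it and prune it.
                decisions[i] = counts[i].index(max(counts[i]))
                active.remove(i)
    # Decide any still-uncertain variables by majority vote.
    for i in active:
        decisions[i] = 0 if counts[i][0] >= counts[i][1] else 1
    return decisions

# Hypothetical sampler: three independent variables with marginals
# 0.99, 0.02, and 0.8 of being 1. The first two prune almost immediately;
# the third stays active because its fraction hovers around 0.8.
random.seed(1)
probs = [0.99, 0.02, 0.8]
dec = mmp_with_pruning(lambda: [1 if random.random() < p else 0
                                for p in probs], num_vars=3)
```

The payoff mirrors the abstract's claim: once a variable is pruned, no further samples are spent on it, so sampling effort concentrates on the genuinely uncertain decisions.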

6 0.64226168 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

7 0.62062198 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

8 0.57927394 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

9 0.56540877 23 nips-2012-A lattice filter model of the visual pathway

10 0.54818267 201 nips-2012-Localizing 3D cuboids in single-view images

11 0.54781508 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

12 0.52595824 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.52270359 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

14 0.51250154 94 nips-2012-Delay Compensation with Dynamical Synapses

15 0.51210999 190 nips-2012-Learning optimal spike-based representations

16 0.51032633 256 nips-2012-On the connections between saliency and tracking

17 0.50792205 303 nips-2012-Searching for objects driven by context

18 0.50534862 40 nips-2012-Analyzing 3D Objects in Cluttered Images

19 0.50411898 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

20 0.49825811 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex