nips nips2012 nips2012-1 knowledge-graph by maker-knowledge-mining

1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model


Source: pdf

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. [sent-8, score-1.242]

2 We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. [sent-9, score-0.519]

3 Our model reasons about face visibility patterns called aspects. [sent-10, score-0.427]

4 We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. [sent-11, score-0.617]

5 Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. [sent-12, score-0.377]

6 We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2]. [sent-14, score-0.375]

7 While impressive performance has been achieved for instance-level 3D object recognition [3], category-level 3D object detection has proven to be a much harder task, due to intra-class variation as well as appearance variation due to viewpoint changes. [sent-19, score-0.622]

8 The most common approach to 3D detection is to discretize the viewing sphere into bins and train a 2D detector for each viewpoint [4, 5, 1, 6]. [sent-20, score-0.312]

9 However, these approaches output rather weak 3D information, where typically a 2D bounding box around the object is returned along with an estimated discretized viewpoint. [sent-21, score-0.348]

10 The main advantage of this line of work is that it enables a continuous pose representation [10, 11, 12, 8], 3D bounding box prediction [8], and potentially requires fewer training examples due to its more compact representation. (Figure 1, left: our deformable 3D cuboid model.) [sent-24, score-0.985]

11 However, since the model represents the object's appearance as a rigid template in 3D, its performance has been shown to be inferior to (2D) deformable part-based models (DPMs) [1]. [sent-30, score-0.339]

12 Our model represents an object class with a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box (see Fig 1). [sent-32, score-1.353]

13 Towards this goal, we introduce the notion of stitching point, which enables the deformation between the faces and the cuboid to be encoded efficiently. [sent-33, score-1.211]

14 We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation due to viewpoint. [sent-34, score-0.519]

15 We reason about different face visibility patterns called aspects [15]. [sent-35, score-0.541]

16 We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. [sent-36, score-0.617]

17 We demonstrate the effectiveness of our approach in indoor [2] and outdoor scenarios [16], and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2]. [sent-39, score-0.375]

18 2 Related work The most common way to tackle 3D detection is to represent a 3D object by a collection of independent 2D appearance models [4, 5, 1, 6, 13], one for each viewpoint. [sent-40, score-0.343]

19 Since these methods usually require a significant amount of training data, renderings of synthetic CAD models have been used to supplement under-represented views or provide supervision for training object parts or object geometry [22, 13, 8]. [sent-43, score-0.505]

20 While these types of models are attractive as they enable continuous viewpoint representations, their detection performance has typically been inferior to 2D deformable models. [sent-45, score-0.405]

21 Towards 3D, DPMs have been extended to reason about object viewpoint by training the mixture model with viewpoint supervision [6, 13]. [sent-47, score-0.497]

22 Consistency was enforced by forcing the parts for different 2D viewpoint models to belong to the same set of 3D parts in the physical space. [sent-50, score-0.306]

23 The closest work to ours is [2], which models an object with a rigid 3D cuboid, composed of independently trained faces without deformations or parts. [sent-52, score-0.59]

24 First, our model is hierarchical and deformable: we allow deformations of the faces, while the faces themselves are composed of deformable parts. [sent-54, score-0.592]

25 We also explicitly reason about the visibility patterns of the cuboid model and train the model accordingly. [sent-55, score-0.697]

26 Finally, in concurrent work, Xiang and Savarese [26] introduced a deformable 3D aspect model, where an object is represented as a set of planar parts in 3D. [sent-59, score-0.548]

27 Our 3D box is oriented, as we reason about the correspondences between the faces in the estimated bounding box and the faces of our model (i. [sent-63, score-0.906]

28 Towards this goal, we represent an object class as a deformable 3D cuboid, which is composed of six deformable faces. [sent-66, score-0.565]

29 The model for each cuboid's face is a 2D template that represents the appearance of the object in view-rectified coordinates. [sent-69, score-0.602]

30 Additionally, we augment each face with parts, and employ a deformation model between the locations of the parts and the anchor points on the face they belong to. [sent-72, score-0.963]

31 We assume that any viewpoint of an object in the image domain can be modeled by rotating our cuboid in 3D, followed by perspective projection onto the image plane. [sent-73, score-0.909]

32 Thus inference involves sliding and rotating the deformable cuboid in 3D and scoring the hypotheses. [sent-74, score-0.839]

33 A necessary component of any 3D model is to properly reason about the face visibility of the object (in our case, the cuboid). [sent-75, score-0.63]

34 Assuming a perspective camera, for any given viewpoint, at most 3 faces are visible in an image. [sent-76, score-0.352]

35 Note that a cuboid can have up to 26 aspects, however, not all necessarily occur for each object class. [sent-78, score-0.706]

36 For example, for objects supported by the floor, the bottom face will never be visible. [sent-79, score-0.355]

37 For cars, typically the top face is not visible either. [sent-80, score-0.386]
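The visibility reasoning above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes a unit cuboid, a fixed face layout, and that a face is visible exactly when its outward normal points toward the camera; at most three faces can pass this test at once.

```python
import numpy as np

# Outward normal and face-center offset for each face of a unit cuboid
# centered at the origin (layout is an assumption for illustration).
_FACES = {
    "front":  ((0.0, 0.0, -1.0), (0.0, 0.0, -0.5)),
    "back":   ((0.0, 0.0,  1.0), (0.0, 0.0,  0.5)),
    "left":   ((-1.0, 0.0, 0.0), (-0.5, 0.0, 0.0)),
    "right":  (( 1.0, 0.0, 0.0), ( 0.5, 0.0, 0.0)),
    "top":    ((0.0,  1.0, 0.0), (0.0,  0.5, 0.0)),
    "bottom": ((0.0, -1.0, 0.0), (0.0, -0.5, 0.0)),
}

def visible_faces(center, R=None, camera=(0.0, 0.0, 0.0)):
    """Names of the faces visible from `camera` for a unit cuboid at
    `center` with rotation matrix `R` (identity by default). A face is
    visible when its world-frame outward normal has positive dot product
    with the face-center-to-camera vector."""
    R = np.eye(3) if R is None else np.asarray(R, float)
    center = np.asarray(center, float)
    cam = np.asarray(camera, float)
    out = []
    for name, (n, c) in _FACES.items():
        n_w = R @ np.asarray(n)            # outward normal in world frame
        c_w = R @ np.asarray(c) + center   # face center in world frame
        if np.dot(n_w, cam - c_w) > 0:
            out.append(name)
    return out
```

A cuboid straight ahead shows only its front face; one offset to the side and below the optical axis shows three faces, matching the "at most 3 faces" observation.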

38 Note that the visibility, and thus the aspect, is a function of the 3D orientation and position of a cuboid hypothesis with respect to the camera. [sent-82, score-0.681]

39 We define θ to be the angle between the outer normal to the front face of the cuboid hypothesis, and the vector connecting the camera and the center of the 3D box. [sent-83, score-0.978]
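A direct transcription of this definition of θ might look like the following sketch; the helper name and coordinate conventions are assumptions, not the paper's code:

```python
import numpy as np

def cuboid_orientation_angle(front_normal, box_center, camera=(0.0, 0.0, 0.0)):
    """theta: the angle between the outer normal of the cuboid's front
    face and the vector connecting the camera and the 3D box center."""
    n = np.asarray(front_normal, float)
    v = np.asarray(box_center, float) - np.asarray(camera, float)
    cos_t = np.dot(n, v) / (np.linalg.norm(n) * np.linalg.norm(v))
    # Clip guards against tiny floating-point excursions outside [-1, 1].
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```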

40 Fig. 2 shows the range of the cuboid orientation angle on the viewing sphere for which each aspect occurs in the datasets of [2, 16], which we employ for our experiments. [sent-87, score-0.775]

41 In order to make the cuboid deformable, we introduce the notion of stitching point, which is a point on the box that is common to all visible faces for a particular aspect. [sent-89, score-1.211]

42 We incorporate a quadratic deformation cost between the locations of the faces and the stitching point to encourage the cuboid to be as rigid as possible. [sent-90, score-1.237]
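The quadratic face-to-stitching-point penalty can be illustrated with a small sketch; the function name, the anchor-offset parameterization, and the weight layout are assumptions for illustration only:

```python
import numpy as np

def stitch_deformation_cost(face_pos, stitch_pos, anchor_offset, w):
    """Quadratic penalty tying a face to the stitching point: the face is
    expected at stitch_pos + anchor_offset, and the displacement (du, dv)
    is penalized with learned weights w = (w_u, w_v, w_uu, w_vv). A cost
    of zero at the expected location encourages a rigid cuboid."""
    d = np.asarray(face_pos, float) - (np.asarray(stitch_pos, float)
                                       + np.asarray(anchor_offset, float))
    du, dv = d
    return float(np.dot(w, [du, dv, du * du, dv * dv]))
```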

43 We impose an additional deformation cost between the visible faces, ensuring that their sizes match when we stitch them into a cuboid hypothesis. [sent-91, score-0.83]

44 To reduce the computational complexity and impose regularization, we share the face and part templates across all aspects, as well as the deformations between them. [sent-93, score-0.476]

45 However, the deformations between the faces and the cuboid are aspect specific as they depend on the stitching point. [sent-94, score-1.229]

46 The model is parameterized as ({Pi, {Pi,j}j=1,…,n}i=1,…,6, b), where Pi models the i-th face, Pi,j is a model for the j-th part belonging to face i, and b is a real-valued bias term. [sent-100, score-0.346]

47 For ease of exposition, we assume each face to have the same number of parts, n; however, the framework is general and allows the numbers of parts to vary across faces. [sent-101, score-0.411]

48 Here, ba is a bias term that is aspect specific and allows us to calibrate the scores across aspects with different numbers of visible faces. [sent-104, score-0.333]

49 Note that the parts are defined relative to the face and are thus independent of the aspects. [sent-107, score-0.411]

50 The appearance templates as well as the deformation parameters in the model are defined for each face in a canonical view where that face is frontal. [sent-109, score-0.934]

51 We thus score a face hypothesis in the rectified view that makes the hypothesis frontal. [sent-110, score-0.451]

52 Each pair of parallel faces shares a homography, and thus at most three rectifications are needed for each viewpoint hypothesis θ. [sent-111, score-0.43]
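One standard way to obtain such a rectifying homography is the direct linear transform (DLT) from four point correspondences. The sketch below is illustrative, not the paper's implementation:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: the 3x3 homography H mapping four `src`
    points to four `dst` points (e.g. a projected cuboid face to a
    fronto-parallel template frame). Each correspondence contributes two
    linear constraints on the 9 entries of H; the solution is the null
    space of the stacked constraint matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)      # null-space vector holds the entries of H
    return H / H[2, 2]
```

Since parallel faces share a homography, at most three such rectifications are needed per viewpoint hypothesis.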

53 A sliding window approach is then used to score the cuboid hypotheses, by scoring the parts, faces and their deformations in their own rectified view, as well as the deformations of the faces with respect to the stitching point. [sent-118, score-1.654]

54 We define the compatibility score between the parts and the corresponding face, denoted as pi = {pi, {pi,j}j=1,…,n}. [sent-121, score-0.27]

55 where p = (p1 , · · · , p6 ) and V (i, a) is a binary variable encoding whether face i is visible under aspect a. [sent-125, score-0.49]

56 Note that a = a(θ, s) can be deterministically computed from the rotation angle θ and the position of the stitching point s (which we assume to always be visible), which in turn determines the face visibility V . [sent-126, score-0.722]

57 We use ref to index the first visible face in the aspect model, and φd(pi, pi,j, θ) = φd(du, dv) = (du, dv, du^2, dv^2) (3) are the part deformation features, computed in the rectified image of face i implied by the 3D angle θ. [sent-127, score-1.116]
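The part deformation feature is simple enough to write out directly; this one-liner is a sketch of the (du, dv, du^2, dv^2) feature of Eq. (3), with hypothetical argument names:

```python
def part_deformation_features(part_pos, anchor_pos):
    """phi_d(du, dv) = (du, dv, du^2, dv^2): the displacement of a part
    from its anchor, computed in the rectified view of its face."""
    du = part_pos[0] - anchor_pos[0]
    dv = part_pos[1] - anchor_pos[1]
    return (du, dv, du * du, dv * dv)
```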

58 The deformation features φstitch_d(pi, s, θ) between the face pi and the stitching point s are defined as (dui, dvi) = (ui, vi) − ((u(s, i), v(s, i)) + ra,i). [sent-129, score-0.826]

59 Here, (u(s, i), v(s, i)) is the position of the stitching point in the rectified coordinates corresponding to face i and level l. [sent-130, score-0.607]

60 We define the deformation cost between the faces to be a function of their relative dimensions: φface_d(pi, pk, θ) = 0 if max(ei, ek) < 1 + min(ei, ek), and ∞ otherwise (4), with ei and ek the lengths of the common edge between faces i and k. [sent-131, score-0.828]

61 We define the deformation of a face with respect to the stitching point to also be quadratic. [sent-132, score-0.7]

62 We additionally incorporate a bias term for each aspect, ba , to make the scores of multiple aspects comparable when we combine them into a full cuboid model. [sent-134, score-0.705]

63 Given an image x, the score of a hypothesized 3D cuboid can be obtained as the dot product between the model's parameters and a feature vector. [sent-135, score-0.649]
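A toy aggregation of such a hypothesis score is sketched below. The decomposition (face response plus part scores minus deformation penalties, plus an aspect bias b_a) is an assumption for illustration and not the paper's exact formula:

```python
def cuboid_score(face_scores, part_scores, deform_costs, aspect_bias, visible):
    """Sum, over the visible faces, the face-template response plus its
    part scores minus the deformation penalties, then add the
    aspect-specific bias that calibrates aspects with different numbers
    of visible faces."""
    s = aspect_bias
    for f in visible:
        s += face_scores[f] + sum(part_scores[f]) - deform_costs[f]
    return s
```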

64 To get the score for each θ, we first compute the feature responses for the part and face templates (Eq. [sent-144, score-0.458]

65 As in [1], distance transforms are used to compute the deformation scores of the parts efficiently, that is, Eq. [sent-146, score-0.322]
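What the distance transform computes can be shown with a brute-force O(n^2) one-dimensional sketch; DPM-style inference obtains the same quantity in linear time with the lower-envelope algorithm, so this is only an illustration of the computed values:

```python
import numpy as np

def distance_transform_1d(scores, w_lin, w_quad):
    """Brute-force 1D generalized distance transform:
        D[p] = max_q scores[q] - w_lin * |p - q| - w_quad * (p - q)**2
    i.e. the best part placement q for each anchor position p under a
    linear-plus-quadratic deformation penalty."""
    f = np.asarray(scores, float)
    q = np.arange(len(f))
    D = np.empty(len(f))
    for p in range(len(f)):
        d = np.abs(p - q)
        D[p] = np.max(f - w_lin * d - w_quad * d * d)
    return D
```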

66 The score for each face simply sums the response of the face template and the scores of the parts. [sent-148, score-0.798]

67 We again use distance transforms to compute the deformation scores for each face and the stitching point, which is carried out in the rectified coordinates for each face. [sent-149, score-0.785]

68 We then compute the deformation scores between the faces in Eq. [sent-150, score-0.511]

69 (4), which can be performed efficiently due to the fact that sides of the same length along one dimension (horizontal or vertical) in the coordinates of face i will also be constant along the corresponding line when projected to the coordinate system of face j. [sent-151, score-0.671]

70 Thus, computing the side length ratios of two faces is not quadratic in the number of pixels but only in the number of horizontal or vertical lines. [sent-152, score-0.311]

71 Table 2: 3D detection performance in AP (50% IOU overlap of convex hulls and faces). [sent-179, score-0.602]

72 Figure 5: Precision-recall curves for (left) 2D detection, (middle) convex hull, (right) face overlap. [sent-244, score-0.415]

73 Learning: Given a set of training samples D = {(x1, y1, bb1), …, (xN, yN, bbN)}, where x is an image, yi ∈ {−1, 1}, and bb ∈ R^(8×2) are the eight coordinates of the 3D bounding box in the image, our goal is to learn the weights w = [wa1, …, waP] for all P aspects in Eq. [sent-245, score-0.327]

74 To initialize the full model, we first learn a deformable face+parts model for each face independently, where the faces of the training examples are rectified to be frontal prior to training. [sent-248, score-0.82]

75 We estimate the different aspects of our 3D model from the statistics of the training data, and compute for each training cuboid the relative positions va,i of face i and the stitching point in the rectified view of each face. [sent-249, score-1.199]

76 We then perform joint training of the full model, treating the training cuboid and the stitching point as latent, while requiring that each face filter and the face annotation overlap more than 70%. [sent-250, score-1.524]

77 To enable a comparison with the DPM detector [1], we trained a model with 6 mixtures and 8 parts using the same training instances but employing 2D bounding boxes. [sent-257, score-0.27]

78 Fig. 3 shows the statistics of the dataset in terms of the number of training examples for each aspect (where L-R-T denotes an aspect for which the front, right, and top faces are visible), as well as per face. [sent-260, score-0.556]

79 Note that the fact that the dataset is unbalanced (fewer examples for aspects with two faces) does not affect our approach much, as only the face-stitching point deformation parameters are aspect specific. [sent-261, score-0.356]

80 As we share the weights among the aspects, the number of training instances for each face is significantly higher (Fig. 3). [sent-262, score-0.348]

81 The 2D bounding boxes for our model are computed by fitting a 2D box around the convex hull of the projection of the predicted 3D box. [sent-268, score-0.321]
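This fitting step can be sketched directly: for an axis-aligned 2D box, the bounding box of the convex hull of the projected corners equals the bounding box of the corners themselves. The helper name and the assumption of a plain 3x3 intrinsics matrix are illustrative, not from the paper:

```python
import numpy as np

def project_box_to_bbox(corners_3d, K):
    """Project the eight cuboid corners with intrinsics K and fit an
    axis-aligned 2D box (x0, y0, x1, y1) around the projections."""
    C = np.asarray(corners_3d, float)          # 8 x 3 corner coordinates
    P = C @ np.asarray(K, float).T             # apply K to each corner
    uv = P[:, :2] / P[:, 2:3]                  # perspective division
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return float(x0), float(y0), float(x1), float(y1)
```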

82 We compare our approach to the deformable part model (DPM) [1] and the cuboid model of Hedau et al. [sent-272, score-0.763]

83 As shown in Table 1 we outperform the cuboid model of [2] by 8. [sent-274, score-0.545]

84 We combine the 3D and 2D detectors using a two step process, where first the 2D detector is run inside the bounding boxes produced by our cuboid model. [sent-285, score-0.759]

85 1%), it seems that our cuboid model is already scoring the correct boxes well. [sent-288, score-0.641]

86 This is in contrast to the cuboid model of [2], where the increase in performance is more significant due to the poorer accuracy of their 3D approach. [sent-289, score-0.545]

87 Following [2], we use an estimate of the room layout to rescore the object hypotheses at the scene level. [sent-290, score-0.365]

88 Here, instead of computing the overlap between the predicted boxes, we require that the convex hulls of our 3D hypotheses projected to the image plane and groundtruth annotations overlap at least 50% in IOU measure. [sent-299, score-0.268]
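The convex-hull IOU criterion can be sketched in pure Python: clip one hull by the other (Sutherland-Hodgman, assuming convex counter-clockwise polygons), measure areas with the shoelace formula, and compare the ratio to 0.5. This is an illustration, not the paper's evaluation code:

```python
def _area(poly):
    # Shoelace formula for the area of a simple polygon.
    s = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def _clip(subject, clipper):
    # Sutherland-Hodgman: clip `subject` by each edge of a convex CCW `clipper`.
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment pq with the infinite line through a, b.
        x1, y1 = p; x2, y2 = q; x3, y3 = a; x4, y4 = b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break
        s = inp[-1]
        for e in inp:
            if inside(e, a, b):
                if not inside(s, a, b):
                    out.append(intersect(s, e, a, b))
                out.append(e)
            elif inside(s, a, b):
                out.append(intersect(s, e, a, b))
            s = e
    return out

def convex_iou(poly_a, poly_b):
    """IOU of two convex CCW polygons, e.g. projected cuboid hulls."""
    inter_poly = _clip(poly_a, poly_b)
    inter = _area(inter_poly) if inter_poly else 0.0
    union = _area(poly_a) + _area(poly_b) - inter
    return inter / union if union > 0 else 0.0
```

A hypothesis would then count as correct when `convex_iou(pred_hull, gt_hull) >= 0.5`.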

89 Since our model also predicts the locations of the dominant object faces (and thus the 3D object orientation), we would like to quantify its accuracy. [sent-305, score-0.605]

90 We introduce an even stricter measure where we require that the predicted cuboid faces also overlap with the faces of the ground-truth cuboids. [sent-306, score-1.217]

91 In particular, a hypothesis is correct if the average of the overlaps between top faces and vertical faces exceeds 50% IOU. [sent-307, score-0.623]

92 This can be done by sliding a cuboid (whose dimensions match our cuboid model) in 3D to best fit the 2D bounding box. [sent-312, score-1.215]

93 We attribute this to the fact that our cuboid is deformable and thus the faces localize more accurately on the faces of the object. [sent-315, score-1.3]

94 Note that the rectangular patches on the faces represent the parts, and color coding is used to depict the learned part and face deformation weights. [sent-320, score-0.809]

95 In particular, for each detection we automatically choose a CAD model out of a collection of 80 models whose 3D bounding box best matches the dimensions of the predicted box. [sent-326, score-0.311]

96 5 Conclusion We proposed a novel approach to 3D object detection, which extends the well-acclaimed DPM to reason in 3D by means of a deformable 3D cuboid. [sent-328, score-0.392]

97 Our cuboid allows for deformations at the face level via a stitching point as well as deformations between the faces and the parts. [sent-329, score-1.536]

98 (2000) A statistical method for 3d object detection applied to faces and cars. [sent-363, score-0.542]

99 (2009) Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. [sent-407, score-0.279]

100 (2007) Flexible object models for category-level 3d object recognition. [sent-438, score-0.322]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cuboid', 0.545), ('face', 0.317), ('faces', 0.283), ('stitching', 0.203), ('deformable', 0.189), ('dpm', 0.188), ('deformation', 0.18), ('object', 0.161), ('bed', 0.127), ('recti', 0.123), ('viewpoint', 0.118), ('box', 0.111), ('visibility', 0.11), ('aspect', 0.104), ('pi', 0.1), ('detection', 0.098), ('parts', 0.094), ('deformations', 0.094), ('ap', 0.09), ('appearance', 0.084), ('ace', 0.082), ('cad', 0.082), ('overlap', 0.08), ('bounding', 0.076), ('score', 0.076), ('savarese', 0.076), ('aspects', 0.072), ('boxes', 0.069), ('detector', 0.069), ('visible', 0.069), ('indoor', 0.068), ('layout', 0.068), ('dstitch', 0.062), ('iou', 0.062), ('orientation', 0.057), ('anchor', 0.055), ('hedau', 0.055), ('tpami', 0.05), ('position', 0.05), ('room', 0.049), ('sliding', 0.049), ('scores', 0.048), ('outdoor', 0.048), ('bbox', 0.047), ('kitti', 0.047), ('detections', 0.046), ('reason', 0.042), ('driving', 0.042), ('angle', 0.042), ('dpms', 0.041), ('autonomous', 0.04), ('ba', 0.04), ('template', 0.04), ('front', 0.04), ('hull', 0.039), ('objects', 0.038), ('anchors', 0.038), ('coordinates', 0.037), ('templates', 0.036), ('num', 0.036), ('schiele', 0.036), ('stitch', 0.036), ('ui', 0.035), ('camera', 0.034), ('factoring', 0.034), ('wa', 0.034), ('pose', 0.033), ('urtasun', 0.033), ('dickinson', 0.031), ('fidler', 0.031), ('oblect', 0.031), ('rescore', 0.031), ('scoreparts', 0.031), ('stich', 0.031), ('training', 0.031), ('dv', 0.03), ('hypothesis', 0.029), ('rotating', 0.029), ('combined', 0.029), ('part', 0.029), ('scene', 0.029), ('image', 0.028), ('vertical', 0.028), ('scoring', 0.027), ('hulls', 0.027), ('monocular', 0.027), ('oor', 0.027), ('pepik', 0.027), ('sven', 0.027), ('hypotheses', 0.027), ('supervision', 0.027), ('viewing', 0.027), ('oriented', 0.027), ('composed', 0.026), ('df', 0.026), ('vanishing', 0.026), ('vi', 0.026), ('predicted', 0.026), ('rigid', 0.026), ('specifying', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

2 0.44578525 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

3 0.22562881 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

4 0.19183123 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

5 0.13508891 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

6 0.13508891 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

7 0.13158651 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

8 0.11743027 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

9 0.11717013 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

10 0.11483393 303 nips-2012-Searching for objects driven by context

11 0.097727895 193 nips-2012-Learning to Align from Scratch

12 0.095275797 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.092448995 8 nips-2012-A Generative Model for Parts-based Object Segmentation

14 0.088905431 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

15 0.087453954 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

16 0.087238744 168 nips-2012-Kernel Latent SVM for Visual Recognition

17 0.075487979 185 nips-2012-Learning about Canonical Views from Internet Image Collections

18 0.072529651 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

19 0.06660413 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

20 0.061321992 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, 0.003), (2, -0.22), (3, -0.02), (4, 0.144), (5, -0.135), (6, 0.008), (7, -0.134), (8, 0.027), (9, -0.033), (10, -0.07), (11, 0.031), (12, 0.082), (13, -0.239), (14, 0.109), (15, 0.211), (16, 0.019), (17, -0.128), (18, -0.11), (19, 0.043), (20, 0.024), (21, 0.008), (22, -0.012), (23, 0.053), (24, -0.081), (25, 0.08), (26, 0.038), (27, 0.025), (28, -0.098), (29, -0.056), (30, 0.018), (31, -0.059), (32, -0.008), (33, 0.132), (34, -0.039), (35, -0.022), (36, -0.021), (37, -0.057), (38, -0.101), (39, -0.014), (40, -0.049), (41, 0.111), (42, -0.036), (43, -0.035), (44, -0.031), (45, -0.017), (46, -0.078), (47, 0.022), (48, 0.039), (49, -0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96318561 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patters called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the stateof-the-art in both 2D [1] and 3D object detection [2]. 1

2 0.91591346 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

3 0.78327203 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.

4 0.66499341 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared across different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the state-of-the-art algorithms.

5 0.63807517 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while optimizing the multi-layer parameters during the iterations. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle well large intra-class variance and background clutters for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms state-of-the-art approaches.

6 0.61236149 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

7 0.61118752 344 nips-2012-Timely Object Recognition

8 0.58040065 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

9 0.57458031 8 nips-2012-A Generative Model for Parts-based Object Segmentation

10 0.56705165 303 nips-2012-Searching for objects driven by context

11 0.51876801 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

12 0.50698042 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.44331181 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

14 0.42566043 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

15 0.42364129 185 nips-2012-Learning about Canonical Views from Internet Image Collections

16 0.42198792 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

17 0.39170673 2 nips-2012-3D Social Saliency from Head-mounted Cameras

18 0.37983516 168 nips-2012-Kernel Latent SVM for Visual Recognition

19 0.37802392 210 nips-2012-Memorability of Image Regions

20 0.37779778 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration


similar papers computed by LDA model

LDA topic distribution for this paper:

topicId topicWeight

[(0, 0.018), (17, 0.011), (21, 0.413), (38, 0.082), (42, 0.011), (54, 0.021), (55, 0.01), (74, 0.147), (76, 0.103), (80, 0.047), (86, 0.014), (92, 0.03)]
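The sparse topic-weight vector above is the representation from which the "similar papers" ranking below is computed; a minimal sketch of one plausible similarity measure (cosine similarity over densified topic vectors, with a hypothetical second paper and an assumed topic count) might look like:

```python
import math

def to_dense(sparse, num_topics):
    """Expand a sparse {topicId: topicWeight} mapping into a dense vector."""
    vec = [0.0] * num_topics
    for topic_id, weight in sparse.items():
        vec[topic_id] = weight
    return vec

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Topic weights for this paper, as listed above.
this_paper = {0: 0.018, 17: 0.011, 21: 0.413, 38: 0.082, 42: 0.011,
              54: 0.021, 55: 0.01, 74: 0.147, 76: 0.103, 80: 0.047,
              86: 0.014, 92: 0.03}
# A hypothetical second paper's topic weights (illustration only).
other_paper = {3: 0.05, 21: 0.35, 74: 0.2, 76: 0.1}

num_topics = 100  # assumed size of the topic vocabulary
sim = cosine_similarity(to_dense(this_paper, num_topics),
                        to_dense(other_paper, num_topics))
```

Papers sharing the dominant topics (here topic 21) score close to 1, matching the intuition behind the simValue column in the list that follows.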

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84546381 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

2 0.81910539 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a specific pattern of recurrent connections learned by the model.

3 0.70780402 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.
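The composite objectives the abstract refers to can be illustrated with the standard convex special case: proximal gradient descent on f(x) + λ‖x‖₁ with the soft-thresholding prox. This is a textbook sketch only; the paper's contribution concerns nonconvex objectives and inexact, incremental variants, which this toy does not attempt.

```python
def soft_threshold(v, t):
    """Elementwise prox of t * ||x||_1 (soft-thresholding)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def proximal_gradient(grad_f, lam, x0, step=0.5, iters=500):
    """Minimize f(x) + lam * ||x||_1 via x <- prox(x - step * grad_f(x))."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                           step * lam)
    return x

# Smooth part f(x) = 0.5 * ||x - b||^2, whose gradient is x - b; the
# composite minimizer then has the closed form soft_threshold(b, lam).
b = [3.0, -0.2, 1.5]
x_star = proximal_gradient(lambda x: [xi - bi for xi, bi in zip(x, b)],
                           lam=1.0, x0=[0.0, 0.0, 0.0])
```

Because the closed-form solution soft_threshold(b, 1.0) = [2.0, 0.0, 0.5] is known here, the iterate can be checked directly; in the nonconvex setting no such certificate exists, which is what makes the paper's error analysis nontrivial.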

4 0.68729562 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.
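For reference, the finite Plackett-Luce model being extended assigns a ranking probability as a sequence of weighted choices without replacement; a minimal sketch:

```python
def plackett_luce_prob(ranking, weights):
    """Probability of observing `ranking` (a list of item ids, best first)
    under Plackett-Luce with positive item weights: at each position the
    next item is chosen with probability proportional to its weight among
    the items not yet ranked."""
    remaining = list(weights.keys())
    prob = 1.0
    for item in ranking:
        denom = sum(weights[j] for j in remaining)
        prob *= weights[item] / denom
        remaining.remove(item)
    return prob

# Example: item 'a' is twice as preferred as 'b' or 'c'.
w = {'a': 2.0, 'b': 1.0, 'c': 1.0}
p_abc = plackett_luce_prob(['a', 'b', 'c'], w)  # (2/4) * (1/2) * (1/1)
```

The nonparametric extension in the paper replaces the finite weight table with an atomic measure drawn from a gamma process, so the set of items need not be fixed in advance.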

5 0.66880363 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point in time at which we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy.
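As a rough illustration of the adaptive-sampling idea described above, the following toy stops sampling a binary variable once the fraction of samples in its majority state exceeds a confidence threshold. This is a sketch only, not the paper's algorithm: the paper estimates a proper posterior over each decision (and prunes the factor graph itself), whereas this toy uses a raw sample fraction and independent variables.

```python
import random

def mmp_with_pruning(sample_fn, num_vars, threshold=0.95,
                     max_rounds=200, batch=10):
    """Toy adaptive-sampling MMP over binary variables: stop sampling a
    variable once its majority-state fraction exceeds `threshold`."""
    counts = [[0, 0] for _ in range(num_vars)]  # counts[i][state]
    active = set(range(num_vars))               # variables still uncertain
    decisions = [None] * num_vars
    for _ in range(max_rounds):
        if not active:
            break
        for _ in range(batch):
            sample = sample_fn()                # one joint sample, list of 0/1
            for i in active:
                counts[i][sample[i]] += 1
        for i in list(active):
            total = counts[i][0] + counts[i][1]
            if max(counts[i]) / total >= threshold:
                # Confident about this variable: decide it and prune it.
                decisions[i] = counts[i].index(max(counts[i]))
                active.remove(i)
    # Decide any still-uncertain variables by majority vote.
    for i in active:
        decisions[i] = 0 if counts[i][0] >= counts[i][1] else 1
    return decisions

# Hypothetical sampler: three independent variables with marginals
# 0.99, 0.02, and 0.8 of being 1. The first two prune almost immediately;
# the third stays active because its fraction hovers around 0.8.
random.seed(1)
probs = [0.99, 0.02, 0.8]
dec = mmp_with_pruning(lambda: [1 if random.random() < p else 0
                                for p in probs], num_vars=3)
```

The payoff mirrors the abstract's claim: once a variable is pruned, no further samples are spent on it, so sampling effort concentrates on the genuinely uncertain decisions.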

6 0.64226168 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

7 0.62062198 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

8 0.57927394 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

9 0.56540877 23 nips-2012-A lattice filter model of the visual pathway

10 0.54818267 201 nips-2012-Localizing 3D cuboids in single-view images

11 0.54781508 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

12 0.52595824 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.52270359 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

14 0.51250154 94 nips-2012-Delay Compensation with Dynamical Synapses

15 0.51210999 190 nips-2012-Learning optimal spike-based representations

16 0.51032633 256 nips-2012-On the connections between saliency and tracking

17 0.50792205 303 nips-2012-Searching for objects driven by context

18 0.50534862 40 nips-2012-Analyzing 3D Objects in Cluttered Images

19 0.50411898 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

20 0.49825811 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex