nips nips2012 nips2012-303 knowledge-graph by maker-knowledge-mining

303 nips-2012-Searching for objects driven by context


Source: pdf

Author: Bogdan Alexe, Nicolas Heess, Yee W. Teh, Vittorio Ferrari

Abstract: The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. [sent-3, score-0.867]

2 Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. [sent-4, score-0.794]

3 In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. [sent-5, score-1.186]

4 Among the broad palette of approaches [2, 22, 31], most state-of-the-art detectors rely on the sliding window paradigm [7, 8, 12, 15, 30, 31]. [sent-7, score-0.816]

5 A classifier is trained to decide whether a window contains an instance of the target class and is used at test time to score all windows in an image over a regular grid in location and scale. [sent-8, score-1.329]

6 Despite its popularity, the sliding window paradigm seems wasteful and unnatural. [sent-10, score-0.811]

7 Each strategy is specific to an object class and intelligently explores the space of windows by making sequential observations at locations decided based on previous observations. [sent-15, score-0.983]

8 The strategy might start at window w1 , which is a patch of sky. [sent-17, score-0.582]

9 The strategy has learned from the training data that cars are typically below the sky, so it decides to try a window below w1 , e. [sent-18, score-0.71]

10 To achieve this it models the statistical relation between the position and appearance of windows in the training images and their position relative to the ground-truth (sec. [sent-26, score-0.734]

11 It greatly reduces the number of observed windows, and therefore the number of times a window classifier is evaluated (potentially very expensive [15, 30]). [sent-29, score-0.565]

12 An ideal search strategy moves through the sequence of windows w1 to w4 . [sent-32, score-0.691]

13 false-positive rates, and therefore higher object detection performance than sliding windows, despite evaluating fewer windows. [sent-34, score-0.648]

14 5 we report experiments on the highly challenging PASCAL VOC 2010 dataset, using the popular deformable part model of [12] as the window classifier. [sent-39, score-0.565]

15 The experiments demonstrate that our learned strategies perform better in terms of object detection accuracy than sliding windows, while greatly reducing the number of classifier evaluations by a factor of 250× (100 vs 25000 in [12]). [sent-40, score-0.696]

16 Moreover, we outperform two recent methods to reduce the number of classifier evaluations [1, 29] as they evaluate about 1000 windows while losing detection accuracy compared to sliding windows. [sent-41, score-0.997]

17 To our knowledge, this is the first method capable of saving window evaluations while at the same time improving detection accuracy. [sent-42, score-0.714]

18 Several works try to reduce the number of windows evaluated in the traditional sliding-window paradigm. [sent-44, score-0.571]

19 [20] proposed a branch-and-bound scheme to find the highest scored window while evaluating the classifier as few times as possible. [sent-46, score-0.585]

20 The recent approaches [1, 29] evaluate the classifier only on a small number of windows likely to cover objects rather than backgrounds. [sent-49, score-0.61]

21 All these methods use context as an additional cue on top of individual object detectors, whereas in our approach context drives the search for an object in the image, determining the sequence of windows where the classifier is evaluated. [sent-56, score-1.091]

22 Analogous to our work, [5] reduces the number of window classifier evaluations, avoiding the wasteful sliding window scheme. [sent-61, score-1.312]

23 Our search instead is driven by the relation between the appearance of a window and the relative location of the object, as learned from annotated training images. [sent-63, score-0.791]

24 This has the added benefit of improving object detection accuracy compared to sliding windows. [sent-64, score-0.582]

25 (a) Three windows wl in a training image and their displacement vectors dl. [sent-74, score-0.84]

26 Applying dl to the current observation window wt results in the translated windows wt ⊕ dl . [sent-76, score-1.675]
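A minimal sketch of the ⊕ operation, where a stored displacement dl is applied to the current window wt. The convention below (x, y shift scaled by the window's scale, multiplicative scale change) is an illustrative assumption, not the paper's exact definition:

```python
def translate(window, disp):
    """w ⊕ d: apply a training displacement (dx, dy, ds) to window (x, y, s);
    the x, y shift is scaled by the window's scale and the scale change is
    multiplicative (an illustrative convention)."""
    x, y, s = window
    dx, dy, ds = disp
    return (x + dx * s, y + dy * s, s * ds)

wt = (100.0, 50.0, 2.0)
print(translate(wt, (10.0, -5.0, 1.0)))     # → (120.0, 40.0, 2.0)
```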

27 2 Overview of our method Our method detects an object in a test image with a sequential process, by evaluating one window yt at each time step t. [sent-77, score-1.093]

28 At each time step, it actively decides which window to evaluate next based on all past observations, trying to acquire observations that will improve the global location estimate. [sent-79, score-0.7]

29 The key driving force here is the statistical dependency between the position/appearance of a window and the ground-truth location of the object (e. [sent-81, score-0.77]

30 Our method first finds training windows similar in position/appearance to the current window yt in the test image. [sent-84, score-1.334]

31 Then, each such training window votes for a possible object location in the test image through its displacement vector relative to the ground-truth object (fig. [sent-85, score-1.211]

32 The maps are then integrated over time and used to decide which window to evaluate next (sec. [sent-89, score-0.555]

33 The behavior of our decision process is controlled by the weights of the various features in the similarity measure used to compare windows in the test image to training windows. [sent-92, score-0.721]

34 The process involves comparing high-dimensional appearance descriptors between a test window yt and hundreds of thousands of training windows. [sent-96, score-0.838]

35 Given a test image x, it sequentially collects a fixed number T of observations yt for windows wt before making a final detection decision. [sent-100, score-1.266]

36 At each time step t the next observation window is chosen based on all past observations. [sent-101, score-0.618]

37 wt we need to actively choose the window wt+1 at which to make the next observation yt+1. [sent-108, score-0.785]

38 Hence, we want to pick a search policy π S that chooses windows leading to observations that enable the output policy π O to make a good detection decision. [sent-128, score-1.064]

39 3 In the following we assume that a window wt = (xt, yt, st) is defined by its (x, y) location and scale s. [sent-136, score-0.785]

40 In any given image x there is a fixed set of windows from a dense grid in x, y and scale space that depends on the image size and the aspect ratio of the class under consideration (see sec. [sent-137, score-0.785]
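The fixed window set can be sketched as a dense grid over (x, y) locations and scales at one aspect ratio per class. All numeric settings below (base size, scale factors, stride) are hypothetical, chosen only for illustration:

```python
import numpy as np

def window_grid(img_w, img_h, aspect, base_w=64, scales=(1.0, 1.5, 2.25), stride=16):
    """Enumerate candidate windows (x, y, s) on a dense grid in location
    and scale, at a single fixed aspect ratio (one grid per class/viewpoint)."""
    windows = []
    for s in scales:
        w = base_w * s
        h = w / aspect                      # fixed aspect ratio
        for y in np.arange(0, img_h - h + 1, stride):
            for x in np.arange(0, img_w - w + 1, stride):
                windows.append((x, y, s))
    return np.array(windows)

grid = window_grid(320, 240, aspect=2.0)
print(len(grid))                            # grid size depends on image size
```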

41 An observation consists of J feature vectors fj^t which describe a window yt = (f1^t, …, fJ^t). [sent-139, score-0.808]

42 4 details the specific grid and window features we use. [sent-144, score-0.56]

43 1 Search policy The search policy π S determines the choice of the next observation window given the observation history at time step t. [sent-146, score-0.939]

44 , wt , yt ) from past observations to the next observation window. [sent-150, score-0.553]

45 , wt , yt ; Θ) over all possible candidate observation locations w in the test image, given the past observation windows. [sent-154, score-0.73]

46 The mapping chooses the window with highest probability in M as the next observation window wt+1 = πS(w1, y1, …, wt, yt). [sent-156, score-1.117]

47 These maps are obtained independently at each time step and can be seen as distributions over windows w, given the information about the image from that time step only. [sent-164, score-0.665]

48 We sample a large number L of windows wl uniformly from all training images, and we store their positions wl = (xl, yl, sl), the associated feature vectors yl, as well as the displacement vectors dl that record the location of the ground-truth object relative to each window. [sent-169, score-1.106]

49 Each window in a training image can use this to vote for the relative position dl where it expects the object to be. [sent-170, score-1.06]

50 Given the current observation yt for image window wt in the test image, the distribution over object positions is then given by the spatial distribution of these votes: m(w; wt, yt, Θ) = Σ_{l=1..L} KF(yt, yl; ΘF) · KS(w, wt ⊕ dl; ΘS). [sent-171, score-1.656]
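A numpy sketch of this vote accumulation, using simple exponential kernels and an additive ⊕ purely for illustration (the paper's exact kernels and scale handling may differ):

```python
import numpy as np

def vote_map(cand_windows, y_t, w_t, train_feats, train_disps,
             theta_f=1.0, theta_s=1.0):
    """m(w; wt, yt) = sum_l KF(yt, yl) * KS(w, wt ⊕ dl): each training
    window votes for an object location, weighted by how similar it is
    to the current observation (exponential kernels, additive ⊕ here)."""
    m = np.zeros(len(cand_windows))
    for y_l, d_l in zip(train_feats, train_disps):
        kf = np.exp(-theta_f * np.linalg.norm(y_t - y_l))   # appearance similarity
        voted = w_t + d_l                                   # wt ⊕ dl
        m += kf * np.exp(-theta_s * np.linalg.norm(cand_windows - voted, axis=1))
    return m

cand = np.array([[0.0, 0.0, 1.0], [10.0, 10.0, 1.0]])       # candidate windows
m = vote_map(cand, np.array([1.0, 0.0]), np.array([0.0, 0.0, 1.0]),
             train_feats=[np.array([1.0, 0.0])],
             train_disps=[np.array([10.0, 10.0, 0.0])])
print(int(np.argmax(m)))                                    # → 1 (the voted location)
```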

51 The summation over all L training windows is computationally expensive. [sent-173, score-0.591]

52 In practice we truncate it and consider only the Z training windows most similar to the current observation window yt . [sent-174, score-1.362]
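The truncation to the Z nearest training windows can be done without a full sort, e.g. with numpy's argpartition (an implementation detail assumed here, not specified in the paper):

```python
import numpy as np

def top_z_neighbors(dists, Z):
    """Indices of the Z smallest distances (most similar training windows);
    np.argpartition is O(L) instead of the O(L log L) of a full sort."""
    Z = min(Z, len(dists))
    idx = np.argpartition(dists, Z - 1)[:Z]
    return idx[np.argsort(dists[idx])]          # sorted by distance

d = np.array([0.9, 0.1, 0.5, 0.3])
print(top_z_neighbors(d, 2))                    # → [1 3]
```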

53 For KF we use an exponential function on distances computed separately for each type j of feature vector fj describing a window (see sec. [sent-179, score-0.559]
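KF can then be sketched as an exponential of a weighted sum of per-feature-type distances; the Euclidean per-type distance below is an assumption for illustration:

```python
import numpy as np

def kernel_KF(feats_a, feats_b, thetas):
    """KF = exp(-sum_j theta_j * d_j(fa_j, fb_j)): an exponential of
    weighted per-feature-type distances (Euclidean d_j assumed here)."""
    total = sum(th * np.linalg.norm(fa - fb)
                for fa, fb, th in zip(feats_a, feats_b, thetas))
    return np.exp(-total)

# J = 2 hypothetical feature types: a location-like vector and a HOG-like vector
a = [np.array([0.1, 0.2]), np.zeros(4)]
b = [np.array([0.1, 0.2]), np.zeros(4)]
print(kernel_KF(a, b, thetas=[1.0, 0.5]))       # identical windows → 1.0
```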

54 The next observation window (green) is chosen as the highest probability window in the current vote map M t . [sent-188, score-1.288]

55 Next we retrieve the Z most similar training windows according to KF (NN arrow, for ‘nearest neighbors’). [sent-189, score-0.591]

56 Normalizing the vote map m(w; wt , yt , Θt ) in eq. [sent-196, score-0.577]

57 (2) at a time step t yields a conditional distribution over candidate observation locations given the observation yt at window wt: m̃(w|wt, yt, Θ) = m(w; wt, yt, Θ) / Σ_{w'} m(w'; wt, yt, Θ) (4). In order to obtain M t(w|w1, y1, … [sent-197, score-1.706]

58 wt yt ; Θ) we integrate the normalized vote maps over all past time steps 1, …, t. [sent-200, score-0.614]

59 , wt , yt ; Θ) = Σ_{t'=1..t} a(t, t') m̃(w|wt', yt', Θ), (5) where a(t, t') = α(1 − α)^(t−t') for t' > 1 and a(t, 1) = (1 − α)^(t−1), for some constant 0 < α ≤ 1. [sent-206, score-0.602]
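A small sketch of this temporal integration; with a(t, t') = α(1 − α)^(t−t') for t' > 1 and a(t, 1) = (1 − α)^(t−1), the weights form a geometric series that sums to one, so recent maps dominate:

```python
import numpy as np

def integrate_maps(norm_maps, alpha=0.5):
    """M^t = sum_{t'} a(t, t') * m~^{t'}, with a(t, 1) = (1-alpha)^(t-1)
    and a(t, t') = alpha*(1-alpha)^(t-t') for t' > 1; the weights sum
    to one and decay geometrically into the past."""
    t = len(norm_maps)
    weights = [(1 - alpha) ** (t - 1)]                          # t' = 1
    weights += [alpha * (1 - alpha) ** (t - tp) for tp in range(2, t + 1)]
    M = sum(w * m for w, m in zip(weights, norm_maps))
    return M, weights

maps = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
M, w = integrate_maps(maps, alpha=0.5)
print(w)                                        # → [0.25, 0.25, 0.5]
```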

60 Even though the next observation window is chosen deterministically (eq. [sent-217, score-0.575]

61 (1)), by deriving it from the probabilistic vote-map M t and updating this map over time we are effectively maintaining an estimate of the uncertainty about which are good candidate windows to visit in the next step. [sent-218, score-0.639]

62 It should be seen as a policy to propose windows that should be visited in the future. [sent-220, score-0.699]

63 wT in the test image, our strategy must output a single window which it believes to be the most likely to contain an object of the class of interest. [sent-233, score-0.835]

64 As our strategy is designed to visit good candidate windows over the course of its search (footnote 2: in the experiments we set α = 0.5), [sent-234, score-0.66]

65 we simply output as the final detection the visited window that has the highest score according to a window classifier c trained beforehand for that class [12] (see sec. [sent-236, score-1.309]
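The final output step reduces to an argmax over the visited windows' classifier scores, as in this sketch (the window tuples and scores are made-up values):

```python
import numpy as np

def final_detection(visited_windows, classifier_scores):
    """Output the single visited window with the highest score under the
    pre-trained window classifier for this class."""
    return visited_windows[int(np.argmax(classifier_scores))]

visited = [(10, 10, 1.0), (40, 20, 1.5), (60, 30, 1.0)]     # hypothetical (x, y, s)
scores = np.array([-0.3, 1.2, 0.4])                         # made-up classifier scores
print(final_detection(visited, scores))                     # → (40, 20, 1.5)
```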

66 3 (7) Learning weights F Our search policy involves the feature weights ΘF = {θj } in the window similarity kernel (eq. [sent-238, score-0.761]

67 the parameters ΘF ; the search is a sequential decision process where windows selected at different time steps depend on each other; the policy is non-differentiable with respect to ΘF (due to the max in eq. [sent-241, score-0.748]

68 The first subset provides the L training windows for the nonparametric representation of m in eq. [sent-246, score-0.591]

69 On the second subset we run a stochastic version of our search strategy in which we sample the next observation window according to wt+1 ∼ M t(·|w1, y1, …, wt, yt). [sent-248, score-0.716]

70 Running the strategy once on the b-th training image produces a sample sequence of windows and associated observations ĥ = (ŵ1, ŷ1, …, ŵT, ŷT). [sent-252, score-0.784]

71 3 The objective (8) tries to maximize the overlap KS with the ground-truth bounding-box weighted by M t , hence encouraging the policy to choose for the next step windows that are likely to lie on the object to be detected. [sent-263, score-0.86]

72 As window classifier we choose the popular multiscale deformable part model of [12]. [sent-267, score-0.589]

73 The score of a window at location (x, y, s) is a weighted sum of the score of the root filter, the part filters and a deformation cost measuring the deviation of the part from its mean training position. [sent-269, score-0.668]

74 The work [12] also defines a multiscale image grid which forms the fixed set of windows observable by our method (sec. [sent-270, score-0.694]

75 Note how all windows in this grid have the same aspect-ratio, as there is a separate window classifier per object viewpoint [12]. [sent-272, score-1.333]

76 The kernel KF used for computing the similarity KF between two windows in eq. [sent-274, score-0.593]

77 (3) involves J = 3 feature types: (j=1) f1 is the (x, y, s) location normalized by image size; (j=2) f2 is a histogram of oriented gradients (HOG) [7]; (j=3) f3 is the score of the window classifier c [12]. [sent-275, score-0.683]

78 β acts as an inverse temperature and interpolates between a uniform policy (for β → 0) and a policy that always selects the highest probability window as in eq. [sent-286, score-0.772]
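One common way to realize such an inverse-temperature interpolation is to raise the normalized vote map to the power β and renormalize; this exact parameterization is an assumption for illustration:

```python
import numpy as np

def tempered_policy(vote_map, beta):
    """p(w) ∝ M(w)^beta: beta → 0 gives a uniform policy; large beta
    approaches the deterministic argmax policy."""
    p = np.power(np.maximum(vote_map, 1e-12), beta)         # floor avoids 0^0
    return p / p.sum()

m = np.array([0.1, 0.6, 0.3])
print(tempered_policy(m, 0.0))                              # uniform over the 3 windows
```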

79 wt , yt ), where p(ht−1) is the distribution over observation sequences of length t resulting from the stochastic policy M, and r(w) = KS(w, wGT). [sent-292, score-0.574]

80 For each method we show the average number of windows evaluated per image (#win), the detection rate (DR) and the mean average precision (mAP) over all 54 class-viewpoint combinations. [sent-297, score-0.784]

81 It encourages the search policy to continue to probe nearby windows after one observation hits an instance of the class. [sent-311, score-0.799]

82 We embed the window appearance features (HOG) in a Hamming space of dimension 128 using [14], thus going from 49600 bits to just 128. [sent-313, score-0.576]
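With 128-bit codes, appearance distances reduce to Hamming distances, computable by XOR plus a bit count; storing each code as two uint64 words is an implementation choice assumed here:

```python
import numpy as np

def hamming_dist(codes, query):
    """Hamming distance between a query and a database of 128-bit codes,
    each stored as two uint64 words: XOR, then count differing bits."""
    x = np.bitwise_xor(codes, query)
    return np.unpackbits(x.view(np.uint8), axis=1).sum(axis=1)

codes = np.array([[0, 0], [0b1011, 0]], dtype=np.uint64)    # toy 128-bit codes
query = np.array([0b0011, 0], dtype=np.uint64)
print(hamming_dist(codes, query))                           # → [2 1]
```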

83 First, it reduces the memory footprint for storing the appearance descriptor for all training windows of a class to the point where they all fit in memory at once. [sent-315, score-0.678]

84 This speedup is very useful as the number of training windows is typically very large (from a few hundred thousand up to a million, depending on the class in our experiments). [sent-319, score-0.624]

85 This makes sense as our method returns exactly one window per image (sec. [sent-334, score-0.604]

86 Sliding Window is the standard sliding-window scheme of [12], which scores about 25000 windows on a multiscale image grid (for an average VOC image). [sent-339, score-0.694]

87 Random Chance scores 100 randomly sampled windows on the same grid using the same classifier [12]. [sent-340, score-0.588]

88 We also compare to two recent methods designed to reduce the number of classifier evaluations by proposing a limited number of candidate windows likely to cover all objects in the image (Objectness [1] and Selective Search [29])5 . [sent-341, score-0.762]

89 As a reference, Sliding Window [12] reaches a good detection accuracy, but at the price of evaluating many windows (25000). [sent-349, score-0.745]

90 Random Chance fails entirely and achieves a very low detection accuracy, showing that an intelligent search strategy is necessary when evaluating very few windows (100). [sent-350, score-0.914]

91 The two competing methods [1, 29] exhibit a trade-off: they evaluate fewer windows than Sliding Window, but at the price of losing some detection accuracy (confirming what is reported in [1, 29]). [sent-351, score-0.745]

92 For the car-right example our strategy outputs exactly the same window as Sliding Window. [sent-364, score-0.582]

93 Second row: examples where our method succeeds but Sliding Window fails, because it avoids evaluating cluttered areas where the window classifier [12] produces false positives. [sent-365, score-0.604]

94 7% mAP), while at the same time greatly reducing the number of evaluated windows (250× fewer). [sent-370, score-0.593]

95 The reason is that our method exploits context to avoid evaluating large portions of the image, which often contain highly cluttered areas where the window classifier [12] risks producing false-positives (fig. [sent-374, score-0.639]

96 While our method greatly reduces the number of windows evaluated, it introduces two overheads: (1) nearest neighbor lookup takes between 2. [sent-377, score-0.593]

97 7s, depending on the class (as the number L of training windows varies, see sec. [sent-379, score-0.624]

98 Our total detection time for an average class (5s) is moderately shorter than scoring all windows on the grid (8s), as [12] is already very efficient. [sent-384, score-0.773]

99 On an average image containing 25000 windows, sliding window takes 92s to run, whereas our method takes only 8s, hence achieving an 11× speedup (at no loss of mAP). [sent-389, score-0.839]

100 We have proposed a novel object detection technique that replaces sliding windows with an intelligent search strategy which exploits context to reduce the number of window evaluations and improve detection performance. [sent-391, score-1.994]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('windows', 0.55), ('window', 0.522), ('sliding', 0.235), ('wt', 0.21), ('yt', 0.196), ('object', 0.195), ('detection', 0.152), ('vote', 0.132), ('policy', 0.115), ('kf', 0.098), ('ks', 0.088), ('er', 0.087), ('image', 0.082), ('search', 0.081), ('dl', 0.065), ('strategy', 0.06), ('objects', 0.06), ('displacement', 0.059), ('appearance', 0.054), ('cvpr', 0.054), ('observation', 0.053), ('location', 0.053), ('classi', 0.052), ('observations', 0.051), ('hog', 0.051), ('pascal', 0.048), ('saliency', 0.048), ('voc', 0.047), ('wgt', 0.047), ('wl', 0.043), ('deformable', 0.043), ('evaluating', 0.043), ('greatly', 0.043), ('images', 0.043), ('past', 0.043), ('training', 0.041), ('locations', 0.04), ('evaluations', 0.04), ('cluttered', 0.039), ('votes', 0.039), ('map', 0.039), ('grid', 0.038), ('detectors', 0.038), ('fj', 0.037), ('localize', 0.036), ('cars', 0.035), ('context', 0.035), ('dr', 0.034), ('yl', 0.034), ('chance', 0.034), ('visited', 0.034), ('iccv', 0.034), ('maps', 0.033), ('class', 0.033), ('wasteful', 0.033), ('decides', 0.031), ('objectness', 0.031), ('strategies', 0.031), ('candidate', 0.03), ('sequential', 0.03), ('hamming', 0.03), ('win', 0.029), ('viewpoint', 0.028), ('vedaldi', 0.028), ('fails', 0.028), ('alexe', 0.027), ('dj', 0.027), ('score', 0.026), ('val', 0.025), ('roads', 0.025), ('test', 0.025), ('spatial', 0.025), ('car', 0.024), ('nn', 0.024), ('multiscale', 0.024), ('rapid', 0.024), ('intelligently', 0.024), ('similarity', 0.023), ('sky', 0.023), ('surf', 0.023), ('fewer', 0.023), ('position', 0.023), ('visual', 0.022), ('selective', 0.022), ('detect', 0.022), ('larochelle', 0.022), ('lampert', 0.022), ('try', 0.021), ('paradigm', 0.021), ('cyan', 0.021), ('intersection', 0.02), ('arrow', 0.02), ('kernel', 0.02), ('driven', 0.02), ('instances', 0.02), ('annotated', 0.02), ('visit', 0.02), ('losing', 0.02), ('highest', 0.02), ('eccv', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 303 nips-2012-Searching for objects driven by context

Author: Bogdan Alexe, Nicolas Heess, Yee W. Teh, Vittorio Ferrari

Abstract: The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. 1

2 0.24188614 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

3 0.16157562 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1

4 0.15354793 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

Author: Claudio Gentile, Francesco Orabona

Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T 1/2 log T ) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance. 1

5 0.14601739 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

6 0.14402297 38 nips-2012-Algorithms for Learning Markov Field Policies

7 0.13842641 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

8 0.11983132 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

9 0.11895437 201 nips-2012-Localizing 3D cuboids in single-view images

10 0.11857496 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

11 0.11760175 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

12 0.11539704 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

13 0.11483393 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

14 0.11267956 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

15 0.10791832 40 nips-2012-Analyzing 3D Objects in Cluttered Images

16 0.099462546 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

17 0.099385396 292 nips-2012-Regularized Off-Policy TD-Learning

18 0.094322897 8 nips-2012-A Generative Model for Parts-based Object Segmentation

19 0.092217654 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

20 0.08557266 81 nips-2012-Context-Sensitive Decision Forests for Object Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.192), (1, -0.114), (2, -0.145), (3, 0.119), (4, 0.167), (5, -0.172), (6, 0.004), (7, -0.071), (8, 0.039), (9, -0.005), (10, -0.062), (11, -0.016), (12, 0.169), (13, -0.075), (14, 0.093), (15, 0.077), (16, 0.03), (17, -0.036), (18, -0.032), (19, -0.046), (20, -0.025), (21, -0.038), (22, 0.001), (23, 0.053), (24, 0.018), (25, -0.057), (26, 0.0), (27, 0.035), (28, -0.023), (29, -0.036), (30, 0.061), (31, 0.106), (32, -0.084), (33, 0.054), (34, 0.054), (35, 0.013), (36, 0.009), (37, 0.083), (38, -0.091), (39, 0.03), (40, 0.023), (41, -0.044), (42, -0.039), (43, 0.046), (44, 0.037), (45, 0.02), (46, 0.039), (47, 0.012), (48, 0.084), (49, 0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96598047 303 nips-2012-Searching for objects driven by context

Author: Bogdan Alexe, Nicolas Heess, Yee W. Teh, Vittorio Ferrari

Abstract: The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. 1

2 0.71991152 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

3 0.64927107 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

4 0.64092427 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

5 0.63641346 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Author: Du Tran, Junsong Yuan

Abstract: Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because we need to predict the locations of the action patterns both spatially and temporally, i.e., identifying a sequence of bounding boxes that track the action in video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, thus making it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods. 1

6 0.62382811 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

7 0.59663731 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

8 0.5880143 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

9 0.57202697 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

10 0.52260268 40 nips-2012-Analyzing 3D Objects in Cluttered Images

11 0.52067012 72 nips-2012-Cocktail Party Processing via Structured Prediction

12 0.51434249 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

13 0.51015615 8 nips-2012-A Generative Model for Parts-based Object Segmentation

14 0.49690843 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

15 0.48611516 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

16 0.48031881 146 nips-2012-Graphical Gaussian Vector for Image Categorization

17 0.47798112 210 nips-2012-Memorability of Image Regions

18 0.47436884 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

19 0.46979874 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

20 0.46609318 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects


similar papers computed by LDA model

LDA topic distribution for this paper:

topicId topicWeight

[(0, 0.023), (17, 0.026), (21, 0.06), (38, 0.085), (39, 0.021), (42, 0.026), (44, 0.014), (54, 0.046), (55, 0.021), (74, 0.138), (76, 0.142), (80, 0.081), (86, 0.18), (92, 0.03)]
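The simValue scores in the list below compare papers through topic distributions like the one above. A minimal sketch of how such a score could be computed, assuming cosine similarity over sparse (topicId, topicWeight) vectors — the candidate paper's weights here are invented purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse topic-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

# Topic distribution of this paper, as listed above (topicId -> topicWeight).
this_paper = {0: 0.023, 17: 0.026, 21: 0.06, 38: 0.085, 39: 0.021,
              42: 0.026, 44: 0.014, 54: 0.046, 55: 0.021, 74: 0.138,
              76: 0.142, 80: 0.081, 86: 0.18, 92: 0.03}

# A hypothetical candidate paper's topic distribution (illustrative values only).
candidate = {21: 0.05, 74: 0.12, 76: 0.15, 86: 0.2, 92: 0.02}

sim = cosine_similarity(this_paper, candidate)
```

Papers sharing heavily weighted topics (here 74, 76, 86) score high; disjoint topic supports score zero.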

similar papers list:

simIndex simValue paperId paperTitle

1 0.87398839 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

Author: Soumya Ghosh, Matthew Loper, Erik B. Sudderth, Michael J. Black

Abstract: We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler which tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods.

same-paper 2 0.86984849 303 nips-2012-Searching for objects driven by context

Author: Bogdan Alexe, Nicolas Heess, Yee W. Teh, Vittorio Ferrari

Abstract: The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance.
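Schematically, the sequential strategy described in this abstract — choose each new window based on what earlier observations revealed, instead of sliding over every window — might look like the following toy 1-D sketch. This is a hedged illustration only, not the authors' context-driven model; the scoring function, object position, and two-phase heuristic are all invented:

```python
def sequential_search(score, locations, n_probe=5, budget=20):
    """Toy sequential window search on a 1-D 'image'.

    Phase 1 spreads a few probe windows over the image; phase 2 repeatedly
    evaluates the unobserved window closest to the current best guess
    (a crude stand-in for a learned context model).
    Returns (best location found, number of windows evaluated).
    """
    step = max(1, len(locations) // n_probe)
    observed = {l: score(l) for l in locations[::step]}
    while len(observed) < budget:
        best = max(observed, key=observed.get)
        candidates = [l for l in locations if l not in observed]
        if not candidates:
            break
        nxt = min(candidates, key=lambda l: abs(l - best))
        observed[nxt] = score(nxt)
    return max(observed, key=observed.get), len(observed)

# Hypothetical object at position 62; window score decays with distance to it.
locations = list(range(0, 100, 2))
score = lambda l: 1.0 / (1.0 + abs(l - 62))

found, n_evals = sequential_search(score, locations)
```

With this invented unimodal score, the search homes in on the object while evaluating only a fraction of the 50 candidate windows — the qualitative point the abstract makes against exhaustive sliding windows.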

3 0.7939496 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

Author: Evan Archer, Il M. Park, Jonathan W. Pillow

Abstract: We consider the problem of estimating Shannon’s entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they provide natural priors for Bayesian entropy estimation, due to the analytic tractability of the moments of the induced posterior distribution over entropy H. We derive formulas for the posterior mean and variance of H given data. However, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the estimate in the under-sampled regime. We therefore define a family of continuous mixing measures such that the resulting mixture of Dirichlet or Pitman-Yor processes produces an approximately flat prior over H. We explore the theoretical properties of the resulting estimators and show that they perform well on data sampled from both exponential and power-law tailed distributions.

4 0.77887374 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.

5 0.76355523 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model outperforms baseline detectors that use 2D constraints alone on the task of localizing cuboid corners.

6 0.75985253 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

7 0.7588172 210 nips-2012-Memorability of Image Regions

8 0.75768024 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

9 0.75622129 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

10 0.74826753 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

11 0.74491793 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

12 0.74473387 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

13 0.7424733 8 nips-2012-A Generative Model for Parts-based Object Segmentation

14 0.74201185 193 nips-2012-Learning to Align from Scratch

15 0.74110031 202 nips-2012-Locally Uniform Comparison Image Descriptor

16 0.74024981 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

17 0.74000478 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

18 0.73998833 168 nips-2012-Kernel Latent SVM for Visual Recognition

19 0.7399314 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

20 0.73979247 185 nips-2012-Learning about Canonical Views from Internet Image Collections