
67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

Source: pdf

Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman

Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. [sent-10, score-1.15]

2 We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. [sent-12, score-0.498]

3 We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. [sent-14, score-0.735]

4 We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [sent-16, score-0.34]

5 Yet, the automatic discovery of good parts is still a difficult problem. [sent-21, score-0.333]

6 In DPM, for example, part occurrences are initially assumed to be in a fixed location relative to the ground truth object bounding boxes, and then are refined as latent variables during learning [9]. [sent-22, score-0.367]

7 In this paper, a simple, efficient, and effective method for discovering parts automatically and with very little supervision is proposed. [sent-25, score-0.273]

8 Its power is demonstrated in the context of scene recognition where, unlike in object recognition, object bounding boxes are not available, making part alignment very challenging. [sent-26, score-0.306]

9 1 shows examples of the learned parts detected on the test set. [sent-29, score-0.377]

10 The first is to find and align part instances in the training data while a model of the part is not yet available. [sent-31, score-0.322]

11 While this procedure requires training a sequence of detectors, the LDA technique of [13] is used to avoid mining for hard negative examples, eliminating the main bottleneck in detector learning [9, 32], and enabling a very efficient part-learning algorithm. [sent-35, score-0.299]

12 The second issue is to select distinctive parts among the ones that are generated by the part mining process. [sent-36, score-0.672]

13 This criterion selects parts that are informative for a small proportion of classes. [sent-39, score-0.34]

14 Differently to other measures such as average precision, the resulting parts can then be shared by more than one object category. [sent-40, score-0.273]

15 This is particularly important because parts should be regarded as mid-level primitives that do not necessarily have to respond to a single object class. [sent-41, score-0.273]

16 Example of occurrences of distinctive parts learned by our method from weakly supervised image data. [sent-43, score-0.836]

17 These part occurrences are detected on the test data. [sent-44, score-0.374]

18 The result of our procedure is the automatic discovery of distinctive part detectors. [sent-46, score-0.403]

19 In models such as DPMs, parts are devoid of a specific semantic content and are used to represent deformations of a two dimensional template. [sent-51, score-0.273]

20 For example, in Poselets [5] object parts correspond to recognizable clusters in appearance and configuration, in Li et al. [sent-53, score-0.273]

21 [15] scene parts correspond to object categories, and in Raptis et al. [sent-54, score-0.378]

22 [25] action parts capture spatio-temporal components of human activities. [sent-55, score-0.273]

23 The learning of parts is usually integrated into the learning of a complete object or scene model [3, 9]. [sent-56, score-0.45]

24 [28] explores learning parts in both an unsupervised and weakly supervised manner, where the weakly supervised case (as here) only uses the class label of the image. [sent-60, score-0.631]

25 Their weakly supervised procedure is applied to the MIT Scene 67 dataset, obtaining state-of-the-art scene classification performance. [sent-61, score-0.335]

26 As will be seen though, our part-learning method is: (i) simpler, (ii) more efficient, and (iii) able to learn parts that are significantly better at scene classification. [sent-62, score-0.378]

27 [19] propose a reconfigurable version of a spatial bag of visual words (BoW) model that associates different BoW descriptors to different image segments, corresponding to different types of “stuff”. [sent-66, score-0.308]

28 The standard DPM model is applied to the task of scene categorization by Pandey and Lazebnik [18], but the problem of part initialization and learning is not addressed, and the quality of the parts that can be obtained in this manner remains unclear. [sent-68, score-0.602]

29 Blocks that shout: learning distinctive parts In characterizing images of particular scene classes, e. [sent-75, score-0.581]

30 In practice, however, a distinctive part is useful only if it can be detected automatically, preferably by an efficient and simple algorithm. [sent-80, score-0.343]

31 Moreover, distinctive parts may include other structures that have a weaker or more abstract semantic, such as the corners of a room or a corridor, particular shapes (rounded, square), and so on. [sent-81, score-0.44]

32 Designing a good vocabulary of parts is therefore best left to learning. [sent-82, score-0.273]

33 Learning a distinctive part means identifying a localized detectable entity that is informative for the task at hand (in our example discriminating different scene types). [sent-83, score-0.472]

34 This is very challenging because (i) one does not know if a part occurs in any given training image or not, and (ii) when the part occurs, one does not know its location. [sent-84, score-0.322]

35 While methods such as multiple instance learning have often been proposed to try to identify parts automatically, in practice they require careful initialization to work well. [sent-85, score-0.342]

36 An example block learnt automatically for the laundromat class. [sent-117, score-0.328]

37 This block is a characteristic part for this class. [sent-118, score-0.344]

38 All such blocks are treated initially as potentially different parts. [sent-120, score-0.268]

39 2) each block is used as a seed to build a model for a part while gradually searching for more and more part occurrences in the training data. [sent-123, score-0.844]

40 This paced expansion addresses the issue of detecting and localizing part exemplars. [sent-124, score-0.313]

41 3) finds the most distinctive parts in the pool of candidate parts generated by seeding and expansion by looking at their predictive power in terms of entropy-rank. [sent-127, score-1.014]

42 The procedure is weakly supervised in that positives are only sought in the seeding and expansion stages within images of a single class. [sent-128, score-0.483]

43 Once these distinctive parts are obtained, they can be used for a variety of tasks. [sent-129, score-0.44]

44 Seeding: proposing an initial set of parts Initially, no part model is available and, without further information, any sub-window in any training image is equally likely to contain a distinctive part. [sent-137, score-0.629]

45 In principle, one could simply try to learn a part model by starting from all possible image sub-windows and identify good parts aposteriori, during the selection stage (Sect. [sent-138, score-0.439]

46 Unfortunately, most of these parts will in fact not be distinctive (e. [sent-141, score-0.44]

47 Each part is described by a block of 8 × 8 HOG cells of 8 × 8 pixels. [sent-155, score-0.307]

48 A part seed is initialized for each superpixel by centering the 64 × 64 pixel block at the center of mass of the superpixel. [sent-157, score-0.307]

49 Figure 3 shows an example of the superpixels computed from a training image, and the seed blocks obtained using this procedure. [sent-159, score-0.537]
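
The seeding step described above (one 64 × 64 block centered on each superpixel's center of mass) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `seed_blocks`, the nested-list label map, and the clamping of blocks to the image border are all assumptions made here, since the source does not say how border cases are handled.

```python
import math


def seed_blocks(labels, block=64):
    """Return {superpixel_id: (top, left)} of a block x block window centered
    on each superpixel's center of mass.

    `labels` is a 2D list of superpixel ids (hypothetical input format).
    Clamping the window to the image border is an assumption of this sketch.
    """
    height, width = len(labels), len(labels[0])
    sums = {}  # superpixel id -> [sum of rows, sum of cols, pixel count]
    for r, row in enumerate(labels):
        for c, sp in enumerate(row):
            acc = sums.setdefault(sp, [0, 0, 0])
            acc[0] += r
            acc[1] += c
            acc[2] += 1

    def clamp(value, upper):
        return min(max(value, 0), max(upper, 0))

    seeds = {}
    for sp, (sum_r, sum_c, n) in sums.items():
        cy, cx = sum_r / n, sum_c / n  # center of mass of the superpixel
        top = clamp(math.floor(cy - block / 2 + 0.5), height - block)
        left = clamp(math.floor(cx - block / 2 + 0.5), width - block)
        seeds[sp] = (top, left)
    return seeds
```

Each returned corner identifies one candidate seed block; HOG features would then be extracted from that window.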

50 Expansion: learning part detectors Learning a part detector requires a set of part exemplars, and these need to be identified in the training data. [sent-162, score-0.621]

51 A possible approach is to sample at random a set of part occurrences, but this is extremely unlikely to hit multiple occurrences of the same part. [sent-163, score-0.331]

52 In practice, part initialization can be obtained by means of some heuristic, such as clustering patches, or taking parts at a fixed location assuming that images are at least partially aligned. [sent-164, score-0.406]

53 However, the detector of a part is, by definition, the most general and reliable tool for the identification of that part occurrences. [sent-165, score-0.331]

54 There is a special case in which a part detector can be learned without worrying about exemplar alignment: a training set consisting exactly of one part instance. [sent-166, score-0.533]

55 In practice, at each round of learning the current part model is used to rank blocks from images of the selected class and the highest scoring blocks are considered as further part occurrences. [sent-169, score-0.974]

56 Figure 4 shows an example seed part on the left, and the additional part occurrences that are added to the training set during successive iterations of expansion. [sent-174, score-0.67]
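
The expansion loop described above can be sketched generically: start from a single seed, rank the candidate blocks of the same class with the current model, add the highest scoring ones as new positives, and retrain. In this sketch `train` and `score` are hypothetical callables supplied by the caller, and the number of rounds and of blocks added per round are illustrative assumptions rather than values taken from the source.

```python
def expand_part(seed, candidates, train, score, rounds=3, add_per_round=2):
    """Grow a part model from one seed block.

    train(positives) -> model and score(model, block) -> float are
    placeholders for the detector learning and evaluation steps.
    """
    positives = [seed]
    remaining = list(candidates)
    model = train(positives)  # a model learned from exactly one exemplar
    for _ in range(rounds):
        if not remaining:
            break
        # Rank the remaining blocks with the current part model.
        remaining.sort(key=lambda b: score(model, b), reverse=True)
        # Treat the highest scoring blocks as further part occurrences.
        positives.extend(remaining[:add_per_round])
        remaining = remaining[add_per_round:]
        model = train(positives)  # retrain on the enlarged positive set
    return model, positives
```

With a toy one-dimensional "feature" one could call `expand_part(10, [9, 50, 13, 40, 12], train=lambda p: sum(p) / len(p), score=lambda m, b: -abs(m - b), rounds=1)`, which adds the two candidates nearest to the seed.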

57 The super-pixels (b) suggest characteristic regions of the image, and blocks are formed for these. [sent-177, score-0.305]

58 The downside of this mining process is that the part detector must be learned multiple times. [sent-187, score-0.358]

59 In practice, the parameter vector w of a part classifier is learned simply as $w = \Sigma^{-1}(\bar{x} - \mu_0)$, where $\bar{x}$ is the mean of the HOG features of the positive part samples, $\mu_0$ is the mean of the HOG blocks in the dataset, and $\Sigma$ is the corresponding covariance matrix. [sent-190, score-0.635]
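
The closed-form detector above, w = Σ⁻¹(x̄ − μ0), is a one-liner in NumPy. The sketch below assumes flattened HOG feature vectors; the function name and the small ridge term `reg` (added so the linear solve stays well conditioned) are assumptions of this example and are not stated in the source.

```python
import numpy as np


def lda_part_detector(positives, mu0, sigma, reg=1e-3):
    """Compute w = Sigma^{-1} (x_bar - mu0).

    positives: (n, d) array of HOG features of the positive part samples.
    mu0:       (d,) mean of HOG blocks sampled from the dataset.
    sigma:     (d, d) covariance of those blocks.
    reg:       ridge added to the diagonal (an assumption of this sketch).
    """
    positives = np.asarray(positives, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x_bar = positives.mean(axis=0)
    sigma_reg = sigma + reg * np.eye(sigma.shape[0])
    # Solve the linear system instead of forming the explicit inverse.
    return np.linalg.solve(sigma_reg, x_bar - np.asarray(mu0, dtype=float))
```

Because μ0 and Σ are computed once for the whole dataset, retraining after each expansion round only requires recomputing x̄ and one linear solve, which is why no hard-negative mining is needed.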

60 Selection: identifying distinctive parts Our notion of a discriminative block is that it should occur in many of the images of the class from which it is learnt, but not in many images from other classes. [sent-196, score-0.803]

61 However, it is not reasonable to assume that parts (represented by blocks) are so discriminative that they only occur in the class from which they are learnt. [sent-197, score-0.415]

62 For example, the door of a washing machine will occur in the laundromat class, but can also occur in the kitchen or garage class. [sent-198, score-0.321]

63 However, one would not expect these parts to appear in many other of the indoor classes. [sent-200, score-0.316]

64 The block classifiers were learnt on training images for a particular class, and they are tested as detectors on validation images of all classes. [sent-203, score-0.464]

65 On the right the additional example blocks added to the positive training set for retraining the part detector are shown in the order that they are added. [sent-207, score-0.522]

66 Note that mining uses blocks selected from a certain scene category, but no other supervision is used. [sent-208, score-0.472]

67 learned from a class are not required to be detected only from images of that class; instead, the milder constraint that the distribution of classes in which the block is detected should have low entropy is imposed. [sent-209, score-0.532]

68 In this manner, distinctive but shareable mid-level parts can be selected. [sent-210, score-0.527]

69 For the laundromat example above, we would expect the washing machine door to be detected in only a handful of the classes, so the entropy would be low. [sent-211, score-0.361]

70 In contrast the block for a wall would be detected across many classes, so its distribution would be nearer uniform across classes, and hence the entropy higher. [sent-212, score-0.385]

71 To operationalize this requirement, each block is evaluated in a sliding-window manner on each validation image. [sent-213, score-0.276]

72 Then, five block occurrences are extracted from each image by max-pooling in five image regions, corresponding to the spatial subdivisions used in the encoding of Sect. [sent-214, score-0.595]
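
As a rough illustration of the pooling step above, the sketch below takes a dense map of detector scores and returns the maximum in each of five regions. The source only says the regions follow the spatial subdivisions of the encoding; the specific layout used here (the whole image plus its four quadrants) and the function name are assumptions of this example.

```python
def pool_five_regions(score_map):
    """Max-pool a 2D list of detection scores over five regions.

    Assumed layout: [whole image, top-left, top-right, bottom-left,
    bottom-right]. Requires at least 2 rows and 2 columns.
    """
    h, w = len(score_map), len(score_map[0])
    mid_r, mid_c = h // 2, w // 2
    regions = [
        (0, h, 0, w),          # whole image
        (0, mid_r, 0, mid_c),  # top-left quadrant
        (0, mid_r, mid_c, w),  # top-right quadrant
        (mid_r, h, 0, mid_c),  # bottom-left quadrant
        (mid_r, h, mid_c, w),  # bottom-right quadrant
    ]
    return [
        max(score_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
        for (r0, r1, c0, c1) in regions
    ]
```

Each of the five maxima corresponds to one block occurrence extracted from the image.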

73 Classifier (a) has low entropy at top ranks, which shows that it is picking the blocks from a few classes. [sent-219, score-0.392]

74 Each block occurrence (z_i, y_i) detected in this manner receives a detection score z and a class label y equal to the label of the image. [sent-222, score-0.376]

75 The blocks are sorted on their score z, and the top r ranking blocks selected. [sent-223, score-0.536]

76 We introduce Entropy-Rank Curves (ER curves) to measure the entropy of a block classifier at different ranks. [sent-229, score-0.298]

77 Note, entropy for all part classifiers converges to a constant value (which depends on the class prior) as the rank increases. [sent-232, score-0.371]
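
An entropy-rank curve as described above can be computed by sorting detections by score and tracking the entropy of the class labels among the top r. The sketch below is a minimal version under stated assumptions: detections are `(score, label)` pairs, the entropy uses base-2 logarithms, and the function name is made up for this example.

```python
import math
from collections import Counter


def entropy_rank_curve(detections, max_rank):
    """Return [H(1), H(2), ..., H(max_rank)], where H(r) is the entropy of
    the class labels among the r highest scoring detections.

    detections: iterable of (score, class_label) pairs.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)[:max_rank]
    counts = Counter()
    curve = []
    for r, (_, label) in enumerate(ranked, start=1):
        counts[label] += 1
        # Entropy of the empirical class distribution of the top r blocks.
        h = -sum((n / r) * math.log2(n / r) for n in counts.values())
        curve.append(h)
    return curve
```

A classifier whose curve stays low at the top ranks fires on only a few classes, which matches the notion of a distinctive but shareable part; a curve that rises quickly indicates a block, such as a wall, that appears across many classes.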

78 In fact, there is no guarantee that the part mining procedure will not return the same or similar parts multiple times. [sent-236, score-0.548]

79 Bag of parts In order to compute an image-level descriptor from the parts learned in Sect. [sent-272, score-0.645]

80 Note that the method of selecting the parts (Sect. [sent-276, score-0.273]

81 In order to use non-linear additive kernels instead of the linear one, the χ2 explicit feature map of [33] is used (the bag of parts and bag of words histograms are l1 normalized). [sent-301, score-0.665]
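
To make the encoding step above concrete, the sketch below l1-normalizes a histogram and applies an approximate explicit feature map for the additive χ2 kernel k(x, y) = 2xy / (x + y), so that a linear SVM on the mapped features approximates a χ2-kernel SVM. It follows the commonly used sampled-spectrum construction with spectrum sech(πω); the function names, the approximation order, and the sampling step are assumptions of this example and not settings reported in the source.

```python
import math


def l1_normalize(hist):
    """Scale a histogram so that its absolute values sum to one."""
    total = sum(abs(v) for v in hist)
    return [v / total for v in hist] if total > 0 else list(hist)


def chi2_feature_map(hist, order=2, step=0.5):
    """Approximate explicit map for k(x, y) = 2xy / (x + y).

    Each non-negative input value becomes 2 * order + 1 output values.
    `order` and `step` are illustrative choices.
    """
    out = []
    for x in hist:
        if x <= 0:
            out.extend([0.0] * (2 * order + 1))
            continue
        log_x = math.log(x)
        out.append(math.sqrt(x * step))  # zero-frequency term, spectrum value 1
        for j in range(1, order + 1):
            kappa = 1.0 / math.cosh(math.pi * j * step)  # chi2 spectrum sech(pi * w)
            amp = math.sqrt(2.0 * x * step * kappa)
            out.append(amp * math.cos(j * step * log_x))
            out.append(amp * math.sin(j * step * log_x))
    return out
```

The inner product of two mapped histograms approximates the sum of per-bin χ2 kernel values; increasing `order` tightens the approximation at the cost of a longer feature vector.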

82 Experiments and results The part-learning algorithm is evaluated on the task of scene classification on the MIT 67 indoor scene dataset of Quattoni and Torralba [24]. [sent-308, score-0.303]

83 For each of these seed blocks, a classifier is learned by following the expansion procedure of Sect. [sent-360, score-0.369]

84 We sample about 620,000 HOG blocks randomly from the training set, and compute the mean (μ0) and covariance (Σ) of this set. [sent-363, score-0.364]

85 Once the parts have been learned as described in Sect. [sent-366, score-0.334]

86 3, the bag of parts representation is extracted from each training image as described in Sect. [sent-370, score-0.504]

87 Finally, 67 one-vs-rest SVMs are learned from the training images, and the resulting scene classifiers are evaluated on the test data. [sent-373, score-0.288]

88 As one can expect, the classification accuracy increases as more parts are added to the representation (Table 2), but the peak is at around 50 parts per category. [sent-374, score-0.596]

89 The probable reason is a lack of training material (after all the parts and classifiers are learned on the same [...] methods (previous publications and this paper). [sent-375, score-0.501]

90 So the parts found by our algorithm are much more informative, improving the accuracy by 8% using only a quarter of the number of detectors. [sent-389, score-0.273]

91 In the final experiment, the BoP (using 50 parts per class) and BoW/IFV representations [...] [sent-396, score-0.273]

92 Figure 6 shows qualitative results obtained by the combined bag of parts and IFV method. [sent-446, score-0.481]

93 Summary We have presented a novel method to learn distinctive parts of objects or scenes automatically, from image-level category labels. [sent-448, score-0.44]

94 The key problem of simultaneously learning a part model and detecting its occurrences in the training data was solved by paced learning of Exemplar SVMs, growing a model from just one occurrence of the part. [sent-449, score-0.58]

95 The distinctiveness of parts was measured by the new concept of entropy-rank, capturing the idea that parts are at the same time predictive of certain object categories but shareable between different categories. [sent-450, score-0.734]

96 The learned parts have been shown to perform very well on the task of scene classification, where they improved a very solid bag of words or Fisher Vector baseline that in itself establishes the new state-of-the-art on the MIT Scene 67 benchmark. [sent-451, score-0.656]

97 This mid-level representation is useful for other tasks, for example to initialize the region models of [19] or the part models of [18], and yields more understandable and diagnosable models than the original bag of visual words method. [sent-453, score-0.35]

98 (a), (c) Seed blocks and the learnt HOG templates, and (b), (d) detections on the validation set images. [sent-458, score-0.371]

99 Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories. [sent-554, score-0.28]

100 Discovering discriminative action parts from mid-level video representations. [sent-640, score-0.323]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ifv', 0.294), ('parts', 0.273), ('blocks', 0.268), ('occurrences', 0.198), ('bow', 0.183), ('bag', 0.175), ('block', 0.174), ('distinctive', 0.167), ('seeding', 0.152), ('seed', 0.15), ('rootsift', 0.134), ('part', 0.133), ('entropy', 0.124), ('encodings', 0.116), ('expansion', 0.115), ('mit', 0.111), ('fisher', 0.107), ('scene', 0.105), ('mining', 0.099), ('laundromat', 0.098), ('shout', 0.098), ('encoding', 0.094), ('round', 0.088), ('weakly', 0.087), ('shareable', 0.087), ('hog', 0.085), ('exemplar', 0.085), ('corridor', 0.069), ('distinctiveness', 0.067), ('informative', 0.067), ('classifiers', 0.066), ('paced', 0.065), ('subdivisions', 0.065), ('upstream', 0.065), ('detector', 0.065), ('detectors', 0.065), ('superpixels', 0.063), ('lda', 0.063), ('learned', 0.061), ('discovery', 0.06), ('quattoni', 0.058), ('washing', 0.058), ('occurrence', 0.056), ('training', 0.056), ('learnt', 0.056), ('manner', 0.055), ('supervised', 0.05), ('discriminative', 0.05), ('classification', 0.05), ('er', 0.049), ('descriptors', 0.048), ('parizi', 0.048), ('class', 0.048), ('notion', 0.047), ('validation', 0.047), ('grids', 0.045), ('pegasos', 0.045), ('publications', 0.045), ('wall', 0.044), ('poselets', 0.044), ('vedaldi', 0.044), ('occur', 0.044), ('pandey', 0.043), ('reconfigurable', 0.043), ('indoor', 0.043), ('procedure', 0.043), ('detected', 0.043), ('rescaling', 0.042), ('raptis', 0.042), ('words', 0.042), ('singh', 0.041), ('dpm', 0.041), ('sadeghi', 0.041), ('amit', 0.041), ('covariance', 0.04), ('kitchen', 0.039), ('dpms', 0.039), ('classes', 0.039), ('door', 0.038), ('objectness', 0.038), ('descriptor', 0.038), ('retained', 0.037), ('cats', 0.037), ('characteristic', 0.037), ('pooling', 0.037), ('learning', 0.036), ('sought', 0.036), ('curves', 0.035), ('boxes', 0.035), ('exemplars', 0.034), ('auc', 0.034), ('predictive', 0.034), ('seeds', 0.033), ('combined', 0.033), ('alignment', 0.033), ('svms', 0.033), ('identify', 0.033), ('quantization', 0.033), ('voc', 0.032), ('five', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman

2 0.27978507 325 cvpr-2013-Part Discovery from Partial Correspondence

Author: Subhransu Maji, Gregory Shakhnarovich

Abstract: We study the problem of part discovery when partial correspondence between instances of a category is available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.

3 0.25878045 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.

4 0.18647154 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

Author: Gaurav Sharma, Frédéric Jurie, Cordelia Schmid

Abstract: We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) ‘average’ templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.

5 0.14740208 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects have prevailed in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

6 0.14618805 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

7 0.14218049 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning

8 0.14190623 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

9 0.13804938 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

10 0.13003126 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

11 0.12609008 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

12 0.12475965 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

13 0.12228693 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

14 0.12110605 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification

15 0.12094085 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

16 0.12017397 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition

17 0.11574899 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

18 0.1133854 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

19 0.11283195 456 cvpr-2013-Visual Place Recognition with Repetitive Structures

20 0.1124791 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.285), (1, -0.133), (2, 0.001), (3, -0.051), (4, 0.067), (5, 0.041), (6, 0.034), (7, 0.115), (8, -0.035), (9, -0.071), (10, -0.095), (11, -0.026), (12, 0.066), (13, -0.044), (14, 0.033), (15, -0.071), (16, 0.075), (17, -0.023), (18, -0.023), (19, -0.017), (20, 0.083), (21, -0.008), (22, 0.174), (23, -0.018), (24, 0.012), (25, 0.128), (26, -0.013), (27, 0.014), (28, -0.064), (29, -0.036), (30, 0.005), (31, 0.044), (32, -0.003), (33, 0.014), (34, 0.02), (35, -0.036), (36, 0.013), (37, -0.07), (38, 0.008), (39, -0.055), (40, 0.019), (41, -0.085), (42, -0.077), (43, 0.01), (44, 0.004), (45, -0.067), (46, -0.042), (47, -0.124), (48, -0.067), (49, -0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96567863 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman

2 0.83835 325 cvpr-2013-Part Discovery from Partial Correspondence

Author: Subhransu Maji, Gregory Shakhnarovich

3 0.82827455 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

4 0.78501242 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection

Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan

Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches, while maintaining comparable accuracy. Our EMAS model consists of representing an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the ImageNet ILSVRC2012 dataset and for 107 object classes in less than 5 seconds per image for the SUN09 dataset using a single CPU.

5 0.75750673 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

Author: Gaurav Sharma, Frédéric Jurie, Cordelia Schmid

6 0.74847627 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

7 0.73680973 417 cvpr-2013-Subcategory-Aware Object Classification

8 0.72019678 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

9 0.70411742 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

10 0.68491012 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

11 0.68300813 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

12 0.68040562 204 cvpr-2013-Histograms of Sparse Codes for Object Detection

13 0.66274345 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification

14 0.66259098 78 cvpr-2013-Capturing Layers in Image Collections with Componential Models: From the Layered Epitome to the Componential Counting Grid

15 0.65721023 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

16 0.64704591 364 cvpr-2013-Robust Object Co-detection

17 0.64555162 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

18 0.64120626 134 cvpr-2013-Discriminative Sub-categorization

19 0.63223398 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition

20 0.62893647 452 cvpr-2013-Vantage Feature Frames for Fine-Grained Categorization

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.142), (16, 0.023), (26, 0.064), (28, 0.017), (33, 0.275), (48, 0.141), (67, 0.085), (69, 0.086), (72, 0.011), (80, 0.01), (87, 0.058)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91120344 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification

Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman

2 0.91031933 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

3 0.90533781 325 cvpr-2013-Part Discovery from Partial Correspondence

Author: Subhransu Maji, Gregory Shakhnarovich

4 0.90509719 414 cvpr-2013-Structure Preserving Object Tracking

Author: Lu Zhang, Laurens van der Maaten

Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.

5 0.9020893 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.

6 0.90109849 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

7 0.90071666 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

8 0.89966434 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

9 0.89963299 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

10 0.89915484 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

11 0.89913094 311 cvpr-2013-Occlusion Patterns for Object Class Detection

12 0.89845484 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

13 0.89781749 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints

14 0.89706886 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

15 0.89582509 314 cvpr-2013-Online Object Tracking: A Benchmark

16 0.89564782 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

17 0.89526635 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

18 0.89462364 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

19 0.89434594 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

20 0.89421743 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics