iccv iccv2013 iccv2013-349 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named as regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves the detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.
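The two core ideas of the abstract (regionlets defined proportionally to the detection window, and aggregation of a group of regionlets into a one-dimensional feature) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `to_absolute` and `region_feature` are hypothetical, the aggregation is max-pooling as stated in sentence 53 of the sentence list, and the per-regionlet feature is an arbitrary scalar function supplied by the caller rather than a HOG or LBP dimension.

```python
import numpy as np


def to_absolute(rel_box, window):
    """Map a window-relative box to pixel coordinates.

    rel_box: (l, t, r, b) as fractions in [0, 1] of the window width/height.
    window:  (x0, y0, x1, y1) detection window in pixels.
    """
    l, t, r, b = rel_box
    x0, y0, x1, y1 = window
    w, h = x1 - x0, y1 - y0
    return (x0 + l * w, y0 + t * h, x0 + r * w, y0 + b * h)


def region_feature(image, window, regionlets, feature_fn):
    """Aggregate one regionlet group into a single scalar by max-pooling.

    regionlets: list of window-relative boxes (hypothetical layout).
    feature_fn: maps an image patch to a scalar (stand-in for one
                dimension of a low-level descriptor).
    """
    responses = []
    for rel_box in regionlets:
        xa, ya, xb, yb = (int(round(v)) for v in to_absolute(rel_box, window))
        patch = image[ya:yb, xa:xb]  # rows are y, columns are x
        responses.append(feature_fn(patch))
    # Only the strongest response represents the whole group.
    return max(responses)
```

Because the regionlets are stored as fractions of the window, the same group can be evaluated on candidate boxes of any size or aspect ratio without rescaling the image; the max over the group is what makes the resulting feature insensitive to which regionlet the appearance cue falls in.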
Reference: text
sentIndex sentText sentNum sentScore
1 Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. [sent-3, score-0.27]
2 A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). [sent-5, score-0.804]
3 These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. [sent-8, score-0.876]
4 Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. [sent-10, score-0.335]
5 Introduction Despite the success of face detection where the target objects are roughly rigid, generic object detection remains an open problem mainly due to the challenge of handling all possible variations with tractable computations. [sent-19, score-0.258]
6 Deformable objects may appear somewhat rigid at a distance, and even rigid objects may show larger variations at different view angles. [sent-24, score-0.277]
7 Regionlet representation can be applied to candidate bounding boxes that have different sizes and aspect ratios. [sent-99, score-0.324]
8 A regionlet-based model is composed of a number of regions (denoted by blue rectangles), and then each region is represented by a group of regionlets (denoted by the small orange rectangles inside each region). [sent-100, score-0.91]
9 dilemma to object class representations: on one hand, a delicate model describing rigid object appearances may hardly handle deformable objects; on the other hand, a high tolerance of deformation may result in imprecise localization or false positives for rigid objects. [sent-101, score-0.369]
10 Prior art in object detection copes with object deformation efficiently, primarily with three typical strategies. [sent-102, score-0.256]
11 First, if spatial layouts of object appearances are roughly rigid such as faces or pedestrians at a distance, the classical AdaBoost detection [26] mainly tackles local variations with an ensemble classifier of efficient features. [sent-103, score-0.285]
12 Second, the deformable part model (DPM) method [12] inherits the HOG window template matching [6] but explicitly models deformations by latent variables, where an exhaustive search of possible locations, scales, and aspect ratios is critical to localize objects. [sent-105, score-0.263]
13 These successful detection approaches inspire us to investigate a descriptive and flexible object representation, which delivers the modeling capacity for both rigid and deformable objects in a unified framework. [sent-112, score-0.311]
14 In this paper, we propose a new object representation strategy for generic object detection, which incorporates adaptive deformation handling into both object classifier learning and basic feature extraction. [sent-113, score-0.305]
15 Each object bounding box is classified by a cascaded boosting classifier, where each weak classifier takes the feature response of a region inside the bounding box as its input and then the region is in turn represented by a group of small sub-regions, named as regionlets. [sent-114, score-0.818]
16 The sets of regionlets are selected from a huge pool of candidate regionlet groups by boosting. [sent-115, score-1.406]
17 On one hand, the relative spatial positions of the regionlets within a region and the region within an object bounding box are stable. [sent-116, score-1.163]
18 Therefore, the proposed regionlet representation can model fine-grained spatial appearance layouts. [sent-117, score-0.553]
19 On the other hand, the feature responses of regionlets within one group are aggregated to a one dimensional feature, and the resulting feature is generally robust to local deformation. [sent-118, score-0.849]
20 Also, our regionlet model is designed to be flexible to take bounding boxes with different sizes and aspect ratios. [sent-119, score-0.799]
21 Therefore our approach is ready to utilize the selective search strategy [25] to evaluate on merely thousands of candidate bounding boxes rather than hundreds of thousands (if not millions) of sliding windows as in the exhaustive search. [sent-120, score-0.488]
22 An illustration of the regionlet representation is shown in Figure 1, where the regionlets drawn as orange boxes are grouped within blue rectangular regions. [sent-121, score-1.346]
23 The regionlets and their groups for one object class are learned in boosting with stable relative positions to each other. [sent-122, score-0.927]
24 When they are applied to two candidate bounding boxes, the feature responses of regionlets are obtained at their respective scales and aspect ratios without enumerating all possible spatial configurations. [sent-123, score-1.08]
25 1) It introduces the regionlet concept which is flexible to extract features from arbitrary bounding boxes. [sent-125, score-0.653]
26 2) The regionlet-based representation for an object class, which not only models relative spatial layouts inside an object but also accommodates variations especially deformations by the regionlet group selection in boosting and the aggregation of feature responses in a regionlet group. [sent-126, score-1.464]
27 Since the resolutions of the object templates are fixed, an exhaustive sliding window search [12] is required to find objects at different scales and different aspect ratios. [sent-135, score-0.291]
28 In contrast, our regionlet-based detection handles object deformation directly in feature extraction, and it is flexible to deal with different scaling and aspect ratios without the need of an exhaustive search. [sent-137, score-0.376]
29 Recently, a new detection strategy [2, 25, 20] is to use multi-scale image segmentation to propose a couple of thousand candidate bounding boxes for each image, and then the object categories of the bounding boxes are determined by strong object classifiers, e. [sent-138, score-0.591]
30 Beyond the straightforward exhaustive search of all locations, our regionlet detection approach screens the candidate windows derived from the selective search [25]. [sent-148, score-0.899]
31 For selective search, given an image, it first over-segments the image into superpixels, and then those superpixels are grouped in a bottom-up manner to propose candidate bounding boxes. [sent-149, score-0.265]
32 To that end, we introduce regionlet features for each candidate bounding box. [sent-152, score-0.706]
33 In our proposed method, we construct a largely overcomplete regionlet feature pool and then design a cascaded boosting learning process to select the most discriminative regionlets for detection. [sent-153, score-1.507]
34 1 describes what the regionlets are and explains how they are designed to handle deformation. [sent-155, score-0.728]
35 2 presents how to construct a largely over-complete regionlet pool and learn a cascaded boosting classifier for an object category by selecting the most discriminative regionlets. [sent-157, score-0.812]
36 1 Regionlet definition In object detection, an object category is essentially defined by a classifier where both object appearance and the spatial layout inside an object shall be taken into account. [sent-162, score-0.262]
37 We would like to introduce the regionlets with an example illustrated in Figure 2. [sent-170, score-0.728]
38 A rectangle feature extraction region inside the bounding box is denoted as R, which will contribute a weak classifier to the boosting classifier. [sent-172, score-0.517]
39 We employ the term regionlet, because the features of these sub-regions will be aggregated to a single feature for R, and they are below the level of a standalone feature extraction region in an object classifier. [sent-176, score-0.275]
40 This example also illustrates how regionlets are designed to handle deformation. [sent-178, score-0.728]
41 Hand, as a supposedly informative Figure 2: Illustration of the relationship among a detection bounding box, a feature extraction region and regionlets. [sent-179, score-0.334]
42 Inside R, several small sub-regions denoted as r1, r2 and r3 (in small orange rectangles) are the regionlets to capture the possible locations of the hand for person detection. [sent-181, score-0.82]
43 To that end, we introduce three regionlets inside R (In general, a region can contain many regionlets. [sent-185, score-0.848]
44 Each regionlet r covers a possible location of hand. [sent-187, score-0.528]
45 Then only features from the regionlets are extracted and aggregated to generate a compact representation for R. [sent-188, score-0.754]
46 More regionlets in R will increase the capacity to model deformations, e. [sent-190, score-0.728]
47 On the other hand, rigid objects may only require one regionlet from a feature extraction region. [sent-193, score-0.701]
48 , the HOG [6] and LBP descriptors [1] from each regionlet respectively; and 2) generating the representation of R based on regionlets’ features. [sent-198, score-0.528]
49 Let us denote T(R) as the feature representation for region R and T(r_j) as the feature extracted from the j-th regionlet r_j in R; then the operation is defined as follows: $T(R) = \sum_{j=1}^{N_R} \alpha_j T(r_j)$, [sent-201, score-0.748]
50 subject to $\alpha_j \in \{0, 1\}$ and $\sum_{j=1}^{N_R} \alpha_j = 1$, where $N_R$ is the total number of regionlets in region R and $\alpha_j$ is a binary variable, either 0 or 1. [sent-203, score-0.817]
51 This operation is permutation invariant, namely, the occurrence of the appearance cues in any of regionlets is equivalent, which allows deformations among these regionlet locations. [sent-204, score-1.325]
52 The operation also assumes the exclusiveness within a group of regionlets, namely, one and only one regionlet will contribute to the region feature representation. [sent-205, score-0.695]
53 In our framework, we simply apply max-pooling over regionlet features. [sent-207, score-0.528]
54 For each regionlet rj, we first extract low-level feature vectors, such as HOG or LBP histograms. [sent-211, score-0.566]
55 Then, we pick a 1D feature from the same dimension of these feature vectors in each regionlet and apply Eq. [sent-212, score-0.604]
56 Assuming the first dimension of the concatenated low-level features is the most distinctive feature dimension learned for hand, we collect this dimension from all the three regionlets and represent T(R) by the strongest feature response from the top regionlet. [sent-219, score-0.804]
57 3 Regionlets normalized by detection windows In this work, the proposed regionlet representations are evaluated on the candidate bounding boxes derived from selective search approach [25]. [sent-222, score-0.99]
58 The selective search approach first over-segments an image into superpixels, and then the superpixels are grouped in a bottom-up manner to propose some candidate bounding boxes. [sent-224, score-0.3]
59 This approach typically produces 1000 to 2000 candidate bounding boxes for an object detector to evaluate on, compared to millions of windows in an exhaustive sliding window search. [sent-225, score-0.484]
60 However, these proposed bounding boxes have arbitrary sizes and aspect ratios. [sent-280, score-0.247]
61 We address this difficulty by using the relative positions and sizes of the regionlets and their groups to an object bounding box. [sent-282, score-0.968]
62 Figure 4 shows our way of defining regionlets in contrast to fixed regions with absolute sizes. [sent-283, score-0.749]
63 When using a sliding window search, a feature extraction region is often defined by the top-left (l, t) and the bottom-right corner (r, b) w. [sent-284, score-0.259]
64 These relative region definitions allow us to directly evaluate the regionlet-based representation on candidate windows at different sizes and aspect ratios without scaling images into multiple resolutions or using multiple components for enumerating possible aspect ratios. [sent-294, score-0.4]
65 Learning the object detection model We follow the boosting framework to learn the discriminative regionlet groups and their configurations from a huge pool of candidate regions and regionlets. [sent-297, score-0.899]
66 A set of small regionlets that is effective to capture finger-level deformation may hardly handle deformation caused by hand movements. [sent-302, score-0.906]
67 In order to deal with diverse variations, we build a largely overcomplete pool for regions and regionlets with various positions, aspect ratios, and sizes. [sent-303, score-0.865]
68 3, we denote the 1D feature of a region relative to a bounding box as R′ [sent-367, score-0.291]
69 The region pool is spanned by X × Y × W × H × F, where X and Y are respectively the spaces of the horizontal and vertical anchor positions of R in the detection window, W and H are the width and height of the feature extraction region R′ [sent-375, score-0.438]
70 Afterwards, we propose a set of regionlets with random positions inside each region. [sent-383, score-0.782]
71 Although the sizes of regionlets in a region could be arbitrary in general, we restrict regionlets in a group to have the identical size because our regionlets are designed to capture the same appearance in different possible locations due to deformation. [sent-384, score-2.342]
72 The sizes of regionlets in different groups could be different. [sent-385, score-0.781]
73 A region may contain up to 5 regionlets in our implementation. [sent-386, score-0.817]
74 So the final feature space used as the feature pool for boosting is spanned by R × C, where R is the region feature prototype space and C is the configuration space of regionlets. [sent-387, score-0.361]
75 2 Training with boosting regionlet features We use RealBoost [22] to train cascaded classifiers for our object detector. [sent-396, score-0.739]
76 3 with the extracted feature, we can get the weak classifier in the t-th round of training for the bounding box Q: n−1 h_t(T(R′ [sent-408, score-0.252]
77 t,j(Q))), (5) where i_t is the index of the region selected in the t-th round of training, N_{i_t} is the total number of regionlets in R_{i_t}, and β_t is the weight of the selected weak classifier. [sent-417, score-0.897]
78 The classification result of the candidate bounding box Q is determined by the final round of cascade if it passes all previous ones, and it is expressed as f(Q) = sign(H∗ (Q)) where H∗ denotes the last stage of cascade. [sent-418, score-0.29]
79 Given a test image, we first propose a number of candidate bounding boxes using the selective search [25]. [sent-425, score-0.368]
80 Then, each candidate bounding box is passed along the cascaded classifiers learned in the boosting process. [sent-426, score-0.389]
81 Because of early rejections, only a small number of candidate bounding boxes reach the last stage of the cascade. [sent-427, score-0.246]
82 Experiment on PASCAL VOC datasets In our implementation of regionlet-based detection, we utilize the selective search bounding boxes from [25] to train our detector. [sent-439, score-0.291]
83 To validate the advantages of the proposed approach, we compare it with three baselines: deformable part-based model [12] which is one of the most effective sliding window based detectors, and two recent approaches based on selective search [3, 25]. [sent-441, score-0.275]
84 Compared to [25], our edge comes from our regionlet representation encoding object’s spatial configuration. [sent-452, score-0.553]
85 Compared to [12], our improvement on accuracy is led by the joint deformation and misalignment handling powered by the regionlets representation with multiple resolution features. [sent-453, score-0.805]
86 If we limit the number of regionlets in a region to be 1, our method obtained a mean average precision of 36. [sent-454, score-0.834]
87 Allowing multiple regionlets consistently improves the object detection accuracy for each class and pushes the number to 41. [sent-456, score-0.84]
88 Regionlets-S: our regionlets approach with a single regionlet per region. [sent-469, score-1.256]
89 Regionlets-M: our regionlets approach allowing for multiple regionlets per region. [sent-470, score-1.456]
90 aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mAP DPM [12]133. [sent-472, score-0.333]
91 aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mAP DPM [12] 45. [sent-578, score-0.333]
92 7 Figure 5: Statistics of number of regionlets used for each class. [sent-641, score-0.728]
93 prefer more regionlets than rigid objects like bicycle, bus, diningtable, motorbike, sofa and train. [sent-644, score-0.851]
94 An interesting yet consistent phenomenon has been observed for rigid objects like aeroplane and tvmonitor, as in the comparison of [12] and [25]: our algorithm selects even more regionlets than those for other deformable objects. [sent-645, score-0.905]
95 We speculate the regionlets in these two cases may help to handle misalignment due to multiple viewpoints and sub-categories. [sent-646, score-0.728]
96 Our regionlet approach again achieves the best mean average precision. [sent-655, score-0.528]
97 Due to the regionlets representation and enforced spatial layout learning, our proposed approach performs perfectly in both cases. [sent-681, score-0.753]
98 Regionlets provide a radically different way to model object deformation compared to existing BoW approaches with selective search and DPM approaches. [sent-686, score-0.244]
99 Our regionlet model can well adapt itself for detecting rigid objects, objects with small local deformations as well as long-range deformations. [sent-687, score-0.672]
100 Validated on the challenging PASCAL VOC datasets and ImageNet object detection dataset, the proposed regionlet approach demonstrates superior performance compared to the existing approaches. [sent-688, score-0.64]
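The test-time procedure described in sentences 78 to 81 (each candidate bounding box is passed along the cascade, and most boxes are rejected early) can be sketched as follows. This is a hedged illustration, not the paper's code: the names `cascade_accepts` and `detect`, the per-stage rejection thresholds, and the representation of a stage as a list of callables are all assumptions made for the example. Each callable stands in for a weighted weak classifier applied to a region feature of the box.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (weak classifiers, rejection threshold); both are hypothetical
# stand-ins for the learned boosting stages.
Stage = Tuple[List[Callable[[object], float]], float]


def cascade_accepts(box, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Run one candidate box through the cascade.

    Returns (accepted, number of stages evaluated). A box is rejected as
    soon as a stage score falls below that stage's threshold.
    """
    evaluated = 0
    for weak_classifiers, threshold in stages:
        evaluated += 1
        score = sum(h(box) for h in weak_classifiers)
        if score < threshold:
            return False, evaluated  # early rejection
    return True, evaluated


def detect(boxes, stages: Sequence[Stage]):
    """Keep only the candidate boxes that pass every stage."""
    return [b for b in boxes if cascade_accepts(b, stages)[0]]
```

In this sketch the decision of the last stage with a threshold of 0 plays the role of f(Q) = sign(H*(Q)); the count of evaluated stages makes the effect of early rejection visible, since most candidates never reach the final stage.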
wordName wordTfidf (topN-words)
[('regionlets', 0.728), ('regionlet', 0.528), ('bounding', 0.101), ('dpm', 0.092), ('region', 0.089), ('boosting', 0.088), ('selective', 0.087), ('candidate', 0.077), ('deformation', 0.077), ('voc', 0.075), ('rigid', 0.071), ('boxes', 0.068), ('detection', 0.067), ('deformable', 0.06), ('cascaded', 0.059), ('sliding', 0.05), ('aspect', 0.05), ('deformations', 0.048), ('pool', 0.048), ('box', 0.045), ('object', 0.045), ('pascal', 0.044), ('exhaustive', 0.043), ('window', 0.043), ('imagenet', 0.042), ('weak', 0.042), ('prototypes', 0.04), ('lbp', 0.04), ('extraction', 0.039), ('round', 0.038), ('feature', 0.038), ('spm', 0.036), ('search', 0.035), ('tolerate', 0.035), ('bus', 0.034), ('rj', 0.034), ('sheep', 0.033), ('ratios', 0.032), ('inside', 0.031), ('cov', 0.03), ('millions', 0.03), ('enumerating', 0.029), ('generic', 0.029), ('cascade', 0.029), ('mbike', 0.029), ('tvmonitor', 0.029), ('sizes', 0.028), ('windows', 0.027), ('sofa', 0.027), ('cow', 0.027), ('classifier', 0.026), ('layouts', 0.026), ('aggregated', 0.026), ('bow', 0.025), ('groups', 0.025), ('spatial', 0.025), ('hog', 0.025), ('objects', 0.025), ('height', 0.025), ('variations', 0.025), ('cat', 0.024), ('flexible', 0.024), ('hand', 0.024), ('person', 0.024), ('vo', 0.024), ('width', 0.024), ('aero', 0.023), ('positions', 0.023), ('objectness', 0.023), ('locations', 0.022), ('arts', 0.022), ('prototype', 0.022), ('horse', 0.022), ('orange', 0.022), ('regions', 0.021), ('oth', 0.021), ('latest', 0.021), ('aeroplane', 0.021), ('baselines', 0.021), ('operation', 0.021), ('entry', 0.02), ('nr', 0.02), ('categories', 0.019), ('classifiers', 0.019), ('anchor', 0.019), ('descriptive', 0.019), ('group', 0.019), ('dog', 0.019), ('bike', 0.018), ('proposing', 0.018), ('rectangle', 0.018), ('validated', 0.018), ('bird', 0.018), ('relative', 0.018), ('largely', 0.018), ('boat', 0.018), ('tht', 0.017), ('precision', 0.017), ('bottle', 0.017), ('lookup', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 349 iccv-2013-Regionlets for Generic Object Detection
2 0.11947414 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
3 0.11758169 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
4 0.097984008 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
5 0.094666429 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
Author: Jose A. Rodriguez Serrano, Diane Larlus
Abstract: We tackle the detection of prominent objects in images as a retrieval task: given a global image descriptor, we find the most similar images in an annotated dataset, and transfer the object bounding boxes. We refer to this approach as data driven detection (DDD), that is an alternative to sliding windows. Previous works have used similar notions but with task-independent similarities and representations, i.e. they were not tailored to the end-goal of localization. This article proposes two contributions: (i) a metric learning algorithm and (ii) a representation of images as object probability maps, that are both optimized for detection. We show experimentally that these two contributions are crucial to DDD, do not require costly additional operations, and in some cases yield comparable or better results than state-of-the-art detectors despite conceptual simplicity and increased speed. As an application of prominent object detection, we improve fine-grained categorization by precropping images with the proposed approach.
6 0.085627191 104 iccv-2013-Decomposing Bag of Words Histograms
7 0.080727018 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
8 0.074758284 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
9 0.072030067 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
10 0.069884941 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
11 0.069514215 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
12 0.069397956 338 iccv-2013-Randomized Ensemble Tracking
13 0.069189087 190 iccv-2013-Handling Occlusions with Franken-Classifiers
14 0.067706764 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
15 0.06699159 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
16 0.065773226 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
17 0.061753009 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
18 0.061651371 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
19 0.060795303 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
20 0.059708633 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
topicId topicWeight
[(0, 0.132), (1, 0.017), (2, 0.02), (3, -0.044), (4, 0.077), (5, -0.038), (6, -0.035), (7, 0.045), (8, -0.062), (9, -0.052), (10, 0.047), (11, 0.017), (12, -0.006), (13, -0.07), (14, 0.007), (15, -0.054), (16, 0.036), (17, 0.082), (18, 0.026), (19, -0.001), (20, -0.031), (21, 0.05), (22, -0.042), (23, 0.03), (24, -0.019), (25, 0.023), (26, -0.021), (27, 0.004), (28, -0.004), (29, -0.004), (30, -0.038), (31, -0.019), (32, -0.046), (33, 0.008), (34, 0.004), (35, -0.022), (36, 0.056), (37, 0.014), (38, -0.002), (39, 0.032), (40, 0.05), (41, -0.006), (42, -0.004), (43, 0.03), (44, 0.068), (45, -0.049), (46, 0.102), (47, -0.009), (48, 0.045), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.92173558 349 iccv-2013-Regionlets for Generic Object Detection
2 0.79882312 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
Author: Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, Li Fei-Fei
Abstract: The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. [10] on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent testbed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons of the current object detection methods and propose a number of insights for designing the next generation object detectors.
3 0.76113802 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
Author: Jose A. Rodriguez Serrano, Diane Larlus
Abstract: We tackle the detection of prominent objects in images as a retrieval task: given a global image descriptor, we find the most similar images in an annotated dataset, and transfer the object bounding boxes. We refer to this approach as data driven detection (DDD), which is an alternative to sliding windows. Previous works have used similar notions but with task-independent similarities and representations, i.e. they were not tailored to the end-goal of localization. This article proposes two contributions: (i) a metric learning algorithm and (ii) a representation of images as object probability maps, both of which are optimized for detection. We show experimentally that these two contributions are crucial to DDD, do not require costly additional operations, and in some cases yield comparable or better results than state-of-the-art detectors despite conceptual simplicity and increased speed. As an application of prominent object detection, we improve fine-grained categorization by pre-cropping images with the proposed approach.
4 0.75531238 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
5 0.70759892 104 iccv-2013-Decomposing Bag of Words Histograms
Author: Ankit Gandhi, Karteek Alahari, C.V. Jawahar
Abstract: We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers, which can effectively discriminate the object categories present in the image. Our decomposition bypasses harder problems associated with accurately localizing and segmenting objects. We evaluate our method on a wide variety of composite histograms, and also compare it with MRF-based solutions. In addition to merely measuring the accuracy of decomposition, we also show the utility of the estimated object and background histograms for the task of image classification on the PASCAL VOC 2007 dataset.
6 0.67505974 189 iccv-2013-HOGgles: Visualizing Object Detection Features
7 0.67131513 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
8 0.65446955 190 iccv-2013-Handling Occlusions with Franken-Classifiers
9 0.65377825 379 iccv-2013-Semantic Segmentation without Annotating Segments
10 0.63682622 338 iccv-2013-Randomized Ensemble Tracking
11 0.63125825 136 iccv-2013-Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve
12 0.62637484 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
13 0.62445819 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
14 0.62125748 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
15 0.61768419 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
16 0.61721122 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
17 0.60780543 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
18 0.60418165 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
19 0.60388309 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition
20 0.59858125 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
topicId topicWeight
[(2, 0.063), (4, 0.015), (7, 0.026), (9, 0.158), (12, 0.025), (13, 0.031), (26, 0.088), (31, 0.075), (34, 0.012), (35, 0.011), (42, 0.104), (48, 0.016), (64, 0.05), (73, 0.049), (78, 0.021), (89, 0.129), (98, 0.011)]
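The topic weights above form a sparse vector: only topics with non-negligible weight are listed as (topicId, topicWeight) pairs. The listing does not say how its simValue scores are computed. As a rough illustration only, the sketch below assumes cosine similarity over such sparse vectors, which is a common choice for comparing topic distributions. The function name cosine_similarity and the second vector other_topics are hypothetical and do not come from the source.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {topicId: weight} dicts."""
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(weight * b[topic] for topic, weight in a.items() if topic in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Topic weights for paper 349, copied from the listing above.
regionlets_topics = dict([
    (2, 0.063), (4, 0.015), (7, 0.026), (9, 0.158), (12, 0.025), (13, 0.031),
    (26, 0.088), (31, 0.075), (34, 0.012), (35, 0.011), (42, 0.104), (48, 0.016),
    (64, 0.05), (73, 0.049), (78, 0.021), (89, 0.129), (98, 0.011),
])

# Hypothetical topic weights for some other paper (made up for illustration).
other_topics = {9: 0.2, 42: 0.1, 50: 0.3}

print(cosine_similarity(regionlets_topics, other_topics))
```

Under this assumption, ranking all other papers by their score against paper 349 would give an ordering in the same spirit as the simIndex table that follows, though the actual scores in the listing may come from a different measure.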
simIndex simValue paperId paperTitle
same-paper 1 0.81674099 349 iccv-2013-Regionlets for Generic Object Detection
Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named as regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves the detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.
2 0.78106952 180 iccv-2013-From Where and How to What We See
Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Eckstein, B.S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
3 0.78072512 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
Author: Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
Abstract: Representation is a fundamental problem in object tracking. Conventional methods track the target by describing its local or global appearance. In this paper we show that, besides these two paradigms, the composition of local region histograms can also provide diverse and important object cues. We use cells to extract local appearance, and construct complex cells to integrate the information from cells. With different spatial arrangements of cells, complex cells can explore various contextual information at multiple scales, which is important to improve the tracking performance. We also develop a novel template-matching algorithm for object tracking, where the template is composed of temporally varying cells and has two layers to capture the target and background appearance respectively. An adaptive weight is associated with each complex cell to cope with occlusion as well as appearance variation. A fusion weight is associated with each complex cell type to preserve the global distinctiveness. Our algorithm is evaluated on 25 challenging sequences, and the results not only confirm the contribution of each component in our tracking system, but also outperform other competing trackers.
4 0.77172875 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
5 0.77064836 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
6 0.76959628 150 iccv-2013-Exemplar Cut
7 0.76776564 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
8 0.76769423 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
9 0.76623517 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
10 0.76605183 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
11 0.7648524 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
12 0.76473665 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
13 0.76208138 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
14 0.76080102 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
15 0.76029116 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
16 0.76027215 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
17 0.76008391 277 iccv-2013-Multi-channel Correlation Filters
18 0.75728643 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
19 0.75667953 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
20 0.7563563 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples