iccv iccv2013 iccv2013-169 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
Reference: text
sentIndex sentText sentNum sentScore
1 Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. [sent-11, score-0.319]
2 The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). [sent-12, score-0.862]
3 We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. [sent-13, score-0.328]
4 Introduction Fine-grained categorization relies on identifying the subtle differences in appearance of specific object parts. [sent-16, score-0.176]
5 Humans learn to distinguish different types of birds by addressing the differences in specific details. [sent-18, score-0.255]
6 Based on example images like these, fine-grained categorization tries to answer the question: what fine-grained bird category do we have in the third image? [sent-25, score-0.194]
7 Rather than directly trying to localize parts (be it distinctive or intrinsic), we show in this paper that better results can be obtained if one first tries to align the birds based on their global shape, ignoring the actual bird categories. [sent-26, score-0.54]
8 Yet, it remains unclear what is the most critical aspect of “parts” in a fine-grained categorization context: is it the ability to accurately localize corresponding locations over object instances, or simply the ability to capture detailed information? [sent-29, score-0.282]
9 We argue that a very precise “part” localization is not necessary and rough alignments suffice, as long as one manages to capture the fine-grained details in the appearance. [sent-31, score-0.823]
10 Parts may be divided in intrinsic parts [3, 16] such as the head of a dog or the body of a bird, and distinctive parts [32, 31] specific to few sub-categories. [sent-32, score-0.341]
11 Recovering intrinsic parts implies that such parts are seen throughout the whole dataset. [sent-33, score-0.261]
12 Furthermore, rough alignment is not sub-category specific, thus the object representation becomes independent of the number of classes or training images [33, 32]. [sent-42, score-0.204]
13 In the unsupervised case, we use alignments to delineate corresponding object regions that we will use in the differential classification. [sent-47, score-0.857]
14 In contrast to the raw SIFT or template features preferred in the fine-grained literature [16, 31, 32], such localized feature encodings are less sensitive to misalignments. [sent-51, score-0.172]
15 We present two methods for recovering alignments that require varying levels of part supervision during training. [sent-53, score-0.783]
16 The results vouch for unsupervised alignments, which outperform previously published results. [sent-55, score-0.152]
17 Different from the above works, we propose to use strong classification-oriented, and not matching-oriented, encodings to describe the alignment parts and regions. [sent-66, score-0.29]
18 However, we use Fisher vectors not only as global, object level representations, but also as localized appearance descriptors. [sent-71, score-0.197]
19 The detection of objects in a fine-grained categorization setting ranges from the segmentation of the object of interest [19, 5, 6] to fitting ellipsoids [10] and detecting individual parts and templates [33, 34, 32, 31, 16]. [sent-73, score-0.348]
20 propose to use deformable part models to detect the head of cats and dogs and in [1] Berg and Belhumeur learn discriminative parts from pairwise comparisons between classes. [sent-85, score-0.293]
21 propose to share parts between classes to arrive at accurate part localization. [sent-87, score-0.284]
22 Based on this alignment, we then derive a small number of predicted parts (supervised) or regions (unsupervised). [sent-89, score-0.171]
23 The computation of the segmentation mask can be accurate as in the left, ok as in the middle or completely fail as in the right image. [sent-96, score-0.194]
24 We say an image is aligned with other images if we have identified a local frame of reference in the image that is consistent with (a subset of) the frames of reference found in other images. [sent-113, score-0.149]
25 As is common in fine-grained categorization [33, 32, 31], we have available both at training and at test time the bounding box locations of the object of interest. [sent-115, score-0.368]
26 , all birds are usually either on trees or flying in the sky. [sent-119, score-0.255]
27 The rectangular bounding box around an object allows for extracting important information, such as the approximate shape of the object. [sent-120, score-0.214]
28 More specifically, we use GrabCut [25] on the bounding box to compute an accurate figure-ground segmentation. [sent-121, score-0.166]
29 Supervised alignments In the supervised scenario the ground truth locations of basic object parts, such as the beak or the tail of the birds, are available in the training set. [sent-126, score-1.102]
30 Then, we can use the common frame of reference to predict the part locations in the test image. [sent-129, score-0.242]
31 Our first goal is to retrieve a small number of training pictures that have a similar shape as the object in the test image. [sent-130, score-0.154]
32 To this end, we first obtain the segmentation mask of the object as described before. [sent-132, score-0.158]
33 We are now in position to use the ground truth locations of the parts in the training images and predict the corresponding locations in the test image. [sent-143, score-0.421]
34 To calculate the positions of the same parts on the test image, one may apply several methods of varying sophistication, ranging from simple average pooling of part locations to local, independent optimization of parts based on HOG convolutions. [sent-144, score-0.444]
35 To ensure maximum compatibility we repeat the above procedure for all training and testing images in the dataset, thus predicting part locations for all the objects in the dataset. [sent-146, score-0.198]
36 On the right, we have the nearest neighbor training images, their ground truth part locations and their HOG shape representations, based on which they were retrieved. [sent-149, score-0.31]
37 Unsupervised alignments In the unsupervised scenario no ground truth information of the training part locations is available. [sent-154, score-1.062]
38 However, we still have the bounding box that surrounds the object, based on which we can derive a shape mask per object. [sent-155, score-0.264]
39 Since no ground truth part locations are available, it does not make sense to align the test image to a small subset of training images. [sent-156, score-0.264]
40 While not as accurate as the alignments in the previous subsection, this procedure allows us to obtain robust and consistent alignments over the entire database. [sent-158, score-1.426]
41 More specifically, we fit an ellipse to the pixels X of the segmentation mask and compute the local 2-d geometry in the form of the two principal axes a_j = x̄ + e_j? [sent-159, score-0.262]
42 To this end we extract the principal axes using all the foreground pixels of the shape mask. [sent-166, score-0.181]
43 For objects that have an elliptical shape the longer axis is usually the principal axis. [sent-167, score-0.158]
44 We therefore decide not to use the ancillary axis in the generation of consistent regions. [sent-170, score-0.155]
45 Relative to this frame of reference, we can define different locations or regions at will. [sent-173, score-0.156]
46 Here, we divide the principal axis equally from the origin to the end in a fixed number of segments, and define regions as the part of the foreground mask that falls within one such segment. [sent-174, score-0.384]
47 Given accurate segmentation masks, the corresponding locations in different fine-grained objects are visited in the same order, thus resulting in pose-normalized representations, see Fig. [sent-175, score-0.178]
48 Final Image Representation Our alignments are designed to be rough. [sent-180, score-0.676]
49 Note that although the second approach is theoretically more accurate in capturing only the object appearance details, at the same time it might either include background pixels or omit foreground pixels, since segmentation masks are not perfect. [sent-189, score-0.201]
50 After fitting an ellipse, we obtain the two axes in the middle column pictures, the principal green and the ancillary magenta ones. [sent-192, score-0.193]
51 After the gravity vector assumption [22] we assume the origin of the principal axis to be the highest point in the direction of the green arrow. [sent-193, score-0.169]
52 Based on this frame of reference, we split equally in the right column pictures the principal axis to obtain consistent regions. [sent-194, score-0.24]
53 We use the ground truth part annotations only during learning, unless stated otherwise. [sent-205, score-0.164]
54 Fisher vectors are better equipped in describing part appearance than HOG for fine-grained categorization. [sent-222, score-0.208]
55 Matching vs Classification Descriptors In this first experiment we evaluate what are good descriptors for describing parts in a fine-grained categorization setting. [sent-225, score-0.22]
56 In order to ensure a fair comparison, as well as to test the maximum recognition capacity of parts for such a task, we use the ground truth part annotations both in training and in testing, as if an oracle algorithm for the part locations was available. [sent-226, score-0.531]
57 If Fisher vectors outperform HOG on perfectly aligned ground truth parts, then we expect this to be the case even more for less accurate parts. [sent-227, score-0.217]
58 In order to avoid a too strong correlation between the parts and also control the dimensionality of the final feature vector we use only the following 7 parts, which cover the bird silhouette: beak, belly, forehead, left wing, right wing, tail and throat. [sent-229, score-0.201]
59 The Fisher vectors from the 7 parts are concatenated with a Fisher vector from the whole bounding box to arrive at the final object representation. [sent-232, score-0.35]
60 Similarly, for the HOG object descriptors we also compute a HOG vector using the bounding box, rescaled to 100 × 100 pixels. [sent-233, score-0.154]
61 As we see in Table 1, Fisher vectors are much better in describing parts for fine-grained categorization than matching based descriptors like HOG. [sent-234, score-0.408]
62 5 the individual accuracies per class for Fisher vectors and for HOG, noticing that Fisher vectors outperform for 184 out of the 200 sub-categories. [sent-242, score-0.184]
63 In the following experiments we report results using only Fisher vectors for describing the appearance of parts and alignments. [sent-243, score-0.261]
64 Supervised alignments In the second experiment we test whether supervised alignments actually benefit the recognition of fine-grained categories, as compared to a standard classification pipeline. [sent-246, score-1.467]
65 The difference is not that big (2%), but note that for Fisher vector unsupervised alignments no ground truth part locations are required. [sent-251, score-1.033]
66 Part selection: [2 × 2] spatial pyramid kernel, Supervised alignment on beak only, Supervised alignments. Fisher vectors: 39. [sent-252, score-0.95]
67 Supervised alignments are more accurate than a spatial pyramid kernel and an alignment based on the beak of a bird only, while being rather close to the theoretical accuracy of the oracle parts in Table 1. [sent-256, score-1.169]
68 Here our supervised alignments use ground truth part annotations only in training. [sent-257, score-0.955]
69 We use the same 7 parts as in the previous experiment plus a Fisher vector extracted from the whole bounding box. [sent-258, score-0.171]
70 We predict their location by averaging the locations of the parts in the top 20 nearest neighbors. [sent-259, score-0.222]
71 We compare our proposed supervised alignment method against a 2 × 2 spatial pyramid using Fisher vectors computed from all SIFT descriptors in the bounding box. [sent-261, score-0.403]
72 As we observe in Table 2, parts bring a 17% accuracy improvement over a standard spatial pyramid classification approach, since they better capture the little nuances that differentiate sub-classes that are otherwise visually very similar. [sent-265, score-0.184]
73 Furthermore, we note that extracting Fisher vectors on the supervised alignments is 47. [sent-266, score-0.912]
74 5% obtained when extracting Fisher vectors on the parts provided by the ground truth. [sent-268, score-0.271]
75 This indicates that we capture the part locations well enough for an appearance descriptor like the Fisher vector. [sent-269, score-0.2]
76 In fact, the mean squared error between our estimated parts and the ground truth ones is 12%, after normalizing the respective locations with respect to the bounding box geometry. [sent-270, score-0.378]
77 Our supervised alignments perform consistently better for 141 out of the 200 classes. [sent-273, score-0.791]
78 We conclude that extracting localized information in the form of alignments or parts matters in a fine-grained categorization setting. [sent-274, score-1.004]
79 Unsupervised Alignments In this experiment we compare the unsupervised alignments with the supervised ones. [sent-277, score-0.913]
80 After extracting the principal axis we split the bird mask into four regions, starting from the highest point, considering only the pixels within the segmentation mask. [sent-278, score-0.363]
81 We observe that describing the object based on the unsupervised alignments results in more accurate predictions compared to the supervised case (49. [sent-281, score-1.026]
82 We observe that birds in these sub-classes have consistent appearance. [sent-286, score-0.285]
83 Part selection: Supervised alignments, [4 × 1] spatial pyramid kernel, Fisher vector from the foreground mask only, Unsupervised alignments. Fisher vectors: 47. [sent-287, score-1.621]
84 Unsupervised alignments are more accurate than supervised ones, while at the same time requiring no supervision at all. [sent-292, score-0.877]
85 Note that unsupervised alignments use no ground truth part annotations, neither in training nor in testing. [sent-295, score-0.958]
86 We, furthermore, plot the individual accuracy differences per class for supervised and unsupervised alignments in the right picture in Fig. [sent-302, score-0.997]
87 The distribution of classes is split roughly equally for supervised and unsupervised alignments, with unsupervised alignments having slightly larger accuracy differences. [sent-304, score-1.087]
88 We conclude that, compared to supervised parts, unsupervised alignments describe the localized appearance of fine-grained objects at least as well, and often better. [sent-305, score-0.999]
89 Birds: Pose pooling kernels [34], Pooling feature learning [12], POOF [1], This paper: Unsupervised alignments. Accuracy: 28. [sent-306, score-0.715]
90 Unsupervised alignments with Fisher vectors outperform the state-of-the-art considerably. [sent-312, score-0.783]
91 Dogs: Discriminative Color Descriptors [14], Edge templates [31], This paper: Unsupervised alignments. Accuracy: 28. [sent-313, score-0.731]
92 Unsupervised alignments with Fisher vectors outperform the state-of-the-art considerably. [sent-318, score-0.783]
93 State-of-the-art comparison In experiment 4, we compare our unsupervised alignments with state-of-the-art methods reported on CU-2011 Birds and Stanford Dogs. [sent-321, score-0.798]
94 Compared to the very recently published POOF features [1], unsupervised color alignments are 10% more accurate, while not requiring ground truth part annotations. [sent-324, score-0.929]
95 Compared to the pose pooling kernels, unsupervised alignments recognize bird subcategories 84% more accurately. [sent-325, score-0.92]
96 And compared to learned features proposed in [12] unsupervised alignments perform 36. [sent-326, score-0.798]
97 Also for Stanford Dogs we outperform the state-of-the-art, in spite of the larger shape and pose variation among the dogs compared to the birds, see Table 5. [sent-328, score-0.186]
98 6 we plot pictures from four categories for which alignments reach high accuracy, i. [sent-340, score-0.748]
99 We conclude that rough alignments lead to accurate fine-grained categorization. [sent-356, score-0.78]
100 Improving the Fisher kernel for large-scale image classification. [sent-523, score-0.383]
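The unsupervised alignment described in sentences 41 to 46 and 80 (principal axis of the foreground mask, origin at the highest end, equal segments along that axis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `principal_axis_regions`, the use of a covariance eigen-decomposition in place of an explicit ellipse fit, and the convention that "highest" means the smallest image row index are all assumptions made for illustration. The mask is assumed to be a binary NumPy array with at least two foreground pixels.

```python
import numpy as np


def principal_axis_regions(mask, n_regions=4):
    """Label foreground pixels by their band along the mask's principal axis.

    Hypothetical helper: returns an integer map with -1 for background and
    0..n_regions-1 for foreground, where band 0 starts at the highest end.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)  # (N, 2) points as (x, y)
    mean = pts.mean(axis=0)

    # Principal axis: eigenvector of the 2x2 covariance with the largest
    # eigenvalue (np.linalg.eigh returns eigenvalues in ascending order).
    cov = np.cov((pts - mean).T)
    _, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, -1]

    # Assumed orientation: image rows grow downward, so point the axis
    # toward increasing y; projections then start at the top of the image.
    if axis[1] < 0:
        axis = -axis

    # Project every foreground pixel on the axis and cut the covered
    # interval into n_regions equal segments.
    proj = (pts - mean) @ axis
    lo, hi = proj.min(), proj.max()
    span = max(hi - lo, 1e-12)
    bins = ((proj - lo) / span * n_regions).astype(int)
    bins = np.minimum(bins, n_regions - 1)  # far end falls in the last band

    labels = np.full(mask.shape, -1, dtype=int)
    labels[ys, xs] = bins
    return labels


if __name__ == "__main__":
    demo = np.zeros((8, 3), dtype=bool)
    demo[:, 1] = True  # a vertical bar as a toy "object"
    print(principal_axis_regions(demo)[:, 1])
```

In a pipeline like the one the sentences describe, a localized descriptor (for example a Fisher vector) would then be computed from the pixels of each band and the per-band descriptors concatenated; that step is outside this sketch.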
wordName wordTfidf (topN-words)
[('alignments', 0.676), ('fisher', 0.355), ('birds', 0.255), ('unsupervised', 0.122), ('parts', 0.118), ('encodings', 0.117), ('supervised', 0.115), ('categorization', 0.111), ('dogs', 0.11), ('locations', 0.104), ('mask', 0.094), ('bird', 0.083), ('beak', 0.078), ('hog', 0.077), ('vectors', 0.077), ('stanford', 0.074), ('branson', 0.07), ('descriptors', 0.067), ('ancillary', 0.066), ('part', 0.065), ('rough', 0.06), ('axis', 0.059), ('picture', 0.057), ('templates', 0.055), ('localized', 0.055), ('alignment', 0.055), ('principal', 0.053), ('bounding', 0.053), ('oracle', 0.051), ('distinctive', 0.051), ('chai', 0.05), ('axes', 0.048), ('wah', 0.046), ('reference', 0.046), ('shape', 0.046), ('pictures', 0.045), ('accurate', 0.044), ('grabcut', 0.044), ('aligning', 0.044), ('heermann', 0.044), ('shrikes', 0.044), ('extracting', 0.044), ('farrell', 0.044), ('finegrained', 0.044), ('precise', 0.043), ('supervision', 0.042), ('poof', 0.042), ('perona', 0.039), ('bobolink', 0.039), ('gull', 0.039), ('pooling', 0.039), ('ellipse', 0.037), ('box', 0.037), ('yao', 0.037), ('pyramid', 0.036), ('describing', 0.035), ('truth', 0.034), ('surrounds', 0.034), ('fgvc', 0.034), ('foreground', 0.034), ('object', 0.034), ('annotations', 0.033), ('localize', 0.033), ('ground', 0.032), ('appearance', 0.031), ('sanchez', 0.031), ('arriving', 0.031), ('arrive', 0.031), ('segmentation', 0.03), ('nuances', 0.03), ('nilsback', 0.03), ('amsterdam', 0.03), ('outperform', 0.03), ('consistent', 0.03), ('belong', 0.029), ('gravity', 0.029), ('training', 0.029), ('dog', 0.029), ('kernel', 0.028), ('wing', 0.028), ('parkhi', 0.028), ('predicted', 0.028), ('masks', 0.028), ('origin', 0.028), ('frame', 0.027), ('impression', 0.027), ('nameable', 0.027), ('plot', 0.027), ('classes', 0.026), ('user', 0.026), ('equally', 0.026), ('sift', 0.026), ('middle', 0.026), ('attributes', 0.026), ('interactive', 0.026), ('intrinsic', 0.025), ('parikh', 0.025), ('regions', 0.025), ('zisserman', 0.025), 
('novelty', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
2 0.21820959 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
3 0.19976221 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
Author: Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen
Abstract: The Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high quality still face images, which is very challenging because of noise, image blur, low face resolutions, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of ‘best quality’ from videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the still faces offline well-aligned in the gallery. In this paper, we discover that the interactions among the three tasks (quality alignment, geometric alignment and face recognition) can benefit from each other, thus should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote mutually by a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
4 0.18899383 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
5 0.16814503 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
6 0.16265126 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
7 0.14733946 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
8 0.14669605 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
9 0.14330173 202 iccv-2013-How Do You Tell a Blackbird from a Crow?
10 0.14259429 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
11 0.10118464 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
12 0.10095645 379 iccv-2013-Semantic Segmentation without Annotating Segments
13 0.097732455 39 iccv-2013-Action Recognition with Improved Trajectories
14 0.09261097 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
15 0.086506091 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
16 0.086089253 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
17 0.084584497 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
18 0.080718331 451 iccv-2013-Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
19 0.079533383 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
20 0.077103071 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
topicId topicWeight
[(0, 0.196), (1, 0.042), (2, 0.015), (3, -0.052), (4, 0.102), (5, -0.023), (6, -0.036), (7, 0.025), (8, -0.041), (9, -0.04), (10, 0.083), (11, 0.074), (12, -0.022), (13, -0.125), (14, -0.084), (15, -0.021), (16, 0.047), (17, 0.036), (18, 0.071), (19, -0.058), (20, 0.076), (21, 0.014), (22, -0.022), (23, 0.051), (24, -0.007), (25, 0.158), (26, -0.047), (27, 0.017), (28, 0.031), (29, 0.04), (30, 0.123), (31, -0.136), (32, -0.049), (33, -0.088), (34, -0.1), (35, -0.011), (36, -0.025), (37, -0.099), (38, 0.038), (39, 0.019), (40, -0.095), (41, -0.022), (42, 0.046), (43, 0.019), (44, -0.071), (45, 0.013), (46, 0.012), (47, 0.009), (48, 0.06), (49, -0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.92747682 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
2 0.8216325 202 iccv-2013-How Do You Tell a Blackbird from a Crow?
Author: Thomas Berg, Peter N. Belhumeur
Abstract: How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts; most people are challenged to name a couple dozen bird species, let alone identify them. This leads us to the question, “Can a recognition system show humans what to look for when identifying classes (in this case birds)?” In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.
3 0.81855148 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention these years. Different from traditional image classification tasks in which objects have large inter-class variation, the visual concepts in the fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating really discriminative features, therefore it becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves the state-of-the-art classification accuracy in the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.
4 0.75758821 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
5 0.75267935 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
6 0.72887188 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
7 0.64527339 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
9 0.61530924 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
10 0.60829836 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
11 0.56755453 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
12 0.56613904 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
13 0.55712742 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
14 0.55676275 104 iccv-2013-Decomposing Bag of Words Histograms
15 0.55582076 74 iccv-2013-Co-segmentation by Composition
16 0.54790545 288 iccv-2013-Nested Shape Descriptors
17 0.53540283 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition
18 0.53093433 379 iccv-2013-Semantic Segmentation without Annotating Segments
19 0.52964848 349 iccv-2013-Regionlets for Generic Object Detection
20 0.52131343 48 iccv-2013-An Adaptive Descriptor Design for Object Recognition in the Wild
topicId topicWeight
[(2, 0.081), (4, 0.014), (7, 0.013), (13, 0.012), (26, 0.115), (31, 0.031), (34, 0.079), (40, 0.028), (42, 0.108), (48, 0.011), (64, 0.049), (66, 0.096), (73, 0.027), (77, 0.017), (89, 0.222)]
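The topic-weight row above is a sparse vector: each pair is a topic id and the weight this paper has on that topic. The following is a minimal sketch of how such a row could be parsed and compared with another paper's row. The function names `parse_topic_weights` and `cosine_similarity` are hypothetical, and cosine similarity is an assumption chosen for illustration; the listing does not state how its simValue column is computed.

```python
import math
import re


def parse_topic_weights(line):
    """Parse a '[(id, weight), ...]' row into a {topic_id: weight} dict."""
    pairs = re.findall(r"\((\d+),\s*([0-9.]+)\)", line)
    return {int(topic_id): float(weight) for topic_id, weight in pairs}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Assumption: this is an illustrative measure, not necessarily the one
    used to produce the simValue column in the listing.
    """
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# The topic-weight row listed for this paper.
row = ("[(2, 0.081), (4, 0.014), (7, 0.013), (13, 0.012), (26, 0.115), "
       "(31, 0.031), (34, 0.079), (40, 0.028), (42, 0.108), (48, 0.011), "
       "(64, 0.049), (66, 0.096), (73, 0.027), (77, 0.017), (89, 0.222)]")
this_paper = parse_topic_weights(row)

# The heaviest topic for this paper.
dominant_topic = max(this_paper, key=this_paper.get)
```

Storing the row as a dict keeps only the topics that actually appear, so comparing two papers touches just the topic ids they share.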
simIndex simValue paperId paperTitle
same-paper 1 0.93588805 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
2 0.92477345 202 iccv-2013-How Do You Tell a Blackbird from a Crow?
Author: Thomas Berg, Peter N. Belhumeur
Abstract: How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts – most people are challenged to name a couple dozen bird species, let alone identify them. This leads us to the question, “Can a recognition system show humans what to look for when identifying classes (in this case birds)?” In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.
3 0.92307401 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis
Author: Kaoning Hu, Lijun Yin
Abstract: In this paper, we propose a multi-scale topological feature representation for automatic analysis of hand posture. Such topological features have the advantage of being posture-dependent while being preserved under certain variations of illumination, rotation, personal dependency, etc. Our method studies the topology of the holes between the hand region and its convex hull. Inspired by the principle of Persistent Homology, which is the theory of computational topology for topological feature analysis over multiple scales, we construct the multi-scale Betti Numbers matrix (MSBNM) for the topological feature representation. In our experiments, we used 12 different hand postures and compared our features with three popular features (HOG, MCT, and Shape Context) on different data sets. In addition to hand postures, we also extend the feature representations to arm postures. The results demonstrate the feasibility and reliability of the proposed method.
4 0.92206997 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
Author: Xiaoyang Wang, Qiang Ji
Abstract: This paper proposes a unified probabilistic model to model the relationships between attributes and objects for attribute prediction and object recognition. As a list of semantically meaningful properties of objects, attributes generally relate to each other statistically. In this paper, we propose a unified probabilistic model to automatically discover and capture both the object-dependent and object-independent attribute relationships. The model utilizes the captured relationships to benefit both attribute prediction and object recognition. Experiments on four benchmark attribute datasets demonstrate the effectiveness of the proposed unified model for improving attribute prediction as well as object recognition in both standard and zero-shot learning cases.
5 0.92180663 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
6 0.91852558 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
7 0.91684449 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
8 0.91640949 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
9 0.9103477 449 iccv-2013-What Do You Do? Occupation Recognition in a Photo via Social Context
10 0.90827322 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
11 0.90821373 414 iccv-2013-Temporally Consistent Superpixels
12 0.90756691 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
13 0.90618598 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
14 0.90423292 379 iccv-2013-Semantic Segmentation without Annotating Segments
15 0.90400481 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
16 0.90346396 150 iccv-2013-Exemplar Cut
17 0.90277493 404 iccv-2013-Structured Forests for Fast Edge Detection
18 0.90276521 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model
19 0.90244889 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
20 0.90229225 6 iccv-2013-A Convex Optimization Framework for Active Learning