iccv iccv2013 iccv2013-198 knowledge-graph by maker-knowledge-mining

198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization


Source: pdf

Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang

Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Unlike traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating the truly discriminative features; it therefore becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD Birds-200-2011 dataset by making full use of the ground-truth part annotations.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. [sent-6, score-0.128]

2 Unlike traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. [sent-7, score-0.321]

3 In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. [sent-9, score-0.274]

4 We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). [sent-10, score-0.356]

5 We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD Birds-200-2011 dataset by making full use of the ground-truth part annotations. [sent-11, score-0.109]

6 Introduction Classifying images according to their semantic meaning is a basic task in the computer vision community. [sent-13, score-0.104]

7 Among them, fine-grained visual categorization (FGVC) is a special case, in which the visual concepts in different categories are very similar. [sent-15, score-0.278]

8 Although in recent years researchers have proposed new approaches to deal with the above problem [3] [4] [11] and verified that these modules help to boost the classification performance [20], the connection between image representation and visual concepts is still weak. [sent-27, score-0.303]

9 It is also observed [18] that traditional classification models work poorly on fine-grained tasks, due to the limited use of the truly discriminative features located on special parts of the objects. [sent-28, score-0.137]

10 In this paper, we propose a novel flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification problems. [sent-29, score-0.235]

11 We make full use of the ground-truth part annotations to help us obtain better image alignment and segmentation, and provide a much more descriptive image representation by building mid-level structures on local features as well as segmented regions. [sent-30, score-0.102]

12 The new modules added in the HPM model (see Figure 1) could be summarized as follows. [sent-31, score-0.126]

13 Second, we propose the Hierarchical Structure Learning (HSL) algorithm to find mid-level concepts beyond basic parts. [sent-33, score-0.127]

14 Integrating all the modules above gives a powerful model, which achieves the state-of-the-art classification performance in a challenging fine-grained image collection. [sent-35, score-0.192]

15 The main contribution of this paper is to provide an intuitive, simple and efficient way of using ground-truth part annotations, and to emphasize the importance of part detection in fine-grained classification tasks, with a surprising boost in classification accuracy. [sent-36, score-0.252]

16 Next, we introduce the Hierarchical Part Matching (HPM) model by individually presenting three modules: foreground inference and segmentation in Section 4, the Hierarchical Structure Learning (HSL) algorithm in Section 5, and the Geometric Phrase Pooling (GPP) algorithm in Section 6. [sent-40, score-0.311]

17 and train a visual vocabulary or codebook using K-Means clustering. [sent-48, score-0.12]
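
The codebook training mentioned above can be illustrated with a minimal K-Means (Lloyd's algorithm) sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `train_codebook` and `squared_distance`, the iteration count, and the random initialization are all hypothetical choices, and descriptors are plain Python sequences of floats.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(descriptors, num_codewords, iterations=20, seed=0):
    """Hypothetical helper: cluster descriptors into num_codewords centers."""
    rng = random.Random(seed)
    # Initialize the codewords with randomly chosen descriptors.
    codebook = [list(d) for d in rng.sample(descriptors, num_codewords)]
    for _ in range(iterations):
        # Assignment step: attach every descriptor to its nearest codeword.
        clusters = [[] for _ in range(num_codewords)]
        for d in descriptors:
            nearest = min(range(num_codewords),
                          key=lambda b: squared_distance(d, codebook[b]))
            clusters[nearest].append(d)
        # Update step: move each codeword to the mean of its members.
        for b, members in enumerate(clusters):
            if members:  # keep the old codeword if the cluster is empty
                codebook[b] = [sum(col) / len(members) for col in zip(*members)]
    return codebook
```

In practice the descriptors would be the 384-dimensional OppSIFT vectors sampled from the whole dataset, and an optimized library routine would normally replace this loop.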

18 Fine-Grained Visual Categorization Fine-grained visual categorization (FGVC) is an emerging research area in computer vision, in which a dataset typically contains hundreds of categories sharing similar semantics. [sent-54, score-0.176]

19 For example, the Caltech-UCSD Birds-200-2011 dataset [18] contains 200 bird species, and there are 120 different kinds of dogs in the Stanford Dogs dataset [12]. [sent-55, score-0.095]

20 To cope with the fine-grained classification tasks, researchers have proposed many novel algorithms, such as visual attributes [8], random templates [22], hierarchical matching [6] and part-based one-vs-one features [2]. [sent-56, score-0.26]

21 Foreground Inference and Segmentation In the fine-grained image classification tasks, almost all the categories share similar background clutter, and objects often differ from each other only in some small regions named “parts”. [sent-59, score-0.176]

22 For foreground inference, a popular tool is the Grab-Cut [15] algorithm, which iteratively infers the foreground region from an initial mask. [sent-61, score-0.304]

23 There are also works proposing multi-label segmentation to combine above steps as one [5]. [sent-63, score-0.081]

24 The Geometric Phrase Pooling Algorithm The basic units in the BoF model are visual words, which are too far away from visual concepts. [sent-66, score-0.163]

25 As a mid-level structure bridging low-level features and high-level semantics, visual phrases are verified useful in various image applications [23] [25]. [sent-67, score-0.205]

26 The Geometric Phrase Pooling (GPP) algorithm [20] is an efficient phrase extraction and pooling approach, in which we define visual phrases as local groups of visual words, and perform an efficient pooling algorithm to enhance the similarity in both geometric and feature spaces. [sent-68, score-1.005]

27 The Dataset In our experiments, we use the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset [18], which contains 200 bird species and 11788 images in total. [sent-72, score-0.18]

28 Also, a manually labeled bounding box and at most 15 landmark points … [sent-73, score-0.147]

29 Lower: images of different species showing small inter-class variation. [sent-77, score-0.118]

30 $\mathcal{D} = \{(d_1, R_1), \ldots, (d_M, R_M)\}$ (2) where $d_m$ and $R_m$ denote the description vector and occupied region of the $m$-th descriptor, respectively. [sent-98, score-0.083]

31 The description vector $d_m$ is a $D$-dimensional vector, where $D = 3 \times 128 = 384$ using OpponentSIFT (OppSIFT) [16] here on RGB images. [sent-100, score-0.083]

32 For this purpose, we train a codebook C using descriptors from the whole dataset. [sent-102, score-0.133]

33 Given a codebook with $B$ codewords, the quantization vector or feature vector for a descriptor $d_m$ would be a $B$-dimensional vector $w_m$, which is named the corresponding visual word of descriptor $d_m$. [sent-108, score-0.303]

34 $\mathcal{W} = \{(w_1, R_1), \ldots, (w_M, R_M)\}$ (3) Now, we aggregate the local visual words for global image representation. [sent-112, score-0.1]

35 A 3-layer Spatial Pyramid Matching (SPM) [13] follows by dividing the image into hierarchical subregions for individual max-pooling and concatenating the pooled vectors as a super-vector. [sent-116, score-0.11]
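
A rough sketch of the 3-layer pyramid pooling described above follows. It assumes, purely for illustration, that the layers are 1x1, 2x2 and 4x4 grids, that word positions are normalized to the unit square, and that visual-word responses are non-negative (so zero is a safe initial value for max-pooling); the name `spm_max_pool` is hypothetical.

```python
def spm_max_pool(words, positions, levels=3):
    """Max-pool visual words inside each pyramid cell and concatenate.

    words:     list of B-dimensional vectors (assumed non-negative).
    positions: list of (x, y) pairs with coordinates in [0, 1].
    Returns one super-vector of length B * (1 + 4 + 16) for levels=3.
    """
    dim = len(words[0])
    pooled = []
    for level in range(levels):
        cells = 2 ** level  # cells per side: 1, 2, 4
        grid = [[0.0] * dim for _ in range(cells * cells)]
        for word, (x, y) in zip(words, positions):
            # Clamp so that a coordinate of exactly 1.0 stays in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            cell = grid[row * cells + col]
            for b in range(dim):
                if word[b] > cell[b]:
                    cell[b] = word[b]
        for cell in grid:
            pooled.extend(cell)
    return pooled
```

The fixed grid is what makes the regular pyramid sensitive to object pose: two birds with differently placed heads land in different cells, which is the misalignment the part-based regions are meant to avoid.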

36 As they introduce irrelevant features into the BoF model, it is reasonable to perform foreground inference and extract features only on the object (foreground regions). [sent-124, score-0.23]

37 We use the Grab-Cut algorithm [15] for foreground inference. [sent-125, score-0.152]

38 We set the pixels outside the bounding box as definite background, inside as possible foreground and the pixels around landmarks as definite foreground. [sent-129, score-0.343]
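
The mask initialization described above can be sketched as follows. The label values 0, 1 and 3 are chosen here to match OpenCV's `GC_BGD`, `GC_FGD` and `GC_PR_FGD` constants, so that such a mask could in principle be handed to `cv2.grabCut` in mask-initialization mode; the square neighborhood and the `radius` parameter are assumptions, since the size of the "pixels around landmarks" is not given in this excerpt.

```python
DEFINITE_BG = 0   # same value as OpenCV's GC_BGD
DEFINITE_FG = 1   # same value as OpenCV's GC_FGD
POSSIBLE_FG = 3   # same value as OpenCV's GC_PR_FGD


def initial_mask(width, height, box, landmarks, radius=2):
    """Build the initial Grab-Cut mask as a list of rows.

    box:       (x0, y0, x1, y1), inclusive bounding-box corners.
    landmarks: list of (x, y) part locations.
    radius:    hypothetical half-size of the square marked around a landmark.
    """
    x0, y0, x1, y1 = box
    # Everything starts as definite background.
    mask = [[DEFINITE_BG] * width for _ in range(height)]
    # Pixels inside the bounding box become possible foreground.
    for y in range(max(0, y0), min(height, y1 + 1)):
        for x in range(max(0, x0), min(width, x1 + 1)):
            mask[y][x] = POSSIBLE_FG
    # Pixels around each landmark become definite foreground.
    for lx, ly in landmarks:
        for y in range(max(0, ly - radius), min(height, ly + radius + 1)):
            for x in range(max(0, lx - radius), min(width, lx + radius + 1)):
                mask[y][x] = DEFINITE_FG
    return mask
```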

39 Energy Function and Segmentation As the regular Spatial Pyramid often fails to align corresponding regions in the fine-grained tasks, we need a more accurate spatial segmentation to capture the semantic parts of the objects. [sent-134, score-0.272]

40 The proposed segmentation algorithm starts with calculating the Ultrametric Contour Map (UCM) [1], which generates closed contours with decreasing boundary intensities to cut the image into smaller and smaller regions. [sent-135, score-0.081]

41 The foreground inference process (best viewed in color PDF). [sent-137, score-0.23]

42 (b) bounding box (red) and small areas around part locations (green). [sent-139, score-0.116]

43 (c) the initial mask in Grab-Cut, in which black, red and green regions are definite BG, possible FG and definite FG, respectively. [sent-140, score-0.166]

44 … | = 1} (6) The weight of an edge is determined by the boundary intensity at the tail node (pixel): $w(v_{ij} \to v_{i'j'}) =$ … [sent-171, score-0.099]

45 … $+ \lambda$ (7) Here, $\lambda$ is called the step penalty, which takes the geometric distance into consideration. [sent-175, score-0.101]

46 We use the landmark points as source nodes, and calculate their shortest paths to the other nodes (pixels). [sent-178, score-0.114]

47 (a) Inferred foreground and UCM (darker pixel, larger intensity). [sent-181, score-0.152]

48 Pixel-wise minimization on all the distances yields the segmentation results (centered). [sent-183, score-0.081]
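
The shortest-path segmentation can be sketched with Dijkstra's algorithm on the 4-connected pixel grid. This is a minimal illustration under assumptions: the edge cost is taken as the boundary intensity at the pixel the edge leaves plus the step penalty, paths are allowed to cross any pixel, and non-foreground pixels simply receive label 0. The names `shortest_distances` and `segment` are hypothetical.

```python
import heapq


def shortest_distances(intensity, source, step_penalty):
    """Dijkstra from one landmark; intensity is a list of rows."""
    height, width = len(intensity), len(intensity[0])
    dist = [[float("inf")] * width for _ in range(height)]
    sy, sx = source
    dist[sy][sx] = 0.0
    heap = [(0.0, sy, sx)]
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale heap entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                # Assumed cost: boundary intensity at the node being left,
                # plus the step penalty.
                nd = d + intensity[y][x] + step_penalty
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


def segment(intensity, foreground, landmarks, step_penalty=0.1):
    """Label each foreground pixel with its closest landmark (1-based)."""
    all_dist = [shortest_distances(intensity, p, step_penalty)
                for p in landmarks]
    height, width = len(intensity), len(intensity[0])
    labels = [[0] * width for _ in range(height)]  # 0 means background
    for y in range(height):
        for x in range(width):
            if foreground[y][x]:
                best = min(range(len(landmarks)),
                           key=lambda l: all_dist[l][y][x])
                labels[y][x] = best + 1
    return labels
```

A strong contour between two landmarks makes crossing it expensive, so the region boundary tends to snap to that contour; the step penalty keeps regions from stretching arbitrarily far through flat areas.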

49 Denote the distances as $\{d(p_l, v_{ij})\}$, where $p_l$ is the $l$-th part location, and $v_{ij}$ is an arbitrary node in graph $G$. [sent-185, score-0.256]

50 For later convenience, we define $\{d(0, v_{ij})\}$ as the background distances, which take $0$ if $v_{ij}$ is a background node and $+\infty$ otherwise. [sent-186, score-0.162]

51 The segmentation process is to assign each node (pixel) to one of the landmarks. [sent-187, score-0.081]

52 Denote an assignment as $S$: $S = (s_{ij})_{W \times H}$ (8) where $s_{ij}$ is the index of the landmark assigned to node $v_{ij}$, $0 \le$ … [sent-188, score-0.143]

53 … $\le L$, and $s_{ij} = 0$ implies assigning $v_{ij}$ to the background. [sent-190, score-0.231]

54 The segmentation algorithm divides $I$ into at most $L$ foreground parts and one background region: … [sent-201, score-0.223]

55 It is worth noting that the segmented regions are basic body parts, i.e., … [sent-208, score-0.172]

56 To combine the basic parts, close geometric locations and similar appearance features are the necessary conditions. [sent-213, score-0.166]

57 Therefore, we need to quantize the geometric and feature distances for pairwise parts. [sent-214, score-0.101]

58 We use the set of descriptors $\{(d_m, R_m)\}$ as defined in Equation (2), the landmark points $\{p_l\}$, and the segmented regions $\{I_l\}$ to calculate the distances. [sent-215, score-0.11]

59 $\mathrm{dist}(l_1, l_2) = \mathrm{avg}_{I_{l_1}, I_{l_2} \neq \emptyset}\, \mathrm{dist}(I_{l_1}, I_{l_2})$ (15) We define a mid-level part $L_s$ as a set of basic parts: $L_s = \{l_{s_1}, l_{s_2}, \ldots$ [sent-236, score-0.108]

60 Organize all the original and learned parts as a hierarchical structure. [sent-262, score-0.214]

61 We can observe that the learned mid-level parts are semantically nameable (bolded in the table). [sent-287, score-0.138]

62 Also, we can learn a hierarchical structure (more than 2 layers) when μ = 0. [sent-288, score-0.143]

63 Our algorithm requires about 100 seconds in the case of L = 20, and less than 3s in the birds dataset (L = 15). [sent-302, score-0.12]

64 Considering that the hierarchical structure is calculated only once, we can claim that our algorithm is very efficient. [sent-303, score-0.143]

65 After descriptor extraction and feature encoding, we obtain the set of descriptors $\mathcal{D}$ defined in (2) and the corresponding set of visual words $\mathcal{W}$ defined … [sent-306, score-0.162]

66 …ted features: $\mathcal{W}_l = \{(w_m, R_m) \mid R_m \cap I_l \neq \emptyset\}$ (18) and performing max-pooling: $f_l(\mathcal{W}) = \max_{(w_m, R_m) \in \mathcal{W}_l} w_m$ (19) Despite its simplicity, the Naive Pooling strategy fails to capture the geometric information, which could be very useful for fine-grained recognition. [sent-327, score-0.101]

67 Figure 6 shows such examples with dominant geometric features, such as ‘crown’ … [sent-328, score-0.101]
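
The Naive Pooling step in Equations (18) and (19) can be sketched as below, assuming the selection keeps every visual word whose occupied region overlaps the part region and that regions are represented as Python sets of pixel coordinates. The function name `naive_pool` and the zero vector returned for an empty part are illustrative choices, not taken from the paper.

```python
def naive_pool(words, part_region, dim):
    """Element-wise max over the visual words that overlap one part.

    words:       list of (vector, region) pairs; region is a set of pixels.
    part_region: set of pixels belonging to the part.
    dim:         length B of each visual-word vector.
    """
    # Selection: keep words whose region intersects the part region.
    selected = [vector for vector, region in words if region & part_region]
    if not selected:
        return [0.0] * dim  # assumed fallback for a part with no features
    # Max-pooling over the selected words, one dimension at a time.
    return [max(column) for column in zip(*selected)]
```

Because only membership in the part matters here, two very different spatial arrangements of the same words pool to the same vector, which is the limitation the text attributes to this strategy.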

68 Upper: bird species varying from each other mainly in ‘crown’ shape (left pair) and ‘tail’ length (right pair). [sent-330, score-0.18]

69 Middle: examples of Geometric Visual Phrases (GVP), in which red circles are central words and yellows are side words. [sent-331, score-0.139]

70 The GVP in the last case is irregular, for the definition limits the side words on the same region (the long ‘tail’ here) as the central word. [sent-332, score-0.139]

71 Bottom: the regions (of same color) with largest discriminativity increases. [sent-333, score-0.094]

72 In this respect, we adopt the Geometric Phrase Pooling (GPP) algorithm [20], which defines visual phrases as neighboring word groups, and performs an efficient pooling algorithm to enhance the correlation between local word pairs. [sent-336, score-0.489]

73 For each visual word $(w_m, R_m)$ in $\mathcal{W}_l$, we search for its $K$ nearest neighbors in $\mathcal{W}_l$ and form a word group: $P_{l,m} = \{(w_{l,m,0}, l_{l,m,0}), \ldots$ [sent-338, score-0.087]

74 $\ldots, (w_{l,m,K}, l_{l,m,K})\}$ (20) $P_{l,m}$ is the $m$-th Geometric Visual Phrase (GVP) in $\mathcal{W}_l$, in which $w_{l,m,0} = w_m$ is the central word, and the others are side words. [sent-341, score-0.11]

75 … $\max_{1 \le k \le K} w_{l,m,k}$ (21) and summarizes all the phrases using max-pooling: $f_l(P) = \max_{(w_m, R_m) \in \mathcal{W}_l} p_{l,m}$ (22) Here, we conduct an experiment to show the effectiveness of GPP. [sent-347, score-0.123]

76 On the image pairs shown in Figure 6, we calculate the distance $\phi_l$ between the corresponding regions using visual words or phrases, respectively, and consider $\phi_l$ as the model’s discriminativity metric: $\phi_l(\mathcal{W}) =$ … [sent-348, score-0.234]
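
The grouping and pooling steps of Geometric Phrase Pooling can be sketched as follows. Two points are assumptions rather than facts from this excerpt: neighbors are found by Euclidean distance between word positions, and the rule that turns a word group into a phrase vector is only partly legible here, so it is passed in as a function. The default `central_plus_max_side` (central word plus the element-wise maximum of the side words) is a hypothetical stand-in, as are all function names.

```python
def word_groups(words, k):
    """Form one group per word: the central word, then its k nearest words.

    words: list of (vector, (x, y)) pairs belonging to one part region.
    """
    groups = []
    for m, (_, (cx, cy)) in enumerate(words):
        others = [i for i in range(len(words)) if i != m]
        # Assumed neighborhood: squared Euclidean distance between positions.
        others.sort(key=lambda i: (words[i][1][0] - cx) ** 2
                    + (words[i][1][1] - cy) ** 2)
        groups.append([words[m]] + [words[i] for i in others[:k]])
    return groups


def central_plus_max_side(group):
    """HYPOTHETICAL phrase rule: central word + max over side words."""
    central = group[0][0]
    side = [vector for vector, _ in group[1:]]
    if not side:
        return list(central)
    return [c + max(column) for c, column in zip(central, zip(*side))]


def pool_phrases(words, k, make_phrase=central_plus_max_side):
    """Max-pool the phrase vectors of all groups, as in Equation (22)."""
    phrases = [make_phrase(group) for group in word_groups(words, k)]
    return [max(column) for column in zip(*phrases)]
```

Passing a different `make_phrase` lets the same skeleton be reused once the exact combination rule is checked against the GPP reference [20].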

77 The results reveal that GPP actually discovers useful geometric properties and combines them with the texture features. [sent-367, score-0.101]

78 The feature vectors in the basic and mid-level regions are individually computed using max-pooling, and then concatenated as a super-vector. [sent-380, score-0.113]

79 … training the classification model, and test it on the remaining images to calculate the average classification accuracy by category. [sent-386, score-0.106]
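
The evaluation measure named above, average classification accuracy by category, weights every class equally regardless of how many test images it has. A minimal sketch, with the hypothetical name `mean_per_class_accuracy`:

```python
from collections import defaultdict


def mean_per_class_accuracy(true_labels, predicted_labels):
    """Average of the per-category accuracies over all categories."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    # Each category contributes one accuracy value, whatever its size.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

With four correct images of class "a" and one misclassified image of class "b", the overall accuracy is 0.8 but the per-category average is 0.5, which is why the two figures should not be compared directly.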

80 Many people have debated whether to use human annotations in fine-grained visual categorization (FGVC) tasks [18] [24] [6] [22]. [sent-391, score-0.221]

81 Here we provide twofold clues by testing our model on automatically annotated parts and lighter manually annotated parts. [sent-392, score-0.152]

82 The templates of birds with 9 unnameable parts are trained on the PascalVOC 2007 and PascalVOC 2010 databases, respectively. [sent-394, score-0.191]

83 two lighter sets of manual annotations by preserving 3 or 6 landmark points and discarding others. [sent-415, score-0.216]

84 On the other hand, even partial manual annotations (3 or 6 parts) are valuable for visual categorization. [sent-418, score-0.147]

85 Therefore we propose to use the ground-truth annotations temporarily in the fine-grained recognition, and improve the quality of object detection using the clues learned in the classification tasks. [sent-419, score-0.195]

86 Model and Parameters First, we test the effectiveness of foreground inference and segmentation and list the results in Table 3. [sent-422, score-0.311]

87 Using mid-level parts as extra bins, we improve the classification accuracy by a margin. [sent-426, score-0.137]

88 In parentheses are the numbers of coding bases for central and side words. [sent-438, score-0.175]

89 We choose 5 and 40 as the numbers of coding bases for central and side words, respectively. [sent-454, score-0.175]

90 Both the SPM algorithm and our model (HPM) build hierarchical structures for feature pooling. [sent-455, score-0.11]

91 We make full use of the ground-truth annotations, and extract semantic parts as spatial pooling bins. [sent-456, score-0.351]

92 … , Part Segmentation, Structure #3 in HSL, and numbers of coding bases (5, 40) for GPP. [sent-465, score-0.087]

93 The great improvement in classification accuracy comes from the full use of ground-truth part annotations. [sent-468, score-0.109]

94 Conclusions and Future Works In this paper, we present a novel flowchart named Hierarchical Part Matching (HPM) for fine-grained visual categorization (FGVC). [sent-470, score-0.262]

95 HPM contains three modules to enhance the BoF model. [sent-471, score-0.159]

96 First, using the Grab-Cut algorithm and the Ultrametric Contour Map, we develop an effective algorithm for foreground inference and segmentation, generating more accurate object alignment. [sent-472, score-0.23]

97 Second, we propose the Hierarchical Structure Learning (HSL) algorithm for finding mid-level concepts beyond basic parts. [sent-473, score-0.127]

98 Integrating all the modules makes HPM a powerful model, which shows notable improvements over the existing works on the Birds dataset. [sent-476, score-0.126]

99 For example, biological taxonomy provides a scientific classification system for all the bird species (available on Wikipedia). [sent-478, score-0.246]

100 It implies a hierarchical classifier, on which we could apply various techniques such as transfer learning for fine-grained understanding. [sent-479, score-0.11]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hpm', 0.385), ('gpp', 0.296), ('hsl', 0.237), ('phrase', 0.234), ('pooling', 0.208), ('bof', 0.19), ('vij', 0.162), ('foreground', 0.152), ('fgvc', 0.137), ('modules', 0.126), ('phrases', 0.123), ('birds', 0.12), ('species', 0.118), ('hierarchical', 0.11), ('crown', 0.105), ('geometric', 0.101), ('tail', 0.099), ('ucm', 0.097), ('ls', 0.094), ('rm', 0.084), ('dm', 0.083), ('segmentation', 0.081), ('categorization', 0.079), ('gvp', 0.079), ('inference', 0.078), ('tian', 0.076), ('landmark', 0.074), ('spm', 0.074), ('ultrametric', 0.073), ('fl', 0.072), ('flowchart', 0.072), ('codebook', 0.071), ('parts', 0.071), ('sij', 0.069), ('classification', 0.066), ('basic', 0.065), ('wm', 0.062), ('concepts', 0.062), ('named', 0.062), ('bird', 0.062), ('descriptors', 0.062), ('tsinghua', 0.061), ('annotations', 0.059), ('bolded', 0.059), ('hefei', 0.059), ('oppsift', 0.059), ('pascalvoc', 0.059), ('segmented', 0.059), ('wl', 0.059), ('definite', 0.059), ('llc', 0.055), ('singapore', 0.053), ('tnlist', 0.053), ('coding', 0.052), ('words', 0.051), ('pl', 0.051), ('pdf', 0.05), ('visual', 0.049), ('dist', 0.049), ('hundreds', 0.048), ('side', 0.048), ('regions', 0.048), ('accuracies', 0.047), ('fg', 0.046), ('discriminativity', 0.046), ('uij', 0.046), ('lighter', 0.044), ('part', 0.043), ('beak', 0.042), ('calculate', 0.04), ('central', 0.04), ('manual', 0.039), ('semantic', 0.039), ('il', 0.039), ('finegrained', 0.039), ('poof', 0.038), ('word', 0.038), ('box', 0.037), ('clues', 0.037), ('baseline', 0.037), ('bounding', 0.036), ('vlfeat', 0.035), ('bases', 0.035), ('cope', 0.035), ('calculates', 0.034), ('semantically', 0.034), ('laboratory', 0.034), ('tasks', 0.034), ('xie', 0.034), ('learned', 0.033), ('structure', 0.033), ('codewords', 0.033), ('texas', 0.033), ('dogs', 0.033), ('enhance', 0.033), ('spatial', 0.033), ('national', 0.033), ('pyramid', 0.033), ('aij', 0.032), ('liblinear', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999911 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization

2 0.19728681 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition

Author: Limin Wang, Yu Qiao, Xiaoou Tang

Abstract: This paper proposes motion atom and phrase as a mid-level temporal “part” for representing and classifying complex action. Motion atom is defined as an atomic part of action, and captures the motion information of action video in a short temporal scale. Motion phrase is a temporal composite of multiple motion atoms with an AND/OR structure, which further enhances the discriminative ability of motion atoms by incorporating temporal constraints in a longer scale. Specifically, given a set of weakly labeled action videos, we firstly design a discriminative clustering method to automatically discover a set of representative motion atoms. Then, based on these motion atoms, we mine effective motion phrases with high discriminative and representative power. We introduce a bottom-up phrase construction algorithm and a greedy selection method for this mining task. We examine the classification performance of the motion atom and phrase based representation on two complex action datasets: Olympic Sports and UCF50. Experimental results show that our method achieves superior performance over recent published methods on both datasets.

3 0.17678957 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell

Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.

4 0.16605599 202 iccv-2013-How Do You Tell a Blackbird from a Crow?

Author: Thomas Berg, Peter N. Belhumeur

Abstract: How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts – most people are challenged to name a couple dozen bird species, let alone identify them. This leads us to the question, “Can a recognition system show humans what to look for when identifying classes (in this case birds)?” In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.

5 0.16441616 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.

6 0.15220237 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

7 0.14733946 169 iccv-2013-Fine-Grained Categorization by Alignments

8 0.1372218 396 iccv-2013-Space-Time Robust Representation for Action Recognition

9 0.12464149 379 iccv-2013-Semantic Segmentation without Annotating Segments

10 0.12311538 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

11 0.12155093 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

12 0.1157417 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint

13 0.10956758 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

14 0.10042303 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

15 0.099744797 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

16 0.097419739 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

17 0.094484411 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation

18 0.092788771 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary

19 0.089682817 258 iccv-2013-Low-Rank Sparse Coding for Image Classification

20 0.083361901 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.22), (1, 0.069), (2, 0.042), (3, -0.04), (4, 0.071), (5, 0.013), (6, -0.062), (7, 0.042), (8, -0.028), (9, -0.093), (10, 0.066), (11, 0.079), (12, -0.007), (13, -0.043), (14, -0.078), (15, 0.008), (16, 0.062), (17, 0.007), (18, 0.045), (19, -0.063), (20, 0.051), (21, 0.042), (22, -0.015), (23, 0.066), (24, -0.026), (25, 0.081), (26, 0.019), (27, -0.031), (28, 0.044), (29, 0.08), (30, 0.096), (31, -0.138), (32, -0.062), (33, 0.002), (34, -0.096), (35, 0.052), (36, -0.065), (37, -0.086), (38, 0.019), (39, 0.051), (40, -0.108), (41, -0.04), (42, 0.022), (43, 0.014), (44, -0.084), (45, 0.04), (46, -0.005), (47, 0.05), (48, 0.157), (49, 0.058)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9098717 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization

2 0.82642603 202 iccv-2013-How Do You Tell a Blackbird from a Crow?

Author: Thomas Berg, Peter N. Belhumeur

Abstract: How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts – most people are challenged to name a couple dozen bird species, let alone identify them. This leads us to the question, “Can a recognition system show humans what to look for when identifying classes (in this case birds)?” In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.

3 0.81408525 169 iccv-2013-Fine-Grained Categorization by Alignments

Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars

Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.

4 0.73723096 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally

Author: Zhenyang Li, Efstratios Gavves, Koen E.A. van de Sande, Cees G.M. Snoek, Arnold W.M. Smeulders

Abstract: In this paper we aim for segmentation and classification of objects. We propose codemaps that are a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Other than existing linear decompositions who emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification becomes locally decomposable. As first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.

5 0.69015378 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.

6 0.66723007 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

7 0.62049228 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

8 0.61688364 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

9 0.60223675 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

10 0.58424187 104 iccv-2013-Decomposing Bag of Words Histograms

11 0.58223182 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint

12 0.56149268 258 iccv-2013-Low-Rank Sparse Coding for Image Classification

13 0.56014138 74 iccv-2013-Co-segmentation by Composition

14 0.55874574 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation

15 0.54511905 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

16 0.54497546 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors

17 0.54445839 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

18 0.52677339 176 iccv-2013-From Large Scale Image Categorization to Entry-Level Categories

19 0.51031387 288 iccv-2013-Nested Shape Descriptors

20 0.50597 379 iccv-2013-Semantic Segmentation without Annotating Segments


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.066), (13, 0.012), (26, 0.511), (31, 0.035), (34, 0.033), (42, 0.065), (48, 0.011), (64, 0.032), (73, 0.012), (89, 0.136)]
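The topic vector above is sparse: only topics with non-negligible weight are listed as (topicId, topicWeight) pairs. The page does not state how the simValue scores in the list below are derived from such vectors. As a minimal sketch, assuming cosine similarity is the measure (an assumption, not something the listing confirms), two sparse topic vectors could be compared as follows. The function name cosine_similarity and the second vector other_topics are hypothetical and exist only for illustration.

```python
import math

# Sparse topic vector copied from the listing above: (topicId, topicWeight) pairs.
paper_topics = [(2, 0.066), (13, 0.012), (26, 0.511), (31, 0.035), (34, 0.033),
                (42, 0.065), (48, 0.011), (64, 0.032), (73, 0.012), (89, 0.136)]

# Hypothetical topic vector for another paper, invented for illustration only.
other_topics = [(26, 0.40), (89, 0.20), (5, 0.10)]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as (id, weight) pairs.

    Assumption: cosine is used here as a common choice for comparing topic
    vectors; the source page does not say which measure produced simValue.
    """
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity(paper_topics, other_topics))
```

Under this assumption, papers that share the dominant topic (topic 26 here, with weight 0.511) score high regardless of subject area, which would be consistent with the LDA list mixing in papers that are not about fine-grained categorization.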

similar papers list:

simIndex simValue paperId paperTitle

1 0.94107509 405 iccv-2013-Structured Light in Sunlight

Author: Mohit Gupta, Qi Yin, Shree K. Nayar

Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.

2 0.90427583 395 iccv-2013-Slice Sampling Particle Belief Propagation

Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn

Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods which involves sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.

3 0.90224117 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution

Author: Radu Timofte, Vincent De Smet, Luc Van Gool

Abstract: Recently there have been significant advances in image upscaling or image super-resolution based on a dictionary of low and high resolution exemplars. The running time of the methods is often ignored despite the fact that it is a critical factor for real applications. This paper proposes fast super-resolution methods while making no compromise on quality. First, we support the use of sparse learned dictionaries in combination with neighbor embedding methods. In this case, the nearest neighbors are computed using the correlation with the dictionary atoms rather than the Euclidean distance. Moreover, we show that most of the current approaches reach top performance for the right parameters. Second, we show that using global collaborative coding has considerable speed advantages, reducing the super-resolution mapping to a precomputed projective matrix. Third, we propose the anchored neighborhood regression. That is to anchor the neighborhood embedding of a low resolution patch to the nearest atom in the dictionary and to precompute the corresponding embedding matrix. These proposals are contrasted with current state-of-the-art methods on standard images. We obtain similar or improved quality and one or two orders of magnitude speed improvements.

4 0.8906492 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation

Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert

Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.

same-paper 5 0.87791681 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization

Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang

Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention these years. Different with traditional image classification tasks in which objects have large inter-class variation, the visual concepts in the fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating really discriminative features, therefore it becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves the state-of-the-art classification accuracy in the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.

6 0.87263477 282 iccv-2013-Multi-view Object Segmentation in Space and Time

7 0.86785471 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images

8 0.80225873 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties

9 0.78194141 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets

10 0.78119284 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

11 0.69166327 414 iccv-2013-Temporally Consistent Superpixels

12 0.69035184 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions

13 0.66677976 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

14 0.6459685 150 iccv-2013-Exemplar Cut

15 0.64068127 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration

16 0.63778651 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

17 0.63288438 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes

18 0.63262635 432 iccv-2013-Uncertainty-Driven Efficiently-Sampled Sparse Graphical Models for Concurrent Tumor Segmentation and Atlas Registration

19 0.6276139 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints

20 0.62758535 330 iccv-2013-Proportion Priors for Image Sequence Segmentation