iccv iccv2013 iccv2013-411 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the baselevel category that can be fitted to images, producing highquality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
Reference: text
sentIndex sentText sentNum sentScore
1 The method builds a model of the baselevel category that can be fitted to images, producing highquality foreground segmentation and mid-level part localizations. [sent-11, score-0.582]
2 The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e. [sent-12, score-0.22]
3 Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. [sent-15, score-0.496]
4 The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e. [sent-16, score-0.996]
5 It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently. [sent-23, score-0.219]
6 Introduction Fine-grained visual categorization is the task of distinguishing between sub-ordinate categories, e. [sent-25, score-0.22]
7 Several recent works have pointed out two aspects that distinguish visual categorization at the subordinate level from categorization at the base level. [sent-28, score-0.304]
8 First, in subordinate classification it often happens that two similar classes can only be distinguished by the appearance of localized and very subtle details (such as the color of the beak for bird classes or the shape of the petal edges for flower classes). [sent-29, score-0.318]
9 Therefore, [5, 24, 32, 34, 35] focused on the localization of these discriminative image parts as a precursor to categorization. [sent-31, score-0.316]
10 Once the discriminative parts are localized, they are encoded into separate parts of the visual signature, enabling the classifier to pick up on the fine differences in those parts. [sent-32, score-0.273]
11 However, [10, 22, 24] demonstrated that at the subordinate category level, the background is seldom discriminative and it is beneficial to segment out the foreground and to discard the visual information in the background. [sent-35, score-0.282]
12 [10] further demonstrated that increasing the accuracy of foreground segmentation at training time directly translates into an increase in accuracy of subordinate-level categorization at test time. [sent-36, score-0.609]
13 In the light of all this evidence, it is natural to investigate the combination of part localization and foreground segmentation for fine-grained categorization, and their interaction in combination is the topic of this work. [sent-37, score-0.633]
14 More interestingly, we demonstrate that the accuracy of fine-grained categorization can be further boosted if part localization and foreground segmentation are performed together, so that the outcomes of both processes aid each other. [sent-39, score-0.909]
15 As a result, better segmentation can be obtained by taking into account part localizations, and, likewise, more semantically meaningful and discriminative parts can be learned and localized if foreground masks are taken into account. [sent-40, score-0.752]
16 We implement this feedback loop via the energy minimization of a joint functional that incorporates the consistency between part localization and foreground segmentation as one of the terms. [sent-41, score-0.72]
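The coupling term can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function names, box convention `(y0, y1, x0, x1)`, and the assumption that each saliency map already matches its box size (avoiding the resize step) are all ours.

```python
import numpy as np

def consistency_energy(f, parts, saliency):
    """Consistency term E_C between a foreground mask and part locations.

    f        : HxW array with values in {-1, 1} (1 = foreground).
    parts    : dict part_id -> (y0, y1, x0, x1) bounding box.
    saliency : dict part_id -> saliency map in [-1, 1], here assumed to
               already match the box size (the paper resizes instead).
    """
    energy = 0.0
    for t, (y0, y1, x0, x1) in parts.items():
        m_t = f[y0:y1, x0:x1]              # mask clipped by the part box
        # agreement between mask and saliency is rewarded, so the
        # energy is the negative correlation of the two maps
        energy -= float(np.sum(m_t * saliency[t]))
    return energy

def joint_energy(e_dpm, e_gc, e_c, beta=1.0, gamma=1.0):
    """Sketch of the joint functional: detection + segmentation + coupling."""
    return e_dpm + beta * e_gc + gamma * e_c
```

Minimizing this joint functional is what makes the two processes "symbiotic": a segmentation that disagrees with the saliency maps under the detected parts is penalized, and vice versa.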
17 The resulting symbiotic system achieves a better categorization performance compared to the system obtained by a mere concatenation of the two visual signatures. Top: … indicate the provided ground truth bounding box. [sent-42, score-0.884]
18 Middle: GrabCut automatically segments the images using the outside of the given bounding box as background and a prior foreground saliency map for the region inside the bounding box. [sent-44, score-0.584]
19 Bottom: our approach, which trains a symbiotic set of detector templates and saliency maps and applies them jointly to images. [sent-45, score-0.692]
20 Overall, our symbiotic system outperforms the previous state-of-the-art on all datasets considered in our experiments (both the 2010 and 2011 versions of Caltech-UCSD Birds, and Stanford Dogs). [sent-50, score-0.524]
21 This symbiotic system is the main contribution of the paper. [sent-51, score-0.524]
22 Related Work There is a line of work stretching back over a decade on the interplay between segmentation and detection. [sent-56, score-0.256]
23 In early works, object category detectors simply proposed foreground masks [4, 18]. [sent-57, score-0.297]
24 Later methods used these masks to initialize graph-cuts based segmentations [7] that could take advantage of image specific color distributions, giving crisper and more accurate foreground segmentations [17, 19, 26]. [sent-58, score-0.442]
25 In the poselet line of research [6] the detectors are for parts, rather than for entire categories, but again the poselet detectors can predict foreground masks for object category detection and segmentation [9, 20]. [sent-59, score-0.526]
26 Whether the parts arise from poselets [35] or are discovered from random initializations [33], there are benefits in comparing objects in finegrained visual categorization tasks at the part level where subtle discriminative features are more evident. [sent-60, score-0.606]
27 We demonstrate, however, that the parts discovered in the absence of supervision are less discriminative than those discovered with the help of the segmentation process as is done in our method. [sent-61, score-0.414]
28 It also accomplishes unsupervised learning of a deformable part model in order to find discriminative parts for fine-grained categorization. [sent-67, score-0.397]
29 An earlier method had used the image as a bounding box for learning a deformable parts model for scene classification [23]. [sent-68, score-0.306]
30 Again, neither of these uses segmentation to aid the part learning and localization. [sent-69, score-0.333]
31 In summary, although the synergy between segmentation and detection has long been recognized [16], the interplay between part localization and segmentation has not been investigated in the context of fine-grained categorization (to the best of our knowledge). [sent-70, score-0.894]
32 bird), which includes a deformable part model W and a set S of saliency maps, each associated with a part or root of the DPM. [sent-76, score-0.576]
33 The recovered part localizations p and the foreground segmentation f are then used to encode the image content into a highly-discriminative visual signature as discussed in the next section. [sent-79, score-0.762]
34 With the introduction of a third (consistency) energy term EC that takes a pre-trained saliency model S, we penalize the cases where the foreground segmentation f and the part locations p do not agree. [sent-81, score-0.59]
35 Deformable part model W = {wt}: here, we use a multi-component deformable part model (DPM) [14] consisting of several mixtures of parts, where each part is described by a HOG template and a geometric location prior. [sent-85, score-0.285]
36 We denote the number of mixture components by N, and the number of parts in each component by M. [sent-86, score-0.214]
37 We omit extra indices for different mixture components and use w0 to describe the root HOG template for each component. [sent-87, score-0.27]
38 wt then denotes the parameters of the t-th part (the HOG template and the geometric prior). [sent-88, score-0.254]
39 Saliency model S = {st}: we associate with the root and each part wt of the deformable part model an extra map st that indicates the foreground probability. [sent-89, score-0.705]
40 Pixels of this saliency map thus have values between −1 and 1, with 1 indicating a high chance of the pixel being foreground and −1 otherwise. [sent-90, score-0.377]
41 Part localizations p = {pt}: this variable denotes the locations (the bounding box coordinates) of all detected parts in an image. [sent-93, score-0.449]
42 The localization of a particular part template wt is denoted pt. [sent-95, score-0.41]
43 The part localizations are shown as colored bounding boxes in the output images of Fig. [sent-96, score-0.45]
44 where mt(pt, f) is a binary map {−1, 1} clipped from the segmentation mask f by the localized part bounding box pt. [sent-119, score-0.484]
45 This map is resized to the size of a saliency map st, which is denoted as θt. [sent-120, score-0.248]
46 ||mt(pt, f)||₂² is constant because mt only contains pixel values of either −1 or 1, and hence the squared norm is simply the number of pixels specified by the size θt, and does not depend on pt and f. [sent-122, score-0.252]
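This observation is easy to verify numerically: for a binary map m ∈ {−1, 1}^(H×W), the squared distance ||m − s||² differs from the negative correlation −2⟨m, s⟩ only by a constant, so minimizing one is equivalent to maximizing the other. The 8×8 size and random maps below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(8, 8))          # a fixed "saliency map"

def sq_dist(m, s):
    return float(np.sum((m - s) ** 2))

def neg_corr(m, s):
    return -2.0 * float(np.sum(m * s))

# For any binary map m in {-1, 1}, ||m||^2 equals the pixel count (64),
# so ||m - s||^2 equals -2<m, s> plus the constant 64 + ||s||^2.
const = 64 + float(np.sum(s ** 2))
for _ in range(5):
    m = rng.choice([-1.0, 1.0], size=(8, 8))
    assert abs(sq_dist(m, s) - (neg_corr(m, s) + const)) < 1e-9
```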
47 We optimize the cost function (1) using a block-coordinate-descent pattern, that is, alternating between updating the part localizations p while fixing the foreground segmentation f and color model c, and vice versa. [sent-123, score-0.718]
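The alternation can be sketched as the loop below. The callables stand in for the paper's components and are hypothetical names of ours: `init_segmentation` (e.g. plain GrabCut inside the box), `localize_parts` (DPM fitting guided by the current mask), and `update_segmentation` (the modified GrabCut given the parts).

```python
def fit_symbiotic(image, init_segmentation, localize_parts,
                  update_segmentation, n_iters=5):
    """Block-coordinate descent over part locations p and segmentation f.

    init_segmentation(image)      -> initial mask f
    localize_parts(image, f)      -> part locations p given a mask
    update_segmentation(image, p) -> refined mask f given parts
    """
    f = init_segmentation(image)
    p = None
    for _ in range(n_iters):
        p = localize_parts(image, f)       # fix f, update p (DPM fitting)
        f = update_segmentation(image, p)  # fix p, update f (modified GrabCut)
    return p, f
```

The paper reports using 5 such iterations, with convergence typically observed after 3.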
48 D(pt, wt, p0) = R(pt, wt) + Qt(pt, p0) (6), where R(pt, wt) is the HOG-template filter response map of the t-th root or part template. [sent-129, score-0.262]
49 Qt is a quadratic function of the relative location of the part and the root that penalizes atypical geometric configurations. [sent-130, score-0.235]
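A brute-force version of this part score is shown below; the quadratic coefficients and the sign convention (response minus a non-negative penalty) are illustrative assumptions. A real DPM evaluates the same maximization efficiently with a generalized distance transform rather than the double loop used here.

```python
import numpy as np

def deformation_cost(dx, dy, a=(0.05, 0.0, 0.05, 0.0)):
    """Quadratic penalty Q for a part displaced by (dx, dy) from its
    anchor relative to the root; coefficients are illustrative."""
    ax2, ax1, ay2, ay1 = a
    return ax2 * dx * dx + ax1 * dx + ay2 * dy * dy + ay1 * dy

def part_score(response, anchor, a=(0.05, 0.0, 0.05, 0.0)):
    """Maximize R(p_t) - Q(p_t, p_0) over all placements of part t:
    HOG filter response minus the quadratic deformation cost."""
    h, w = response.shape
    best, best_pos = -np.inf, None
    for y in range(h):
        for x in range(w):
            s = response[y, x] - deformation_cost(x - anchor[1],
                                                  y - anchor[0], a)
            if s > best:
                best, best_pos = s, (y, x)
    return best, best_pos
```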
50 Assuming that part localizations p are fixed, the minimization min_{f,c} β·EGC(f, c|I) + EC(p, f|S) (8) can be accomplished with an appropriately modified GrabCut algorithm. [sent-136, score-0.41]
51 Recall that GrabCut alternates the color model updates and the segmentation updates. [sent-137, score-0.198]
52 Let us now focus on the foreground segmentation update (given part localizations p and the color model c). [sent-139, score-0.718]
53 Thus, the HOG templates for root filters are learned in the mixture components via latent SVM training (we use a separate unrelated dataset as a source of negative examples, and constrain the root filters to overlap with user-provided boxes by at least 70%). [sent-184, score-0.432]
54 At the same time, we run GrabCut on all training examples (using bounding box annotations), and estimate the root saliency map s0 corresponding to root filters by averaging the segmentation masks (as detailed below). [sent-185, score-0.804]
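Estimating a saliency map as the pixel-wise mean of GrabCut masks can be sketched as follows. The function name, the `(y0, y1, x0, x1)` box convention, and the nearest-neighbour resize (used to keep the sketch dependency-free; the paper does not prescribe the interpolation) are assumptions of ours.

```python
import numpy as np

def estimate_saliency(masks, boxes, out_size=(32, 32)):
    """Estimate a root/part saliency map s in [-1, 1] as the pixel-wise
    mean of training segmentation masks (values in {-1, 1}) cut out by
    the corresponding detected boxes."""
    acc = np.zeros(out_size, dtype=np.float64)
    for mask, (y0, y1, x0, x1) in zip(masks, boxes):
        crop = mask[y0:y1, x0:x1]
        # nearest-neighbour resize of the crop to the map resolution
        ys = np.linspace(0, crop.shape[0] - 1, out_size[0]).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_size[1]).astype(int)
        acc += crop[np.ix_(ys, xs)]
    return acc / len(masks)
```

Averaging over many training examples is what turns hard per-image masks into the soft [−1, 1] foreground-probability maps used by the consistency term.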
55 In [14], “interesting” parts are discovered greedily (as discussed in [14]) by covering the high-energy (large gradient magnitude) parts of the root HOG-template. [sent-189, score-0.415]
56 In our case, we modify this interestingness measure by multiplying the HOG magnitude by the root saliency maps estimated for each component. [sent-190, score-0.276]
57 In this way, we constrain the discovery process to parts which overlap substantially with the foreground (as estimated by a GrabCut). [sent-191, score-0.379]
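A toy version of this saliency-weighted greedy part discovery is given below. The clipping of the saliency map to [0, 1] and the non-overlap rule (zeroing out each chosen window) are assumptions of this sketch, not details stated in the text.

```python
import numpy as np

def part_interestingness(hog_energy, root_saliency):
    """Modified interestingness: the HOG gradient-energy map multiplied
    by the (clipped) root saliency map, steering part discovery
    towards the estimated foreground."""
    return hog_energy * np.clip(root_saliency, 0.0, 1.0)

def greedy_part_anchors(score, n_parts, part_size):
    """Greedily pick n_parts window anchors with the highest summed
    score, zeroing out each chosen window so parts do not overlap."""
    score = score.copy()
    anchors = []
    ph, pw = part_size
    for _ in range(n_parts):
        sums = np.array([[score[y:y + ph, x:x + pw].sum()
                          for x in range(score.shape[1] - pw + 1)]
                         for y in range(score.shape[0] - ph + 1)])
        y, x = np.unravel_index(np.argmax(sums), sums.shape)
        anchors.append((int(y), int(x)))
        score[y:y + ph, x:x + pw] = 0.0
    return anchors
```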
58 We come back to the issue of unsupervised part discovery in the experiments section. [sent-193, score-0.245]
59 Mean accuracy (mA) performance on the three finegrained categorization datasets. [sent-206, score-0.259]
60 Learning the saliency maps: given the part localizations and the GrabCut segmentations of all training images, we set the saliency mask for each part to be the pixel-wise mean of all segmentation masks cut out by the corresponding part boxes. [sent-212, score-0.727]
61 The symbiotic model is fitted to images using 5 alternation iterations (the convergence is observed after 3 iterations in most cases). [sent-229, score-0.535]
62 The symbiotic model outputs one binary segmentation and a set of detected part bounding boxes for a given image. [sent-235, score-0.88]
63 i.e., one feature vector, xSEG, for the foreground region in the segmentation, and a feature vector for each of the parts apart from the root template. [sent-238, score-0.423]
64 the foreground and the box of each part) is encoded by: (1) LLC-encoded [29] Lab color histogram vector, and (2) Fisher vector [25] aggregating SIFT features (the implementation [11] was adopted). [sent-244, score-0.269]
65 20992 dims each), no matter how many parts and mixture components are used. [sent-254, score-0.214]
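One simple way to obtain a signature whose length is independent of the number of parts is to pool the per-part encodings into a single xPART block before concatenating with xSEG. This pooling choice (averaging) and the small `dim` are assumptions of this sketch; the paper's per-region encodings (LLC colour histogram plus Fisher vector) are roughly 21k-dimensional.

```python
import numpy as np

def build_signature(x_seg, part_descs, dim=256):
    """x = [x_SEG ; x_PART]: the foreground descriptor concatenated
    with a single part descriptor obtained by averaging the per-part
    encodings, so the total dimensionality stays fixed regardless of
    how many parts/components fired."""
    if part_descs:
        x_part = np.mean(np.stack(part_descs), axis=0)
    else:
        x_part = np.zeros(dim)   # no parts detected: zero block
    return np.concatenate([x_seg, x_part])
```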
66 The models learned by the symbiotic system for the birds and dogs datasets can be seen in Fig. [sent-271, score-0.793]
67 The relative importance of the model components, as well as the net effect of the “symbiosis” between the segmentation and part localization, are evaluated in Tab. [sent-274, score-0.285]
68 In the table, we compare the categorization accuracy of the systems resulting from applying GrabCut alone or the DPM alone. (Table 2: ID, Model fitting, Descriptor, and mA/AP on Birds 2011, Birds 2010, and Dogs.) [sent-276, score-0.225]
69 Segmentations produced by the symbiotic model allow for more discriminative signatures than those produced with GrabCut alone (#3 vs. [sent-286, score-0.742]
70 #2), while parts learned and localized by the symbiotic model are more discriminative than those learned and localized by DPM (#5 vs. [sent-287, score-0.739]
71 Finally, categorization with full signatures produced by the symbiotic model is better than categorization based on the concatenation of segmentation-based and part-based signatures produced by GrabCut and DPM run independently (#7 vs #6). [sent-289, score-1.221]
72 All these improvements are due to the fact that part localization and segmentation processes assist each other within the proposed symbiotic model. [sent-290, score-0.949]
73 part localization alone, while keeping the rest of the parameters (initialization, feature encoding, etc.) unchanged. [sent-291, score-0.273]
74 Likewise, the same improvement is observed for part localization, when the segmentation process is used to aid part discovery and fitting, as opposed to using a DPM model on its own (line 5 vs line 4). [sent-294, score-0.557]
75 The interaction between the segmentation and the part localization processes is further shown in Fig. [sent-296, score-0.476]
76 3, we used the same deformable part model W (learned within the symbiotic model) but evaluated it with and without the help of the segmentation process. [sent-300, score-0.824]
77 In both cases, it can be seen how the symbiosis between part localization and segmentation improves the performance of each process. [sent-303, score-0.491]
78 We attribute this fact to a greater pose variability for dogs that is harder to cope with for the deformable parts model. [sent-305, score-0.332]
79 At the same time, dogs have a nice roundish shape which makes them very appropriate for GrabCut (so that the aid from the parts localization is not needed in most cases). [sent-306, score-0.442]
80 However, as discussed below, it might hurt the generalization in the categorization step, especially since we keep the feature dimension of xPART the same. [sent-308, score-0.221]
81 We have further evaluated the influence of the size of the deformable parts model on the categorization accuracy, namely N (number of mixture components) and M (the number of parts per component). [sent-311, score-0.555]
82 While a large N may increase the data fragmentation within some subordinate classes, it may also attribute different subordinate classes to different components, thus making the categorization easier. [sent-315, score-0.357]
83 Overall, for the bird datasets, we chose N = 1 and M = 4, while N = 2 and M = 4 seems more reasonable for the dogs dataset (each DPM mixture component is applied twice, once with mirroring and once without, during training and test). [sent-322, score-0.359]
84 The loss in accuracy with a higher number of mixture components indicates that the complexity of a bird pose does not justify more than one mixture component in our model. [sent-346, score-0.275]
85 Only by combining segmentation and part localization (lines 6 and 7 in the table) can we see a consistent benefit from having part localization in the system. [sent-348, score-0.714]
86 One natural question is whether the performance of part localization is inherently limited, or whether this is a problem with segmentation-supervised and, particularly, unsupervised part discovery. [sent-349, score-0.444]
87 Apart from the bounding boxes, there are 15 part locations annotated per image. [sent-351, score-0.226]
88 Thus, we first made use of the annotated head locations and trained a head detector (which was a mixture of HOG templates). [sent-357, score-0.259]
89 4, the resulting systems were able to surpass the performance of the symbiotic system even when only using the trained head detector. [sent-367, score-0.587]
90 Using ground-truth head localizations, the gap in the achieved accuracy compared to the symbiotic system (and, naturally, all other systems evaluated on this task) becomes very large. [sent-368, score-0.587]
91 Overall, our conclusion here is that part localization has … (Table 4). [sent-369, score-0.273]
92 The top two rows show the results if the head detector is trained using human annotation rather than unsupervised training, while the bottom rows show the accuracies if the head position is given even during test time. [sent-376, score-0.265]
93 Conclusion We have introduced a symbiotic model combining part localization and segmentation for fine-grained categorization.
94 It also opens up new research questions: how can the model be extended from loose bounding box annotation to (even weaker) image level annotation? [sent-382, score-0.22]
95 Top: part localizations using the symbiotically trained DPM, but fitted without the guidance of segmentation. [sent-403, score-0.39]
96 Bottom: the same DPM model fitted with the help of segmentation (i. [sent-404, score-0.23]
97 The last three columns show some failure cases where segmentation hurts part localization. [sent-408, score-0.196]
98 Object detection and segmentation from joint embedding of parts and pixels. [sent-538, score-0.281]
99 Weakly supervised discriminative localization and classification: a joint learning process. [sent-546, score-0.203]
100 Scene recognition and weakly supervised object localization with deformable part-based models. [sent-557, score-0.222]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the baselevel category that can be fitted to images, producing highquality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
2 0.31496489 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
3 0.21820959 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of finegrained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
4 0.19000311 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
5 0.18745546 71 iccv-2013-Category-Independent Object-Level Saliency Detection
Author: Yangqing Jia, Mei Han
Abstract: It is known that purely low-level saliency cues such as frequency does not lead to a good salient object detection result, requiring high-level knowledge to be adopted for successful discovery of task-independent salient objects. In this paper, we propose an efficient way to combine such high-level saliency priors and low-level appearance models. We obtain the high-level saliency prior with the objectness algorithm to find potential object candidates without the need of category information, and then enforce the consistency among the salient regions using a Gaussian MRF with the weights scaled by diverse density that emphasizes the influence of potential foreground pixels. Our model obtains saliency maps that assign high scores for the whole salient object, and achieves state-of-the-art performance on benchmark datasets covering various foreground statistics.
6 0.17914358 186 iccv-2013-GrabCut in One Cut
7 0.17487647 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
8 0.16450244 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
9 0.16441616 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
10 0.15963137 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction
11 0.1516905 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
12 0.15123172 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
13 0.15047127 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
14 0.14576715 396 iccv-2013-Space-Time Robust Representation for Action Recognition
15 0.13796881 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
16 0.1329881 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
17 0.13238853 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
18 0.12846836 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
19 0.12289812 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection
20 0.12037687 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
simIndex simValue paperId paperTitle
same-paper 1 0.96556914 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the baselevel category that can be fitted to images, producing highquality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
2 0.82573533 169 iccv-2013-Fine-Grained Categorization by Alignments
Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of finegrained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
3 0.79272473 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-the-art algorithms.
4 0.77012962 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Different from traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating truly discriminative features; it therefore becomes more important for the algorithm to make full use of part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules into the image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD Birds-200-2011 dataset by making full use of the ground-truth part annotations.
5 0.75683367 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
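The "simple voting scheme" in the abstract above can be read as pooling the segment hypotheses that fall inside each bounding box into a soft foreground prior. The sketch below is one such reading, assuming uniformly weighted binary hypotheses; the paper's actual weighting may differ:

```python
import numpy as np

def shape_guidance(hypotheses):
    """Vote a stack of binary segment hypotheses (all cropped/resized to the
    same bounding box) into a soft foreground prior in [0, 1].  Pixels that
    most hypotheses mark as foreground get values near 1 and can then seed a
    graph-cut-based figure-ground segmentation."""
    return np.mean(np.stack(hypotheses).astype(float), axis=0)
```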
6 0.70346004 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
7 0.69976044 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
8 0.68481553 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
9 0.67275876 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
10 0.67224044 349 iccv-2013-Regionlets for Generic Object Detection
11 0.67211562 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
12 0.66689372 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
13 0.66068757 104 iccv-2013-Decomposing Bag of Words Histograms
14 0.65098989 186 iccv-2013-GrabCut in One Cut
15 0.64296263 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
16 0.64145285 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
17 0.63773251 74 iccv-2013-Co-segmentation by Composition
18 0.62899351 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
19 0.62896711 202 iccv-2013-How Do You Tell a Blackbird from a Crow?
20 0.61755615 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
topicId topicWeight
[(2, 0.07), (4, 0.027), (7, 0.017), (13, 0.015), (25, 0.158), (26, 0.128), (31, 0.057), (34, 0.04), (35, 0.022), (40, 0.012), (42, 0.081), (64, 0.061), (73, 0.035), (89, 0.17), (95, 0.013), (98, 0.013)]
simIndex simValue paperId paperTitle
1 0.88141239 135 iccv-2013-Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
Author: Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang, Chunhong Pan
Abstract: unkown-abstract
2 0.86764443 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan
Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and performance of the learned CRF models, the parameter learning is iteratively carried out by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and a large-margin preference to distinguish bad labelings from the ground-truth labeling. A solution in subgradient-descent updating form is derived for the convex optimization problem, with an adaptively determined updating step-size. Besides, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts of the partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by experimental results on two public datasets. We also demonstrate the power of our method for handling partially labeled training data.
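The per-iteration convex problem described above can be sketched in a toy form: linearize the structured hinge loss, add a proximal term anchoring the new weights to the old ones, and the minimizer is a subgradient step with step size 1/λ. The sketch below fixes the "bad" labelings and uses a constant λ; the paper's adaptive step-size rule and CRF features are not reproduced here:

```python
import numpy as np

def train_structured(feat_gt, feat_bad, lam=1.0, iters=50):
    """Toy proximal-subgradient learner for structured max-margin training.

    Each iteration minimizes <g, w'> + (lam/2)||w' - w||^2, where g is a
    subgradient of the structured hinge loss
        sum_n max(0, 1 + <w, feat_bad_n> - <w, feat_gt_n>),
    giving the closed-form update w' = w - g / lam.
    feat_gt, feat_bad: (N, D) joint features of the ground-truth labelings
    and of (here, fixed) competing bad labelings.
    """
    w = np.zeros(feat_gt.shape[1])
    for _ in range(iters):
        g = np.zeros_like(w)
        for f_gt, f_bad in zip(feat_gt, feat_bad):
            if 1.0 + w @ f_bad - w @ f_gt > 0:   # margin violated by this example
                g += f_bad - f_gt                 # hinge-loss subgradient
        w -= g / lam                              # proximal (subgradient) step
    return w
```

The proximal term is what keeps each update close to the previously learned parameters, trading per-iteration progress for stability.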
same-paper 3 0.86119866 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
4 0.85722566 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks
Author: Mojtaba Seyedhosseini, Mehdi Sajjadi, Tolga Tasdizen
Abstract: Contextual information plays an important role in solving vision problems such as image segmentation. However, extracting contextual information and using it in an effective way remains a difficult problem. To address this challenge, we propose a multi-resolution contextual framework, called cascaded hierarchical model (CHM), which learns contextual information in a hierarchical framework for image segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. We repeat this procedure by cascading the hierarchical framework to improve the segmentation accuracy. Multiple classifiers are learned in the CHM; therefore, a fast and accurate classifier is required to make the training tractable. The classifier also needs to be robust against overfitting due to the large number of parameters learned during training. We introduce a novel classification scheme, called logistic disjunctive normal networks (LDNN), which consists of one adaptive layer of feature detectors implemented by logistic sigmoid functions followed by two fixed layers of logical units that compute conjunctions and disjunctions, respectively. We demonstrate that LDNN outperforms state-of-the-art classifiers and can be used in the CHM to improve object segmentation performance.
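The LDNN described above composes one adaptive layer of sigmoid feature detectors with two fixed logical layers. A minimal numpy forward pass is sketched below, with the soft AND taken as a product of sigmoids and the soft OR computed via De Morgan's law; the weight shapes are illustrative, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ldnn_forward(x, W, b):
    """Forward pass of a logistic disjunctive normal network.

    x: (D,) feature vector; W: (N_or, N_and, D) weights; b: (N_or, N_and) biases.
    Adaptive layer: sigmoid feature detectors.  Fixed layers: each
    conjunction is the product of its sigmoids (soft AND), and the final
    disjunction is 1 - prod_i (1 - conj_i) (soft OR, via De Morgan).
    """
    g = sigmoid(W @ x + b)            # (N_or, N_and) detector outputs in (0, 1)
    conj = np.prod(g, axis=1)         # soft AND within each group
    return 1.0 - np.prod(1.0 - conj)  # soft OR across groups
```

Because both fixed layers are smooth products, the whole network remains differentiable in W and b, so only the detector layer needs to be learned.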
5 0.84727716 307 iccv-2013-Parallel Transport of Deformations in Shape Space of Elastic Surfaces
Author: Qian Xie, Sebastian Kurtek, Huiling Le, Anuj Srivastava
Abstract: Statistical shape analysis develops methods for comparisons, deformations, summarizations, and modeling of shapes in given data sets. These tasks require a fundamental tool called parallel transport of tangent vectors along arbitrary paths. This tool is essential for: (1) computation of geodesic paths using either the shooting or the path-straightening method, (2) transferring deformations across objects, and (3) modeling of statistical variability in shapes. Using the square-root normal field (SRNF) representation of parameterized surfaces, we present a method for transporting deformations along paths in the shape space. This is difficult despite the underlying space being a vector space, because the chosen (elastic) Riemannian metric is non-standard. Using a finite basis for representing SRNFs of shapes, we derive expressions for Christoffel symbols that enable parallel transports. We demonstrate this framework using examples from shape analysis of parameterized spherical surfaces, in the three contexts mentioned above.
6 0.83860946 30 iccv-2013-A Simple Model for Intrinsic Image Decomposition with Depth Cues
7 0.81999534 414 iccv-2013-Temporally Consistent Superpixels
8 0.81236827 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
9 0.81208003 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
10 0.80878961 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
11 0.80712283 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
12 0.80678582 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
13 0.80675161 150 iccv-2013-Exemplar Cut
14 0.8042627 312 iccv-2013-Perceptual Fidelity Aware Mean Squared Error
15 0.80256367 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
16 0.80243188 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
17 0.80206758 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
18 0.80163854 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
19 0.80075449 180 iccv-2013-From Where and How to What We See
20 0.79952055 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties