
169 iccv-2013-Fine-Grained Categorization by Alignments

Source: pdf

Author: E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars

Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.

reference text

[1] T. Berg and P. N. Belhumeur. POOF: Part-based one-vs-one features for fine-grained categorization, face verification, and attribute estimation. In CVPR, 2013.

[2] L. Bo, X. Ren, and D. Fox. Kernel descriptors for visual recognition. In NIPS, 2010.

[3] S. Branson, P. Perona, and S. Belongie. Strong supervision from weak annotation: Interactive training of deformable part models. In ICCV, 2011.

[4] S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, and S. Belongie. Visual recognition with humans in the loop. In ECCV, 2010.

[5] Y. Chai, V. Lempitsky, and A. Zisserman. Bicos: A bi-level cosegmentation method for image classification. In ICCV, 2011.

[6] Y. Chai, E. Rahtu, V. Lempitsky, L. Van Gool, and A. Zisserman. Tricos: a tri-level class-discriminative co-segmentation method for image classification. In ECCV, 2012.

[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.

[9] K. Duan, D. Parikh, D. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012.

[10] R. Farrell, O. Oza, N. Zhang, V. I. Morariu, T. Darrell, and L. S. Davis. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. In ICCV, 2011.

[11] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 32(9):1627–1645, 2010.

[12] Y. Jia, O. Vinyals, and T. Darrell. Pooling-invariant image feature learning. Technical report, 2013. arXiv:1302.5056.

[13] F. P. Jorge Sanchez and Z. Akata. Fisher vectors for fine-grained visual categorization. In CVPR, 2011.

[14] R. Khan, J. Van de Weijer, F. S. Khan, D. Muselet, C. Ducottet, and C. Barat. Discriminative color descriptors. In CVPR, 2013.

[15] A. Khosla, N. Jayadevaprakash, B. Yao, and L. Fei-Fei. Novel dataset for fine-grained image categorization. In CVPR, FGVC workshop, 2011.

[16] J. Liu, A. Kanazawa, D. Jacobs, and P. Belhumeur. Dog breed classification using part localization. In ECCV, 2012.

[17] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[18] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. IJCV, 65(1-2):43–72, 2005.

[19] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In ICVGIP, 2008.

[20] D. Parikh and K. Grauman. Interactive discovery of task-specific nameable attributes. In CVPR, FGVC workshop, 2011.

[21] O. M. Parkhi, A. Vedaldi, A. Zisserman, and C. Jawahar. Cats and dogs. In CVPR, 2012.

[22] M. Perďoch, O. Chum, and J. Matas. Efficient representation of local geometry for large scale object retrieval. In CVPR, 2009.

[23] F. Perronnin, J. Sánchez, and T. Mensink. Improving the fisher kernel for large-scale image classification. In ECCV, 2010.

[24] E. Rosch, C. B. Mervis, W. D. Gray, D. M. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cogn. psych., 8(3):382–439, 1976.

[25] C. Rother, V. Kolmogorov, and A. Blake. Grabcut: Interactive foreground extraction using iterated graph cuts. ACM TOG, 23(3):309–314, 2004.

[26] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In ICML, 2007.

[27] K. Van De Sande, T. Gevers, and C. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 2010.

[28] A. Vedaldi and B. Fulkerson. Vlfeat: An open and portable library of computer vision algorithms. In MM, 2010.

[29] C. Wah, S. Branson, P. Perona, and S. Belongie. Multiclass recognition and part localization with humans in the loop. In ICCV, 2011.

[30] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical report, 2011.

[31] S. Yang, L. Bo, J. Wang, and L. Shapiro. Unsupervised template learning for fine-grained object recognition. In NIPS, 2012.

[32] B. Yao, G. Bradski, and L. Fei-Fei. A codebook-free and annotation-free approach for fine-grained image categorization. In CVPR, 2012.

[33] B. Yao, A. Khosla, and L. Fei-Fei. Combining randomization and discrimination for fine-grained image categorization. In CVPR, 2011.

[34] N. Zhang, R. Farrell, and T. Darrell. Pose pooling kernels for subcategory recognition. In CVPR, 2012.