
Deep Fisher Networks for Large-Scale Image Classification (NIPS 2013)


Authors: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computation has become broadly available on modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance on challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors and obtains results competitive with deep convolutional networks at a lower computational cost of learning. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary, in the sense that their combination further improves accuracy.
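The stacked architecture builds on the standard, single-layer Fisher vector (FV). As background, the sketch below computes an FV of the "improved" kind described in [20]: first- and second-order statistics of local descriptors with respect to a diagonal-covariance Gaussian mixture model (GMM), followed by signed square-root and L2 normalization. It is a minimal, stdlib-only illustration under stated assumptions, not the paper's deep Fisher network: the GMM is assumed to be already fitted, the function names (`fisher_vector`, `soft_assignments`, `log_gaussian_diag`) are hypothetical, the per-component output layout is an arbitrary choice, and the toy parameters are made up.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def soft_assignments(x: Sequence[float], weights: Sequence[float],
                     means: Sequence[Sequence[float]],
                     variances: Sequence[Sequence[float]]) -> Vector:
    """Posterior gamma_k(x) of each GMM component, via a stable log-sum-exp."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  means: Sequence[Sequence[float]],
                  variances: Sequence[Sequence[float]]) -> Vector:
    """Improved FV of N local descriptors w.r.t. a K-component, D-dim diagonal GMM.

    Returns a 2*K*D vector: for each component, the mean (first-order) block
    followed by the variance (second-order) block.
    """
    n = len(descriptors)
    k_count = len(weights)
    dim = len(means[0])
    first = [[0.0] * dim for _ in range(k_count)]
    second = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        gamma = soft_assignments(x, weights, means, variances)
        for k in range(k_count):
            for d in range(dim):
                # Whitened deviation (x - mu) / sigma for this component/dimension.
                u = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                first[k][d] += gamma[k] * u
                second[k][d] += gamma[k] * (u * u - 1.0)
    fv: Vector = []
    for k in range(k_count):
        fv.extend(v / (n * math.sqrt(weights[k])) for v in first[k])
        fv.extend(v / (n * math.sqrt(2.0 * weights[k])) for v in second[k])
    # Signed square-root (power) normalization, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


if __name__ == "__main__":
    # Toy 2-component GMM in 2-D (made-up parameters, not learned from data).
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [3.0, 3.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    descriptors = [[0.1, -0.2], [2.9, 3.2], [0.3, 0.1]]
    fv = fisher_vector(descriptors, weights, means, variances)
    print(len(fv))  # 2 * K * D = 8
```

Because every descriptor contributes to all K components through its soft assignment, the output length 2KD does not depend on the number of descriptors N. This fixed-length property is what makes the encoding usable as input to a linear classifier.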

References

[1] A. Agarwal and B. Triggs. Hyperfeatures - multilevel local coding for visual recognition. In Proc. ECCV, pages 30–43, 2006.

[2] R. Arandjelović and A. Zisserman. Three things everyone should know to improve object retrieval. In Proc. CVPR, 2012.

[3] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, pages 153–160, 2006.

[4] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge (ILSVRC), 2010. URL http://www.image-net.org/challenges/LSVRC/2010/.

[5] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In Proc. ECCV, pages 430–443, 2012.

[6] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.

[7] D. C. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. CVPR, pages 3642–3649, 2012.

[8] A. Coates, A. Y. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In Proc. AISTATS, 2011.

[9] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22, 2004.

[10] A. Gordo, J. A. Rodríguez-Serrano, F. Perronnin, and E. Valveny. Leveraging category-level labels for instance-level image retrieval. In Proc. CVPR, pages 3045–3052, 2012.

[11] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. In Proc. ECCV, 2012.

[12] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[13] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, pages 487–493, 1998.

[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.

[15] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Proc. CVPR, pages 951–958, 2009.

[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proc. CVPR, 2006.

[17] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.

[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[19] D. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, pages 1150–1157, Sep 1999.

[20] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.

[21] F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid. Towards good practice in large-scale learning for image classification. In Proc. CVPR, pages 3482–3489, 2012.

[22] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.

[23] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Proc. CVPR, 2011.

[24] J. Sánchez, F. Perronnin, and T. Emídio de Campos. Modeling the spatial layout of images beyond spatial pyramids. Pattern Recognition Letters, 33(16):2216–2223, 2012.

[25] P. Sermanet and Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. In International Joint Conference on Neural Networks, pages 2809–2813, 2011.

[26] T. Serre, L. Wolf, and T. Poggio. A new biologically motivated framework for robust object recognition. In Proc. CVPR, 2005.

[27] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient SOlver for SVM. In Proc. ICML, volume 227, 2007.

[28] K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. Fisher Vector Faces in the Wild. In Proc. BMVC, 2013.

[29] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, volume 2, pages 1470–1477, 2003.

[30] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In Proc. ECCV, pages 776–789, Sep 2010.

[31] J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In Proc. IJCAI, pages 2764–2770, 2011.

[32] S. Yan, X. Xu, D. Xu, S. Lin, and X. Li. Beyond spatial pyramids: A new feature extraction framework with dense spatial sampling for image classification. In Proc. ECCV, pages 473–487, 2012.

[33] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, pages 1794–1801, 2009.