NIPS 2013, paper nips2013-83: reference knowledge graph by maker-knowledge-mining
Source: pdf
Authors: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains results competitive with deep convolutional networks at a lower computational cost for learning. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.
[1] A. Agarwal and B. Triggs. Hyperfeatures - multilevel local coding for visual recognition. In Proc. ECCV, pages 30–43, 2006.
[2] R. Arandjelović and A. Zisserman. Three things everyone should know to improve object retrieval. In Proc. CVPR, 2012.
[3] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, pages 153–160, 2006.
[4] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge (ILSVRC), 2010. URL http://www.image-net.org/challenges/LSVRC/2010/.
[5] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In Proc. ECCV, pages 430–443, 2012.
[6] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.
[7] D. C. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. CVPR, pages 3642–3649, 2012.
[8] A. Coates, A. Y. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In Proc. AISTATS, 2011.
[9] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22, 2004.
[10] A. Gordo, J. A. Rodríguez-Serrano, F. Perronnin, and E. Valveny. Leveraging category-level labels for instance-level image retrieval. In Proc. CVPR, pages 3045–3052, 2012.
[11] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. In Proc. ECCV, 2012.
[12] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[13] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, pages 487–493, 1998.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
[15] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Proc. CVPR, pages 951–958, 2009.
[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proc. CVPR, 2006.
[17] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[19] D. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, pages 1150–1157, Sep 1999.
[20] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.
[21] F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid. Towards good practice in large-scale learning for image classification. In Proc. CVPR, pages 3482–3489, 2012.
[22] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.
[23] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Proc. CVPR, 2011.
[24] J. Sánchez, F. Perronnin, and T. Emídio de Campos. Modeling the spatial layout of images beyond spatial pyramids. Pattern Recognition Letters, 33(16):2216–2223, 2012.
[25] P. Sermanet and Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. In International Joint Conference on Neural Networks, pages 2809–2813, 2011.
[26] T. Serre, L. Wolf, and T. Poggio. A new biologically motivated framework for robust object recognition. In Proc. CVPR, 2005.
[27] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient SOlver for SVM. In Proc. ICML, volume 227, 2007.
[28] K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. Fisher Vector Faces in the Wild. In Proc. BMVC, 2013.
[29] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, volume 2, pages 1470–1477, 2003.
[30] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In Proc. ECCV, pages 776–789, Sep 2010.
[31] J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In Proc. IJCAI, pages 2764–2770, 2011.
[32] S. Yan, X. Xu, D. Xu, S. Lin, and X. Li. Beyond spatial pyramids: A new feature extraction framework with dense spatial sampling for image classification. In Proc. ECCV, pages 473–487, 2012.
[33] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. CVPR, pages 1794–1801, 2009.