200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

Source: pdf

Author: Quannan Li, Jiajun Wu, Zhuowen Tu

Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take advantage of massive, well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performance on a variety of benchmark datasets, which demonstrates the effectiveness of the learned mid-level representations: they generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.

reference text

[1] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In NIPS, pages 561–568, 2002.

[2] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.

[3] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In British Machine Vision Conference, 2011.

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR (1), pages 886–893, 2005.

[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascalnetwork.org/challenges/VOC/voc2007/workshop/index.html.

[7] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.

[8] A. Farhadi, I. Endres, D. Hoiem, and D. A. Forsyth. Describing objects by their attributes. In CVPR, pages 1778–1785, 2009.

[9] J. Feng, Y. Wei, L. Tao, C. Zhang, and J. Sun. Salient object detection by composition. In ICCV, pages 1028–1035, 2011.

[10] V. Ferrari, F. Jurie, and C. Schmid. Accurate object detection with deformable shape models learnt from images. In CVPR, 2007.

[11] J. Heinly, E. Dunn, and J.-M. Frahm. Comparative evaluation of binary features. In Proc. of ECCV, 2012.

[12] R. Kwitt, N. Vasconcelos, and N. Rasiwasia. Scene recognition on the semantic manifold. In ECCV (4), pages 359–372, 2012.

[13] G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.

[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR (2), pages 2169–2178, 2006.

[15] Q. V. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In ICML, 2012.

[16] L.-J. Li and L. Fei-Fei. What, where and who? classifying events by scene and object recognition. In ICCV, pages 1–8, 2007.

[17] L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, pages 1378–1386, 2010.

[18] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int’l J. of Comp. Vis., 60(2):91–110, 2004.

[19] G. A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, 1995.

[20] F. Moosmann, E. Nowak, and F. Jurie. Randomized clustering forests for image classification. IEEE Trans. Pattern Anal. Mach. Intell., 30(9):1632–1646, 2008.

[21] Z. Niu, G. Hua, X. Gao, and Q. Tian. Context aware topic model for scene recognition. In CVPR, pages 2743–2750, 2012.

[22] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, 1996.

[23] M. Pandey and S. Lazebnik. Scene recognition and weakly supervised object localization with deformable part-based models. In ICCV, pages 1307–1314, 2011.

[24] D. Parikh and K. Grauman. Interactively building a discriminative vocabulary of nameable attributes. In CVPR, pages 1681–1688, 2011.

[25] D. Parikh and K. Grauman. Relative attributes. In ICCV, pages 503–510, 2011.

[26] F. Perronnin, J. Sánchez, and T. Mensink. Improving the fisher kernel for large-scale image classification. In ECCV (4), pages 143–156, 2010.

[27] A. Quattoni and A. Torralba. Recognizing indoor scenes. In CVPR, pages 413–420, 2009.

[28] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV (2), pages 73–86, 2012.

[29] L. Torresani, M. Szummer, and A. W. Fitzgibbon. Efficient object category recognition using classemes. In ECCV (1), pages 776–789, 2010.

[30] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, 2008.

[31] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell., 34(3):480–492, 2012.

[32] J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, pages 3360–3367, 2010.

[33] L. Wang, Y. Li, J. Jia, J. Sun, D. P. Wipf, and J. M. Rehg. Learning sparse covariance patterns for natural scenes. In CVPR, pages 2767–2774, 2012.

[34] Y. Xu, J.-Y. Zhu, E. I.-C. Chang, and Z. Tu. Multiple clustered instance learning for histopathology cancer image classification, segmentation and clustering. In CVPR, pages 964–971, 2012.

[35] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, pages 1794–1801, 2009.

[36] G. Zhao and M. Pietikäinen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell., 29(6):915–928, 2007.

[37] J.-Y. Zhu, J. Wu, Y. Wei, E. I.-C. Chang, and Z. Tu. Unsupervised object class discovery via saliency-guided multiple class learning. In CVPR, pages 3218–3225, 2012.