iccv iccv2013 iccv2013-451 iccv2013-451-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohamed Elhoseiny, Babak Saleh, Ahmed Elgammal
Abstract: The main question we address in this paper is how to use purely textual descriptions of categories, with no training images, to learn visual classifiers for these categories. We propose an approach for zero-shot learning of object categories where the description of unseen categories comes in the form of typical text such as an encyclopedia entry, without the need for explicitly defined attributes. We propose and investigate two baseline formulations, based on regression and domain adaptation. Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the classifier parameters for new classes. We applied the proposed approach to two fine-grained categorization datasets, and the results indicate successful classifier prediction.
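The regression baseline mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one text feature vector per seen class (for example TF-IDF weights), a linear visual classifier already learned for each seen class, and a plain ridge regression that maps text features to classifier weights. All function names, the regularization value, and the toy numbers are hypothetical.

```python
# Hypothetical sketch: ridge regression from class text features to the
# weights of a linear visual classifier. Names and data are illustrative
# assumptions, not taken from the paper.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ridge(text_feats, clf_weights, lam=1e-3):
    """Return W minimizing ||T W - C||^2 + lam * ||W||^2 (closed form)."""
    t_t = transpose(text_feats)
    gram = matmul(t_t, text_feats)
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term keeps the system well conditioned
    return solve(gram, matmul(t_t, clf_weights))

def predict_classifier(w, text_feat):
    """Predict classifier weights for an unseen class from its text feature."""
    return matmul([text_feat], w)[0]

# Toy data: three seen classes, 2-D text features, 2-D classifier weights.
seen_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seen_clf = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]

w = fit_ridge(seen_text, seen_clf, lam=1e-6)
unseen_clf = predict_classifier(w, [2.0, 1.0])
print(unseen_clf)
```

The predicted vector would then be used as the weights of a linear classifier for the unseen class. The constrained optimization formulation described in the abstract adds a knowledge transfer term and further constraints that this sketch does not attempt to model.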
[1] K. Barnard, P. Duygulu, and D. Forsyth. Clustering art. In CVPR, 2001. 2
[2] E. Bart and S. Ullman. Cross-generalization: Learning novel classes from a single example by feature replacement. In CVPR, 2005. 1, 2
[3] L. Bo and C. Sminchisescu. Twin Gaussian processes for structured prediction. IJCV, 2010. 5, 6
[4] J. Deng, A. C. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010. 1, 2
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. 2
[6] L. Duan, D. Xu, I. W.-H. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. TPAMI, 2012. 3
[7] A. Farhadi, I. Endres, D. Hoiem, and D. A. Forsyth. Describing objects by their attributes. In CVPR, 2009. 1, 2
[8] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010. 3
[9] L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In CVPR, 2003. 1, 2
[10] R. Fergus, H. Bernal, Y. Weiss, and A. Torralba. Semantic label sharing for learning with many categories. In ECCV, 2010. 2
[11] M. Fink. Object classification from a single example utilizing class relevance metrics. In NIPS, 2004. 2
[12] G. Griffin and P. Perona. Learning and using taxonomies for fast visual categorization. In CVPR, 2008. 2
[13] A. E. Hoerl and R. W. Kennard. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 1970. 3
[14] N. Krishnamoorthy, G. Malkarnenkar, R. Mooney, K. Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. NAACL HLT, 2013. 3
[15] B. Kulis, K. Saenko, and T. Darrell. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In CVPR, 2011. 3, 4, 5, 6
[16] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011. 3
[17] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, 2009. 1, 2
[18] E. G. Miller, N. E. Matsakis, and P. A. Viola. Learning from one example through shared densities on transforms. In CVPR, 2000. 2
[19] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 1995. 2
[20] M.-E. Nilsback and A. Zisserman. Automated flower classification over large number of classes. In ICVGIP, 2008. 2, 6
[21] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011. 3
[22] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009. 2
[23] D. Parikh and K. Grauman. Interactively building a discriminative vocabulary of nameable attributes. In CVPR, 2011. 2
[24] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005. 3, 6
[25] M. Rohrbach, M. Stark, G. Szarvas, and B. Schiele. Combining language sources and robust semantic relatedness for attribute-based knowledge transfer. In Parts and Attributes Workshop at ECCV, 2010. 2
[26] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In ECCV, 2010. 3
[27] R. Salakhutdinov, A. Torralba, and J. B. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011. 1, 2
[28] B. Saleh, A. Farhadi, and A. Elgammal. Object-centric anomaly detection by attribute-based reasoning. In CVPR, 2013. 1
[29] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. IPM, 1988. 6
[30] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. PAMI, 2008. 2
[31] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In ECCV, 2010. 6
[32] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical report, California Institute of Technology, 2010. 2, 6
[33] J. Yang, R. Yan, and A. G. Hauptmann. Cross-domain video concept detection using adaptive SVMs. In MULTIMEDIA, 2007. 3
[34] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011. 3
[35] D. Zeimpekis and E. Gallopoulos. CLSI: A flexible approximation scheme from clustered term-document matrices. In SDM, 2005. 6