151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

Source: pdf

Author: Kristen Grauman, Fei Sha, Sung J. Hwang

Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm.
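
The two ingredients named in the abstract, a Mahalanobis metric per non-leaf node and a preference for features disjoint from those of ancestor nodes, can be illustrated with a minimal sketch. This is not the paper's formulation: the node names, the toy diagonal metrics, and the helper `disjoint_penalty` (which simply sums the overlap between a node's metric diagonal and each ancestor's) are assumptions made here for illustration only.

```python
import numpy as np


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)


def disjoint_penalty(metrics, parent):
    """Hypothetical overlap penalty (an assumption, not the paper's regularizer).

    For every node and each of its ancestors, add the dot product of the two
    metrics' diagonals. With non-negative diagonals this is zero exactly when
    a node and its ancestors select disjoint feature dimensions.
    """
    total = 0.0
    for node, M in metrics.items():
        ancestor = parent.get(node)
        while ancestor is not None:
            total += float(np.diag(M) @ np.diag(metrics[ancestor]))
            ancestor = parent.get(ancestor)
    return total


# Toy taxonomy: one root with two non-leaf children (names are illustrative).
parent = {"animal": None, "feline": "animal", "canine": "animal"}

# One metric per non-leaf node over four features; diagonal for simplicity.
metrics = {
    "animal": np.diag([1.0, 1.0, 0.0, 0.0]),  # uses features 0 and 1
    "feline": np.diag([0.0, 0.0, 1.0, 0.0]),  # disjoint from its ancestor
    "canine": np.diag([1.0, 0.0, 0.0, 1.0]),  # shares feature 0 with its ancestor
}

root_distance = mahalanobis_sq([1, 2, 0, 0], [0, 0, 0, 0], metrics["animal"])
penalty = disjoint_penalty(metrics, parent)
```

In this toy setup only the "canine" metric reuses a feature already selected at the root, so it alone contributes to the penalty; a learner minimizing such a term alongside the (dis)similarity constraints would be pushed toward different features at different levels of the hierarchy.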

reference text

[1] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.

[2] P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.

[3] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.

[4] L.-J. Li, H. Su, E. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS, 2010.

[5] Y. Jia, M. Salzmann, and T. Darrell. Factorized latent spaces with structured sparsity. In NIPS, 2010.

[6] A. Frome, Y. Singer, and J. Malik. Image retrieval and classification using local distance functions. In NIPS, 2006.

[7] P. Kumar, P. Torr, and A. Zisserman. An invariant large margin nearest neighbour classifier. In ICCV, 2007.

[8] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In CVPR, 2008.

[9] D. Ramanan and S. Baker. Local distance functions: A taxonomy, new algorithms, and an evaluation. In PAMI, 2011.

[10] Z. Wang, Y. Hu, and L.-T. Chia. Image-to-class distance metric learning for image classification. In ECCV, 2010.

[11] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10:207–244, June 2009.

[12] M. Marszalek and C. Schmid. Constructing category hierarchies for visual recognition. In ECCV, 2008.

[13] G. Griffin and P. Perona. Learning and using taxonomies for fast visual category recognition. In CVPR, 2008.

[14] D. Koller and M. Sahami. Hierarchically classifying documents using very few words. In ICML, 1997.

[15] A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In ICML, 1998.

[16] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In CIKM, 2004.

[17] C. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, 2009.

[18] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

[19] A. Torralba and K. Murphy. Sharing visual features for multiclass and multiview object detection. PAMI, 29(5), 2007.

[20] G. Shakhnarovich. Learning Task-Specific Similarity. PhD thesis, MIT, 2006.

[21] B. Babenko, S. Branson, and S. Belongie. Similarity functions for categorization: from monolithic to category specific. In ICCV, 2009.

[22] A. Globerson and S. Roweis. Metric learning by collapsing classes. In NIPS, pages 451–458, 2006.

[23] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In ICML, 2007.

[24] K. Weinberger and L. Saul. Fast solvers and efficient implementations for distance metric learning. In ICML, 2008.

[25] Q. Chen and S. Sun. Hierarchical large margin nearest neighbor classification. In ICPR, 2010.

[26] S. Parameswaran and K. Weinberger. Large margin multi-task metric learning. In NIPS, 2010.

[27] Y. Ying, K. Huang, and C. Campbell. Sparse metric learning via smooth optimization. In NIPS, 2009.

[28] D. Zhou, L. Xiao, and M. Wu. Hierarchical classification via orthogonal transfer. In ICML, 2011.

[29] J. Sivic, B. Russell, A. Zisserman, W. Freeman, and A. Efros. Unsupervised discovery of visual object class hierarchies. In CVPR, 2008.

[30] E. Bart, I. Porteous, P. Perona, and M. Welling. Unsupervised learning of visual taxonomies. In CVPR, 2008.

[31] A. Zweig and D. Weinshall. Exploiting object hierarchy: Combining models from different category levels. In ICCV, 2007.

[32] J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010.

[33] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. In ECCV, 2010.

[34] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statistical Society, 58:267–288, 1994.

[35] S. Dumais and H. Chen. Hierarchical classification of web content. In Research and Development in Information Retrieval, 2000.

[36] R. Fergus, H. Bernal, Y. Weiss, and A. Torralba. Semantic label sharing for learning with many categories. In ECCV, 2010.

[37] K. Barnard, Q. Fan, R. Swaminathan, A. Hoogs, R. Collins, P. Rondot, and J. Kaufhold. Evaluation of localized semantics: data, methodology, and experiments. Technical report, University of Arizona, 2005.