
84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection


Source: pdf

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time, but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on the ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: (1) independent training of individual categories, (2) joint training of classes, and (3) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale.
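The trade-off between the three strategies named in the abstract can be illustrated with a toy sketch. This is not the authors' hierarchical framework: the 1-D "part" descriptors, the class names, the tolerance `TOL`, and all function names below are assumptions made up for illustration. The sketch only counts how many vocabulary entries each strategy would store when nearby parts may be shared.

```python
from typing import Dict, List

# Hypothetical toy data: each class is described by a few 1-D "part" descriptors.
CLASSES: Dict[str, List[float]] = {
    "car": [1.0, 2.0, 5.0],
    "bike": [1.1, 2.6, 7.0],
    "mug": [2.3, 5.1, 9.0],
}
TOL = 0.35  # assumed tolerance: a stored part within TOL of a new one is reused


def add_parts(vocabulary: List[float], parts: List[float], tol: float = TOL) -> List[float]:
    """Greedily extend a vocabulary: reuse a stored part within tol, else add a new one."""
    vocab = list(vocabulary)
    for p in parts:
        if not any(abs(p - q) <= tol for q in vocab):
            vocab.append(p)
    return vocab


def independent_size(classes: Dict[str, List[float]]) -> int:
    """Independent training: one private vocabulary per class, nothing is shared."""
    return sum(len(add_parts([], parts)) for parts in classes.values())


def joint_size(classes: Dict[str, List[float]], tol: float = TOL) -> int:
    """Joint training (toy version): pool all parts, then cover the sorted
    values with as few intervals of width 2 * tol as possible."""
    pooled = sorted(p for parts in classes.values() for p in parts)
    count, cover_end = 0, float("-inf")
    for p in pooled:
        if p > cover_end:  # p is not covered yet, so open a new shared part
            count += 1
            cover_end = p + 2 * tol
    return count


def sequential_size(classes: Dict[str, List[float]], order: List[str]) -> int:
    """Sequential training: classes arrive one at a time and reuse earlier parts."""
    vocab: List[float] = []
    for name in order:
        vocab = add_parts(vocab, classes[name])
    return len(vocab)


if __name__ == "__main__":
    print("independent:", independent_size(CLASSES))
    print("joint:", joint_size(CLASSES))
    print("sequential (car, bike, mug):", sequential_size(CLASSES, ["car", "bike", "mug"]))
    print("sequential (mug, bike, car):", sequential_size(CLASSES, ["mug", "bike", "car"]))
```

In this toy setting, independent training stores the most entries, joint training the fewest, and the sequential result depends on the order in which classes are presented, which mirrors the qualitative claims of the abstract without reproducing the paper's experiments.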


reference text

[1] Russell, B., Torralba, A., Murphy, K., and Freeman, W. T. (2008) LabelMe: a database and web-based tool for image annotation. IJCV, 77, 157–173.

[2] Leibe, B., Leonardis, A., and Schiele, B. (2008) Robust object detection with interleaved categorization and segmentation. IJCV, 77, 259–289.

[3] Torralba, A., Murphy, K. P., and Freeman, W. T. (2007) Sharing visual features for multiclass and multiview object detection. IEEE PAMI, 29, 854–869.

[4] Opelt, A., Pinz, A., and Zisserman, A. (2008) Learning an alphabet of shape and appearance for multiclass object detection. IJCV, 80, 16–44.

[5] Fei-Fei, L., Fergus, R., and Perona, P. (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR’04 Workshop on Generative-Model Based Vision.

[6] Krempp, S., Geman, D., and Amit, Y. (2002) Sequential learning of reusable parts for object detection. Tech. rep.

[7] Todorovic, S. and Ahuja, N. (2007) Unsupervised category modeling, recognition, and segmentation in images. IEEE PAMI.

[8] Zhu, S. and Mumford, D. (2006) A stochastic grammar of images. Found. and Trends in Comp. Graphics and Vision, 2, 259–362.

[9] Ranzato, M. A., Huang, F.-J., Boureau, Y.-L., and LeCun, Y. (2007) Unsupervised learning of invariant feature hierarchies with applications to object recognition. CVPR.

[10] Ullman, S. and Epshtein, B. (2006) Visual classification by a hierarchy of extended features. Towards Category-Level Object Recognition, Springer-Verlag.

[11] Sivic, J., Russell, B. C., Zisserman, A., Freeman, W. T., and Efros, A. A. (2008) Unsupervised discovery of visual object class hierarchies. CVPR.

[12] Bart, E., Porteous, I., Perona, P., and Welling, M. (2008) Unsupervised learning of visual taxonomies. CVPR.

[13] Fidler, S. and Leonardis, A. (2007) Towards scalable representations of visual categories: Learning a hierarchy of parts. CVPR.

[14] Scalzo, F. and Piater, J. H. (2005) Statistical learning of visual feature hierarchies. W. on Learning, CVPR.

[15] Zhu, L., Lin, C., Huang, H., Chen, Y., and Yuille, A. (2008) Unsupervised structure learning: Hierarchical recursive composition, suspicious coincidence and competitive exclusion. ECCV, vol. 2, pp. 759–773.

[16] Fleuret, F. and Geman, D. (2001) Coarse-to-fine face detection. IJCV, 41, 85–107.

[17] Schwartz, J. and Felzenszwalb, P. (2007) Hierarchical matching of deformable shapes. CVPR.

[18] Ommer, B. and Buhmann, J. M. (2007) Learning the compositional nature of visual objects. CVPR.

[19] Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2007) Object recognition with cortex-like mechanisms. IEEE PAMI, 29, 411–426.

[20] Sudderth, E., Torralba, A., Freeman, W. T., and Willsky, A. (2008) Describing visual scenes using transformed objects and parts. IJCV, pp. 291–330.

[21] Fidler, S., Boben, M., and Leonardis, A. (2009) Optimization framework for learning a hierarchical shape vocabulary for object class detection. BMVC.

[22] Agarwal, S., Awan, A., and Roth, D. (2004) Learning to detect objects in images via a sparse, part-based representation. IEEE PAMI, 26, 1475–1490.

[23] Shotton, J., Blake, A., and Cipolla, R. (2008) Multi-scale categorical object recognition using contour fragments. IEEE PAMI, 30, 1270–1281.

[24] Ferrari, V., Fevrier, L., Jurie, F., and Schmid, C. (2007) Accurate object detection with deformable shape models learnt from images. CVPR.

[25] Stark, M. and Schiele, B. (2007) How good are local features for classes of geometric objects? ICCV.

[26] Fritz, M. and Schiele, B. (2008) Decomposition, discovery and detection of visual categories using topic models. CVPR.

[27] Mutch, J. and Lowe, D. G. (2006) Multiclass object recognition with sparse, localized features. CVPR, pp. 11–18.

[28] Shotton, J., Blake, A., and Cipolla, R. (2008) Efficiently combining contour and texture cues for object recognition. BMVC.

[29] Ahuja, N. and Todorovic, S. (2008) Connected segmentation tree – a joint representation of region layout and hierarchy. CVPR.