jmlr jmlr2008 jmlr2008-87 jmlr2008-87-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: François Fleuret, Donald Geman
Abstract: Most discriminative techniques for detecting instances from object categories in still images consist of looping over a partition of a pose space with dedicated binary classifiers. The efficiency of this strategy for a complex pose, that is, for fine-grained descriptions, can be assessed by measuring the effect of sample size and pose resolution on accuracy and computation. Two conclusions emerge: (1) fragmenting the training data, which is inevitable in dealing with high in-class variation, severely reduces accuracy; (2) the computational cost at high resolution is prohibitive due to visiting a massive pose partition. To overcome data-fragmentation we propose a novel framework centered on pose-indexed features which assign a response to a pair consisting of an image and a pose, and are designed to be stationary: the probability distribution of the response is always the same if an object is actually present. Such features allow for efficient, one-shot learning of pose-specific classifiers. To avoid expensive scene processing, we arrange these classifiers in a hierarchy based on nested partitions of the pose; as in previous work on coarse-to-fine search, this allows for efficient processing. The hierarchy is then "folded" for training: all the classifiers at each level are derived from one base predictor learned from all the data. The hierarchy is "unfolded" for testing: parsing a scene amounts to examining increasingly finer object descriptions only when there is sufficient evidence for coarser ones. In this way, the detection results are equivalent to an exhaustive search at high resolution. We illustrate these ideas by detecting and localizing cats in highly cluttered greyscale scenes. Keywords: supervised learning, computer vision, image interpretation, cats, stationary features, hierarchical search
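The coarse-to-fine search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional pose represented as a half-open interval, a binary split at each level, and a single hypothetical `score` function that is applied to every cell (loosely mirroring the idea of one base predictor per level). The names `coarse_to_fine_search`, `make_toy_score`, `threshold`, and `max_depth` are all illustrative.

```python
from typing import Callable, List, Tuple

# Assumption: a pose cell is a half-open interval [lo, hi) of one pose parameter.
Cell = Tuple[float, float]


def coarse_to_fine_search(
    root: Cell,
    score: Callable[[Cell], float],
    threshold: float,
    max_depth: int,
) -> List[Cell]:
    """Return the finest-level cells whose ancestors all passed the test."""
    alive = [root]
    for _ in range(max_depth):
        next_alive = []
        for lo, hi in alive:
            if score((lo, hi)) < threshold:
                continue  # insufficient evidence: skip every finer cell below
            mid = (lo + hi) / 2.0
            next_alive.extend([(lo, mid), (mid, hi)])  # refine the pose cell
        alive = next_alive
    # Apply the test once more at the finest level.
    return [cell for cell in alive if score(cell) >= threshold]


def make_toy_score(true_pose: float) -> Callable[[Cell], float]:
    """Hypothetical stand-in for a classifier: 1.0 if the cell holds the pose."""
    def score(cell: Cell) -> float:
        lo, hi = cell
        return 1.0 if lo <= true_pose < hi else 0.0
    return score


detections = coarse_to_fine_search(
    root=(0.0, 1.0), score=make_toy_score(0.3), threshold=0.5, max_depth=3
)
print(detections)
```

With a perfect toy score the surviving cell is the one an exhaustive scan of the finest partition would also return, while cells under a rejected coarse cell are never evaluated.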
Y. Amit, D. Geman, and B. Jedynak. Efficient focusing and face detection. In Face Recognition: From Theory to Applications. Springer Verlag, 1998.
D. J. Crandall and D. P. Huttenlocher. Weakly supervised learning of part-based spatial models for visual object recognition. In European Conference on Computer Vision, pages 16–29, 2006.
X. Fan. Learning a Hierarchy of Classifiers for Multi-class Shape Detection. PhD thesis, Johns Hopkins University, 2006.
F. Fleuret and D. Geman. Coarse-to-fine face detection. International Journal of Computer Vision (IJCV), 41(1/2):85–107, 2001.
F. Fleuret and D. Geman. Stationary features and cat detection. Technical Report 07-56, IDIAP Research Institute, October 2007.
Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, 1999.
S. Gangaputra and D. Geman. A design principle for coarse-to-fine classification. In Conference on Computer Vision and Pattern Recognition, volume 2, pages 1877–1884, 2006.
D. M. Gavrila. Multi-frame hierarchical template matching using distance transforms. In International Conference on Pattern Recognition, 1998.
S. Geman, K. Manbeck, and E. McClure. Coarse-to-fine search and rank-sum statistics in object recognition. Technical report, Brown University, 1995.
S. Geman, D. F. Potter, and Z. Chi. Composition systems. Quarterly of Applied Mathematics, LX:707–736, 2002.
U. Grenander. General Pattern Theory. Oxford U. Press, 1993.
D. Huttenlocher and P. Felzenszwalb. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79, 2005.
D. P. Huttenlocher and W. J. Rucklidge. A multi-resolution technique for comparing images using the Hausdorff distance. In Conference on Computer Vision and Pattern Recognition, 1993.
Y. LeCun, F. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Conference on Computer Vision and Pattern Recognition. IEEE Press, 2004.
F. Li, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In International Conference on Computer Vision, volume 2, page 1134, 2003.
J. L. Mundy and A. Zisserman, editors. Geometric Invariance in Computer Vision. MIT Press, 1992.
B. Ommer, M. Sauter, and J. M. Buhmann. Learning top-down grouping of compositional hierarchies for recognition. In Conference on Computer Vision and Pattern Recognition, 2006.
C. Papageorgiou and T. Poggio. A trainable system for object detection. International Journal of Computer Vision, 38(1):15–33, June 2000.
H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–28, 1998.
H. Schneiderman and T. Kanade. Object detection using the statistics of parts. International Journal of Computer Vision, 56(3):151–177, 2004.
P. Simard, L. Bottou, P. Haffner, and Y. LeCun. Boxlets: a fast convolution algorithm for neural networks and signal processing. In Neural Information Processing Systems, volume 11, 1999.
B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla. Model-based hand tracking using a hierarchical Bayesian filter. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1372–1384, 2006.
P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
J. Wu, S. C. Brubaker, M. D. Mullin, and J. M. Rehg. Fast asymmetric learning for cascade face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:369–382, 2008.
S. C. Zhu and D. Mumford. A Stochastic Grammar of Images, volume 2 of Foundations and Trends in Computer Graphics and Vision, pages 259–362. Now Publishers, 2006.