Venue: CVPR 2013 (paper cvpr2013-264)
Source: pdf
Author: Carlos Arteta, Victor Lempitsky, J. Alison Noble, Andrew Zisserman
Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. The instances may be partially overlapping and clustered, which makes them challenging for traditional detectors that aim to localize individual instances. Our approach is to propose a set of candidate regions and then select regions by optimizing a global classification score, subject to the constraint that the selected regions do not overlap. Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. For example, our detector can pick a region containing two or three object instances and assign that region an appropriate label. We show that this formulation can be learned within the structured-output SVM framework, and that inference in such a model can be accomplished using dynamic programming on a tree-structured region graph. Furthermore, learning requires only weak annotations: a dot on each instance. The improvement resulting from the ability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians.
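The inference step in the abstract (choosing non-overlapping regions from a tree-structured region graph and giving each selected region a label) can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' implementation. It assumes that the candidate regions are nested, so that two regions overlap only if one contains the other (for example, a hierarchy of extremal regions). It also assumes that every region already carries a score for each instance count k >= 1, standing in for the learned structured-SVM classifier, and that an unselected region contributes zero. The names `Region`, `best_labelling` and `solve` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Selection = List[Tuple[str, int]]  # (region name, number of instances)


@dataclass
class Region:
    """A node in a hypothetical tree of nested candidate regions."""
    name: str
    # scores[k]: assumed precomputed score for "this region holds k instances".
    scores: Dict[int, float]
    children: List["Region"] = field(default_factory=list)


def best_labelling(node: Region) -> Tuple[float, Selection]:
    """Best non-overlapping selection within the subtree rooted at node.

    Selecting a region excludes all of its descendants, since they overlap it.
    Sibling subtrees are disjoint in a nested tree, so they can be solved
    independently and their scores summed.
    """
    # Option 1: leave this region unselected and solve each child subtree.
    child_total = 0.0
    child_sel: Selection = []
    for child in node.children:
        score, sel = best_labelling(child)
        child_total += score
        child_sel.extend(sel)

    # Option 2: select this region with its highest-scoring instance count
    # (a 2-tuple, 3-tuple, ... label is what lets one region cover a cluster).
    best_k, best_s = max(node.scores.items(), key=lambda kv: kv[1],
                         default=(0, float("-inf")))
    if best_s > child_total:
        return best_s, [(node.name, best_k)]
    return child_total, child_sel


def solve(roots: List[Region]) -> Tuple[float, Selection]:
    """Solve every tree of the region forest and merge the selections."""
    total, selection = 0.0, []
    for root in roots:
        score, sel = best_labelling(root)
        total += score
        selection.extend(sel)
    return total, selection


if __name__ == "__main__":
    # A blob that either splits into two single-instance children
    # or is kept whole with the label "2 instances".
    blob = Region("blob", {1: 0.2, 2: 1.5},
                  [Region("left", {1: 0.6}), Region("right", {1: 0.7})])
    lone = Region("lone", {1: -0.4})  # negative score: left unselected
    print(solve([blob, lone]))  # (1.5, [('blob', 2)])
```

Here, labelling the merged region as a pair (score 1.5) beats detecting its two children separately (0.6 + 0.7), which is the kind of decision the tuple classes make possible. Each node is visited once, so the cost grows linearly with the number of regions times the number of candidate labels.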
[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI, 33:898–916, 2011.
[2] C. Arteta, V. Lempitsky, J. A. Noble, and A. Zisserman. Learning to detect cells using non-overlapping extremal regions. In Proc. MICCAI, 2012.
[3] O. Barinova, V. Lempitsky, and P. Kohli. On the detection of multiple object instances using Hough transforms. TPAMI, 2012.
[4] E. Bernardis and S. X. Yu. Pop out many small structures from a very large microscopic image. Med. Image Analysis, 2011.
[5] A. Chan and N. Vasconcelos. Bayesian Poisson regression for crowd counting. In Proc. CVPR, 2009.
[6] A. B. Chan, Z.-S. J. Liang, and N. Vasconcelos. Privacy preserving crowd monitoring: Counting people without people models or tracking. In Proc. CVPR, 2008.
[7] W. Choi and S. Savarese. A unified framework for multi-target tracking and collective activity recognition. In Proc. ECCV, 2012.
[8] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In Proc. ICCV, 2009.
[9] X. Descombes, R. Minlos, and E. Zhizhina. Object extraction using a stochastic birth-and-death dynamics in continuum. Journal of Mathematical Imaging and Vision, 2009.
[10] L. Dong, V. Parameswaran, V. Ramesh, and I. Zoghlami. Fast crowd segmentation using shape indexing. In Proc. ICCV, 2007.
[11] L. Fiaschi, R. Nair, U. Köthe, and F. Hamprecht. Learning to count with regression forest and structured labels. In Proc. ICPR, 2012.
[12] D. Kong, D. Gray, and H. Tao. A viewpoint invariant approach for crowd counting. In Proc. ICPR, 2006.
[13] A. Lehmussola, P. Ruusuvuori, J. Selinummi, H. Huttunen, and O. Yli-Harja. Computational framework for simulating fluorescence microscope images with cell populations. IEEE TMI, 2007.
[14] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection with interleaved categorization and segmentation. IJCV, 2008.
[15] V. Lempitsky and A. Zisserman. Learning to count objects in images. In Proc. NIPS, 2010.
[16] A. Marana, S. Velastin, L. Costa, and R. Lotufo. Estimation of crowd density using image processing. In IEE Colloquium on Image Processing for Security Applications, 1997.
[17] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004.
[18] J. Matas and K. Zimmermann. A new class of learnable detectors for categorisation. In Proc. SCIA, 2005.
[19] L. Neumann and J. Matas. Text localization in real-world images using efficiently pruned exhaustive search. In Proc. ICDAR, 2011.
[20] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, California, 1988.
[21] H. Riemenschneider, S. Sternig, M. Donoser, P. Roth, and H. Bischof. Hough regions for joining instance localization and segmentation. In Proc. ECCV, 2012.
[22] D. Ryan, S. Denman, C. Fookes, and S. Sridharan. Crowd counting using multiple local features. In Proc. DICTA, 2009.
[23] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proc. ICML, 2004.
[24] B. Wu and R. Nevatia. Detection and segmentation of multiple, partially occluded objects by grouping, merging, assigning part detection responses. IJCV, 2009.
[25] C. Yu and T. Joachims. Learning structural SVMs with latent variables. In Proc. ICML, 2009.