nips nips2010 nips2010-149 nips2010-149-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such a density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amounts of visual data.
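The two ideas in the abstract, a density whose sum over a region gives the count and a loss computed with a maximum subarray algorithm, can be illustrated with a minimal sketch. It assumes the loss compares two densities by the largest absolute difference of their sums over any axis-aligned box; the abstract does not spell out the definition, so this is an assumption. The function names and the unit-mass placement of dots are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def dots_to_density(shape, dots):
    """Assumed ground truth: place unit mass at each annotated dot (row, col)."""
    density = np.zeros(shape, dtype=float)
    for r, c in dots:
        density[r, c] += 1.0
    return density


def region_count(density, top, left, bottom, right):
    """Sum the density over rows top..bottom-1 and columns left..right-1."""
    return float(density[top:bottom, left:right].sum())


def max_subarray_1d(values):
    """Kadane's algorithm: largest sum of a non-empty contiguous run."""
    best = current = values[0]
    for v in values[1:]:
        current = max(v, current + v)
        best = max(best, current)
    return best


def max_box_sum(grid):
    """Largest sum over all axis-aligned boxes, in O(rows^2 * cols) time."""
    rows, cols = grid.shape
    best = -np.inf
    for top in range(rows):
        col_sums = np.zeros(cols)
        for bottom in range(top, rows):
            # col_sums now holds column totals for rows top..bottom.
            col_sums += grid[bottom]
            best = max(best, max_subarray_1d(col_sums))
    return float(best)


def max_box_discrepancy(f1, f2):
    """Largest absolute difference in box sums between two densities."""
    diff = f1 - f2
    # Run the search on diff and on -diff to cover both signs.
    return max(max_box_sum(diff), max_box_sum(-diff))
```

For example, two 6x6 densities with a single dot each, one at (0, 0) and one at (5, 5), have the same total count, yet `max_box_discrepancy` returns 1.0 because some box contains one dot and not the other. A measure of this kind penalizes misplaced density as well as a wrong overall count.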
[1] http://www.robots.ox.ac.uk/%7Evgg/research/counting/index.html.
[2] The MOSEK optimization software. http://www.mosek.com/.
[3] N. Ahuja and S. Todorovic. Extracting texels in 2.1D natural textures. ICCV, pp. 1–8, 2007.
[4] S. An, P. Peursum, W. Liu, and S. Venkatesh. Efficient algorithms for subwindow search in object detection and localization. CVPR, pp. 264–271, 2009.
[5] D. Anoraganingrum. Cell segmentation with median filter and mathematical morphology operation. Image Analysis and Processing, International Conference on, 0:1043, 1999.
[6] O. Barinova, V. Lempitsky, and P. Kohli. On the detection of multiple object instances using Hough transforms. CVPR, 2010.
[7] J. L. Bentley. Programming pearls: Algorithm design techniques. Comm. ACM, 27(9):865–871, 1984.
[8] J. L. Bentley. Programming pearls: Perspective on performance. Comm. ACM, 27(11):1087–1092, 1984.
[9] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[10] A. B. Chan, Z.-S. J. Liang, and N. Vasconcelos. Privacy preserving crowd monitoring: Counting people without people models or tracking. CVPR, 2008.
[11] S.-Y. Cho, T. W. S. Chow, and C.-T. Leung. A neural-based crowd estimation by hybrid global learning algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 29(4):535–541, 1999.
[12] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. ICCV, 2009.
[13] X. Descombes, R. Minlos, and E. Zhizhina. Object extraction using a stochastic birth-and-death dynamics in continuum. Journal of Mathematical Imaging and Vision, 33(3):347–359, 2009.
[14] L. Dong, V. Parameswaran, V. Ramesh, and I. Zoghlami. Fast crowd segmentation using shape indexing. ICCV, pp. 1–8, 2007.
[15] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/workshop/index.html.
[16] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59, 2009.
[17] D. Kong, D. Gray, and H. Tao. A viewpoint invariant approach for crowd counting. ICPR (3), pp. 1187–1190, 2006.
[18] P. D. Kovesi. MATLAB and Octave functions for computer vision and image processing. School of Computer Science & Software Engineering, The University of Western Australia. Available from: http://www.csse.uwa.edu.au/~pk/research/matlabfns/.
[19] A. Lehmussola, P. Ruusuvuori, J. Selinummi, H. Huttunen, and O. Yli-Harja. Computational framework for simulating fluorescence microscope images with cell populations. IEEE Trans. Med. Imaging, 26(7):1010–1016, 2007.
[20] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1-3):259–289, 2008.
[21] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[22] A. N. Marana, S. A. Velastin, L. F. Costa, and R. A. Lotufo. Estimation of crowd density using image processing. Image Processing for Security Applications, pp. 1–8, 1997.
[23] F. J. Massey, Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68–78, 1951.
[24] F. Moosmann, B. Triggs, and F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. NIPS, pp. 985–992, 2006.
[25] S. K. Nath, K. Palaniappan, and F. Bunyak. Cell segmentation using coupled level sets and graph-vertex coloring. MICCAI (1), pp. 101–108, 2006.
[26] T. W. Nattkemper, H. Wersing, W. Schubert, and H. Ritter. A neural network architecture for automatic segmentation of fluorescence micrographs. Neurocomputing, 48(1-4):357–367, 2002.
[27] V. Rabaud and S. Belongie. Counting crowded moving objects. CVPR (1), pp. 705–711, 2006.
[28] D. Ryan, S. Denman, C. Fookes, and S. Sridharan. Crowd counting using multiple local features. DICTA ’09: Proceedings of the 2009 Digital Image Computing: Techniques and Applications, pp. 81–88, 2009.
[29] J. Selinummi, J. Seppala, O. Yli-Harja, and J. A. Puhakka. Software for quantification of labeled bacteria from digital microscope images by automated image analysis. Biotechniques, 39(6):859–63, 2005.
[30] T. Sharp. Implementing decision trees and forests on a GPU. ECCV (4), pp. 595–608, 2008.
[31] H. Tamaki and T. Tokuyama. Algorithms for the maximum subarray problem based on matrix multiplication. SODA, pp. 446–452, 1998.
[32] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
[33] B. Wu, R. Nevatia, and Y. Li. Segmentation of multiple, partially occluded objects by grouping, merging, assigning part detection responses. CVPR, 2008.
[34] T. Zhao and R. Nevatia. Bayesian human segmentation in crowded situations. CVPR (2), pp. 459–466, 2003.