NIPS 2010 (nips2010-240), reference list
Source: pdf
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to make more effective use of negatively labeled images to improve detection performance as measured by average precision. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that even when a significant proportion of the training images are only weakly supervised, the performance achieved is very similar to the fully supervised (state-of-the-art) results.
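The two ingredients named in the abstract, a ranking objective over images and latent object locations for weakly annotated images, can be illustrated with a small NumPy sketch. This is not the authors' formulation. It assumes each image is represented by a hypothetical array of candidate-window feature vectors, scores an image by its best window (treating the location as a latent variable, in the spirit of [40]), and uses a generic RankSVM-style pairwise hinge loss (cf. [21]) as a stand-in for the paper's objective. Average precision is computed in its common non-interpolated form, which may differ from the interpolated AP used in the PASCAL VOC evaluation protocol.

```python
import numpy as np


def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive.

    `labels` holds 1 for positive images and 0 for negative images.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    ranked = labels[order]
    hits = np.cumsum(ranked)                     # positives seen so far
    ranks = np.arange(1, len(ranked) + 1)
    is_pos = ranked == 1
    return float(np.mean(hits[is_pos] / ranks[is_pos]))


def image_score(w, windows):
    """Score an image by its best candidate window.

    `windows` is a hypothetical (n_windows, d) array of window features;
    maximizing over it treats the unknown object location as latent.
    """
    return float(np.max(windows @ w))


def best_window(w, windows):
    """Feature vector of the highest-scoring window (the latent completion)."""
    return windows[int(np.argmax(windows @ w))]


def pairwise_ranking_hinge(w, pos_images, neg_images, margin=1.0):
    """Mean hinge loss over all (positive image, negative image) pairs.

    Every negative image contributes a constraint against every positive,
    which is how a ranking loss lets negatives shape the detector.
    """
    s_pos = np.array([image_score(w, x) for x in pos_images])
    s_neg = np.array([image_score(w, x) for x in neg_images])
    violations = margin - (s_pos[:, None] - s_neg[None, :])
    return float(np.maximum(0.0, violations).mean())


def ranking_step(w, pos_images, neg_images, lam=0.01, lr=0.1, margin=1.0):
    """One gradient step on lam/2 ||w||^2 + pairwise hinge loss.

    Each image's best window is fixed at the current w (a linearization of
    the latent max), then the resulting convex objective is stepped once.
    """
    phi_pos = [best_window(w, x) for x in pos_images]
    phi_neg = [best_window(w, x) for x in neg_images]
    n_pairs = len(phi_pos) * len(phi_neg)
    grad = lam * w
    for fp in phi_pos:
        for fn in phi_neg:
            if margin - (fp - fn) @ w > 0:       # violated pair
                grad = grad - (fp - fn) / n_pairs
    return w - lr * grad
```

A pairwise hinge loss is only a surrogate for AP; Yue et al. [41] instead optimize a structured loss tied directly to average precision, which is closer in spirit to the ranking objective described in the abstract.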
[1] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2010.
[2] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems, pages 561–568. MIT Press, 2003.
[3] G. H. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan. Predicting Structured Data. MIT Press, 2007.
[4] A. Bar Hillel, T. Hertz, and D. Weinshall. Efficient learning of relational object class models. In Proceedings of the International Conference on Computer Vision, pages 1762–1769, 2005.
[5] T. Berg, A. Berg, J. Edwards, M. Mair, R. White, Y. Teh, E. Learned-Miller, and D. Forsyth. Names and Faces in the News. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, 2004.
[6] C. Bergeron, J. Zaretzki, C. Breneman, and K. P. Bennett. Multiple instance ranking. In Proceedings of the International Conference on Machine Learning, pages 48–55, 2008.
[7] M. B. Blaschko and C. H. Lampert. Correlational spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[8] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In Proceedings of the European Conference on Computer Vision, 2008.
[9] M. B. Blaschko and C. H. Lampert. Object localization with global and local context kernels. In Proceedings of the British Machine Vision Conference, 2009.
[10] P. Carbonetto, G. Dorkó, C. Schmid, H. Kück, and N. Freitas. Learning to recognize objects with little supervision. International Journal of Computer Vision, 77(1–3):219–237, 2008.
[11] O. Chapelle and S. S. Keerthi. Efficient algorithms for ranking with SVMs. Information Retrieval, 2009.
[12] O. Chum and A. Zisserman. An exemplar model for learning object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[13] T. Cour, B. Sapp, C. Jordan, and B. Taskar. Learning from ambiguously labeled images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[14] N. Dalal and B. Triggs. Histogram of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 886–893, 2005.
[15] T. Deselaers, B. Alexe, and V. Ferrari. Localizing objects while learning their appearance. In Proceedings of the European Conference on Computer Vision, 2010.
[16] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascalnetwork.org/challenges/VOC/voc2007/workshop/index.html, 2007.
[17] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, June 2010.
[18] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1778–1785, 2009.
[19] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[20] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from Google’s image search. In Proceedings of the International Conference on Computer Vision, 2005.
[21] T. Joachims. Optimizing search engines using clickthrough data. In KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133–142, New York, NY, USA, 2002. ACM.
[22] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59, 2009.
[23] G. Kim and A. Torralba. Unsupervised detection of regions of interest using iterative link analysis. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, pages 961–969. 2009.
[24] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[25] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
[26] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 951–958, 2009.
[27] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision, ECCV, May 2004.
[28] F. Moosmann, D. Larlus, and F. Jurie. Learning saliency maps for object categorization. In ECCV International Workshop on The Representation and Use of Prior Knowledge in Vision, 2006.
[29] M. H. Nguyen, L. Torresani, F. De la Torre Frade, and C. Rother. Weakly supervised discriminative localization and classification: A joint learning process. In Proceedings of the International Conference on Computer Vision, 2009.
[30] A. Opelt, A. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, volume 2, pages 71–84, 2004.
[31] A. Opelt and A. Pinz. Object localization with boosting and weak supervision for generic object recognition. In Scandinavian Conference on Image Analysis, pages 862–871, 2005.
[32] P. Ott and M. Everingham. Implicit color segmentation features for pedestrian and object detection. In Proceedings of the International Conference on Computer Vision, 2009.
[33] C. Pantofaru and M. Hebert. A framework for learning to recognize and segment object classes using weakly supervised training data. In Proceedings of the British Machine Vision Conference, 2007.
[34] N. Rasiwasia and N. Vasconcelos. Scene classification with low-dimensional semantic spaces and weak supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[35] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems. 2004.
[36] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the International Conference on Machine Learning, 2004.
[37] T. Tuytelaars, C. H. Lampert, M. B. Blaschko, and W. Buntine. Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2):61–85, 2010.
[38] A. Vedaldi and A. Zisserman. Structured output regression for detection with partial truncation. In Advances in Neural Information Processing Systems, 2009.
[39] S. Vijayanarasimhan and K. Grauman. Multi-level active prediction of useful image annotations for recognition. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, pages 1705–1712. 2009.
[40] C.-N. J. Yu and T. Joachims. Learning structural SVMs with latent variables. In Proceedings of the International Conference on Machine Learning, 2009.
[41] Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In Special Interest Group on Information Retrieval, 2007.