nips nips2011 nips2011-223 nips2011-223-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu
Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is approximated with increasing accuracy by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art on the Stanford dataset [2], as well as on VOC2010, where 41.7% accuracy on the test set is achieved.
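The notion of a tiling as a maximal clique can be illustrated with a small sketch. This is a toy example under stated assumptions, not the authors' implementation: segments are represented as sets of pixel indices, two segments are connected when their pixel sets are disjoint, and maximal cliques are enumerated with the standard Bron-Kerbosch recursion. The function names `compatibility_graph` and `maximal_cliques` and the toy bag of segments are hypothetical.

```python
from itertools import combinations


def compatibility_graph(segments):
    """Connect two segments when their pixel sets do not overlap (assumed criterion)."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical bag of five segments over a 1x4 image with pixels 0..3.
segments = [
    frozenset({0, 1}),
    frozenset({2, 3}),
    frozenset({1, 2}),
    frozenset({0}),
    frozenset({3}),
]
tilings = maximal_cliques(compatibility_graph(segments))
print(tilings)
```

Each returned list holds the indices of mutually non-overlapping segments to which no further segment from the bag can be added. In the model described above, labels would then be sampled for the segments of a chosen tiling; that step is omitted here.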
[1] A. Ion, J. Carreira, and C. Sminchisescu. Image segmentation by figure-ground composition into maximal cliques. In ICCV, November 2011.
[2] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, September 2009.
[3] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, June 2010.
[4] M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. In CVPR, 2010.
[5] S. Nowozin, P. V. Gehler, and C. H. Lampert. On parameter learning in CRF-based approaches to object class image segmentation. In ECCV, 2010.
[6] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009.
[7] D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1), 2007.
[8] K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. M. Blei, and M. Jordan. Matching words and pictures. JMLR, 3:1107–1135, March 2003.
[9] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. From contours to regions: An empirical evaluation. In CVPR, pages 2294–2301, June 2009.
[10] T. Malisiewicz and A. Efros. Improving spatial support for objects via multiple segmentations. In BMVC, 2007.
[11] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81:2–23, 2009.
[12] X. He, R. S. Zemel, and M. Carreira-Perpinan. Multiscale conditional random fields for image labeling. In CVPR, 2004.
[13] G. Csurka and F. Perronnin. An efficient approach to semantic segmentation. IJCV, pages 1–15, 2010.
[14] B. Fulkerson, A. Vedaldi, and S. Soatto. Class segmentation and object localization with superpixel neighborhoods. In ICCV, 2009.
[15] J. M. Gonfaus, X. Boix, J. van de Weijer, A. D. Bagdanov, J. Serrat, and J. Gonzalez. Harmony potentials for joint classification and segmentation. In CVPR, 2010.
[16] P. Kohli, L. Ladicky, and P. H. S. Torr. Robust higher order potentials for enforcing label consistency. In CVPR, 2008.
[17] L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. H. S. Torr. What, where & how many? Combining object detectors and CRFs. In ECCV, September 2010.
[18] C. Pantofaru, C. Schmid, and M. Hebert. Object recognition by integrating multiple image segmentations. In ECCV, 2008.
[19] J. J. Lim, P. Arbelaez, C. Gu, and J. Malik. Context by region ancestry. In ICCV, 2009.
[20] Z. Tu, X. Chen, A. L. Yuille, and S.-C. Zhu. Image parsing: unifying segmentation, detection, and recognition. In ICCV, 2003.
[21] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 28(10):1568–1583, 2006.
[22] S. Kumar, J. August, and M. Hebert. Exploiting inference for approximate parameter learning in discriminative fields: An empirical study. In EMMCVPR, 2005.
[23] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. http://www.pascal-network.org/challenges/VOC/.
[24] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 32(9):1627–1645, 2010.
[25] S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-class segmentation with relative location prior. IJCV, 80(3):300–316, 2008.
[26] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In NIPS, December 2007.
[27] F. Li, C. Ionescu, and C. Sminchisescu. Random Fourier approximations for skewed multiplicative histogram kernels. In DAGM, September 2010.
[28] F. Li, J. Carreira, and C. Sminchisescu. Object recognition by sequential figure-ground ranking. IJCV, 2012.
[29] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[30] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 32(9):1582–1596, 2010.