Pylon Model for Semantic Segmentation (NIPS 2011)

Authors: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation because, for binary random field representations of the image, it gives globally optimal results and has efficient polynomial-time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or superpixels. In this paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image is partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation, where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches.
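To make the flat baseline mentioned in the abstract concrete, the following is a minimal sketch of graph-cut inference for a binary random field with non-negative unary costs and Potts-style boundary costs over a flat set of elements. It is not the pylon construction itself, which operates on a segmentation tree. The function names `min_cut_labels` and `energy`, the toy costs, and the use of a plain Edmonds-Karp max-flow (rather than a solver specialized for vision problems) are all assumptions made for illustration.

```python
from collections import deque


def energy(labels, unary, pairwise):
    """Energy = sum of unary costs + weights of boundaries between differing labels."""
    e = sum(unary[i][x] for i, x in enumerate(labels))
    e += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    return e


def min_cut_labels(unary, pairwise):
    """Minimize a binary energy exactly with an s-t minimum cut.

    unary:    list of (cost_if_label_0, cost_if_label_1), all non-negative.
    pairwise: dict {(i, j): w}, w >= 0, paid when elements i and j differ.
    Returns (labels, minimal_energy).
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # make sure the reverse residual edge exists

    for i, (c0, c1) in enumerate(unary):
        add_edge(s, i, c1)  # cut when i ends on the sink side (label 1)
        add_edge(i, t, c0)  # cut when i stays on the source side (label 0)
    for (i, j), w in pairwise.items():
        add_edge(i, j, w)
        add_edge(j, i, w)

    def reachable_from_source():
        """Breadth-first search in the residual graph; returns a parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

    # Elements still reachable from the source take label 0, the rest label 1.
    labels = [0 if i in parent else 1 for i in range(n)]
    return labels, energy(labels, unary, pairwise)


# Toy chain of five elements; the second one weakly prefers label 1.
unary = [(0, 4), (3, 1), (0, 4), (4, 0), (4, 0)]
pairwise = {(0, 1): 2, (1, 2): 2, (2, 3): 2, (3, 4): 2}
labels, e = min_cut_labels(unary, pairwise)
print(labels, e)
```

In this toy example the boundary terms outweigh the weak unary preference of the second element, so the isolated label is smoothed away and the chain splits into two contiguous groups.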

References

[1] N. Ahuja. A transform for multiscale image segmentation by integrated edge and region detection. IEEE Trans. Pattern Anal. Mach. Intell., 18(12), 1996.

[2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):898–916, 2011.

[3] P. Awasthi, A. Gagrani, and B. Ravindran. Image modeling using tree structured conditional random fields. In IJCAI, pages 2060–2065, 2007.

[4] E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete Applied Mathematics, 123(1-3):155–225, 2002.

[5] C. A. Bouman and M. Shapiro. A multiscale random field model for Bayesian image segmentation. IEEE Transactions on Image Processing, 3(2):162–177, 1994.

[6] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell., 26(9):1124–1137, 2004.

[7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23(11):1222–1239, 2001.

[8] X. Chen, A. Jain, A. Gupta, and L. Davis. Piecing together the segmentation jigsaw using context. In CVPR, 2011.

[9] X. Feng, C. K. I. Williams, and S. N. Felderhof. Combining belief networks and neural networks for scene segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):467–483, 2002.

[10] B. Fulkerson, A. Vedaldi, and S. Soatto. Class segmentation and object localization with superpixel neighborhoods. In ICCV, pages 670–677, 2009.

[11] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, pages 1–8, 2009.

[12] D. M. Greig, B. T. Porteous, and A. H. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, 51(2), 1989.

[13] C. Gu, J. J. Lim, P. Arbelaez, and J. Malik. Recognition using regions. In CVPR, pages 1030–1037, 2009.

[14] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1), 2009.

[15] V. Kolmogorov, Y. Boykov, and C. Rother. Applications of parametric maxﬂow in computer vision. In ICCV, pages 1–8, 2007.

[16] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell., 26(2):147–159, 2004.

[17] A. Kulesza and F. Pereira. Structured learning with approximate inference. In NIPS, 2007.

[18] M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. In CVPR, 2010.

[19] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, pages 739–746, 2009.

[20] T. Malisiewicz and A. A. Efros. Improving spatial support for objects via multiple segmentations. In BMVC, September 2007.

[21] M. Marszalek and C. Schmid. Accurate object localization with shape masks. In CVPR, 2007.

[22] D. Munoz, J. A. Bagnell, and M. Hebert. Stacked hierarchical labeling. In ECCV (6), pages 57–70, 2010.

[23] A. Opelt, A. Pinz, M. Fussenegger, and P. Auer. Generic object recognition with boosting. IEEE Trans. Pattern Anal. Mach. Intell., 28(3):416–431, 2006.

[24] N. Plath, M. Toussaint, and S. Nakajima. Multi-class image segmentation using conditional random fields and global classification. In ICML, page 103, 2009.

[25] J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. In CRV, pages 175–182, 2007.

[26] P. Schnitzspan, M. Fritz, and B. Schiele. Hierarchical support vector random fields: Joint training to combine local and global features. In ECCV (2), pages 527–540, 2008.

[27] E. Sharon, A. Brandt, and R. Basri. Fast multiscale image segmentation. In CVPR, 2000.

[28] J. Shi and J. Malik. Normalized cuts and image segmentation. In CVPR, pages 731–737, 1997.

[29] J. Shotton, J. M. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV (1), pages 1–15, 2006.

[30] D. Singaraju and R. Vidal. Using global bag of features models in random fields for joint categorization and segmentation of objects. In CVPR, 2011.

[31] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. F. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. Pattern Anal. Mach. Intell., 30(6):1068–1080, 2008.

[32] M. Szummer, P. Kohli, and D. Hoiem. Learning CRFs using graph cuts. In ECCV, 2008.

[33] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[34] S. Todorovic and N. Ahuja. Learning subcategory relevances for category recognition. In CVPR, 2008.

[35] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.

[36] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.

[37] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. In CVPR, 2010.

[38] O. Veksler. Image segmentation by nested cuts. In CVPR, pages 1339–, 2000.

[39] J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, pages 3360–3367, 2010.