
43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

Source: pdf

Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, and contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field (CRF) model on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted an in-depth analysis of the human-generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MSRC dataset.

reference text

[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. In PAMI, 2011. 4

[2] T. Bachmann. Identification of spatially quantized tachistoscopic images of faces: How many pixels does it take to carry identity? Europ. J. of Cogn. Psych., 1991. 2

[3] H. Barrow and J. Tenenbaum. Recovering intrinsic scene characteristics from images. In Comp. Vision Systems, 1978. 2

[4] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmentation and recognition using structure from motion point clouds. In ECCV, 2008. 3

[5] T. Brox, L. Bourdev, S. Maji, and J. Malik. Object segmentation by alignment of poselet activations to image contours. In CVPR, 2011. 2

[6] G. Cardinal, X. Boix, J. van de Weijer, A. D. Bagdanov, J. Serrat, and J. Gonzalez. Harmony potentials for joint classification and segmentation. In CVPR, 2010. 2, 3

[7] M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In CVPR, 2010. 3

[8] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968. 4

[9] L. Fei-Fei, R. VanRullen, C. Koch, and P. Perona. Rapid natural scene categorization in the near absence of attention. PNAS, 2002. 2

[10] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 32(9), 2010. 1, 3, 4

[11] C. C. Fowlkes. Measuring the ecological validity of grouping and figure-ground cues. Thesis, 2005. 2

[12] S. Gould, T. Gao, and D. Koller. Region-based segmentation and object detection. In NIPS, 2009. 1, 2

[13] C. Gu, J. J. Lim, P. Arbelaez, and J. Malik. Recognition using regions. In CVPR, 2009. 2

[14] B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In ICCV, 2011. 5

[15] T. Hazan and R. Urtasun. A primal-dual message-passing algorithm for approximated large scale structured prediction. In NIPS, 2010. 3

[16] G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded classification models: Combining models for holistic scene understanding. In NIPS, 2008. 1, 2

[17] D. Hoiem, A. A. Efros, and M. Hebert. Closing the loop on scene interpretation. In CVPR, 2008. 2

[18] P. Kohli, M. P. Kumar, and P. H. S. Torr. P3 and beyond: Solving energies with higher order cliques. In CVPR, 2007. 4

[19] S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. In ICCV, 2005. 2

[20] L. Ladicky, C. Russell, P. H. S. Torr, and P. Kohli. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009. 3, 4

[21] L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. H. S. Torr. What, where and how many? Combining object detectors and CRFs. In ECCV, 2010. 2, 3, 4

[22] V. Lempitsky, P. Kohli, C. Rother, and B. Sharp. Image segmentation with a bounding box prior. In ICCV, 2009. 2

[23] C. Li, A. Kowdle, A. Saxena, and T. Chen. Towards holistic scene understanding: Feedback enabled cascaded classification models. In NIPS, 2010. 1, 2

[24] C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing: label transfer via dense scene alignment. In CVPR, 2009. 3

[25] T. Malisiewicz and A. A. Efros. Improving spatial support for objects via multiple segmentations. In BMVC, 2007. 4

[26] A. Oliva and P. G. Schyns. Diagnostic colors mediate scene recognition. Cognitive Psychology, 2000. 2

[27] D. Parikh and C. Zitnick. Finding the weakest link in person detectors. In CVPR, 2011. 2

[28] S. Park, T. Brady, M. Greene, and A. Oliva. Disentangling scene content from its spatial boundary: Complementary roles for the PPA and LOC in representing real-world scenes. Journal of Neuroscience, 2011. 1

[29] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV, 2007. 1, 2

[30] J. Rivest and P. Cavanagh. Localizing contours defined by more than one attribute. Vision Research, 1996. 2

[31] A. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed message passing for large scale graphical models. In CVPR, 2011. 3

[32] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, 2008. 4

[33] J. Shotton, J. Winn, C. Rother, and A. Criminisi. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling appearance, shape and context. IJCV, 81(1), 2007. 3, 4, 7

[34] E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In ICCV, 2005. 2

[35] A. Torralba. How many pixels make an image? Visual Neuroscience, 2009. 1, 2

[36] A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In NIPS, pages 1401–1408, 2005. 2

[37] C. Wojek and B. Schiele. A dynamic conditional random field model for joint labeling of object and scene classes. In ECCV, volume 4, pages 733–747, 2008. 2

[38] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010. 1, 3, 4, 8

[39] Y. Yang and D. Ramanan. Articulated pose estimation using flexible mixtures of parts. In CVPR, 2011. 1

[40] J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012. 1, 2, 3, 4, 5, 8