201 nips-2009-Region-based Segmentation and Object Detection

Source: pdf

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task, making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

reference text

[1] H.G. Barrow and J.M. Tenenbaum. Computational vision. IEEE, 1981.

[2] S. Bileschi and L. Wolf. A unified system for object detection, texture recognition, and context analysis based on the standard model feature set. In BMVC, 2005.

[3] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. PAMI, 2002.

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[5] V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid. Groups of adjacent contour segments for object detection. PAMI, 2008.

[6] M. Fink and P. Perona. Mutual boosting for contextual inference. In NIPS, 2003.

[7] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, 2009.

[8] C. Gu, J. J. Lim, P. Arbelaez, and J. Malik. Recognition using regions. In CVPR, 2009.

[9] G. Heitz and D. Koller. Learning spatial context: Using stuff to find things. In ECCV, 2008.

[10] G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded classification models: Combining models for holistic scene understanding. In NIPS, 2008.

[11] D. Hoiem, A. A. Efros, and M. Hebert. Closing the loop on scene interpretation. In CVPR, 2008.

[12] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008.

[13] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In ECCV, 2004.

[14] C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing: Label transfer via dense scene alignment. In CVPR, 2009.

[15] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV, 2007.

[16] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV, 2006.

[17] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Describing visual scenes using transformed objects and parts. IJCV, 2007.

[18] A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin. Context-based vision system for place and object recognition, 2003.

[19] A. Torralba, K. Murphy, and W. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In CVPR, 2004.

[20] A. Torralba, K. Murphy, and W. Freeman. Contextual models for object detection using boosted random fields. In NIPS, 2004.

[21] Z. Tu, X. Chen, A. L. Yuille, and S.-C. Zhu. Image parsing: Unifying segmentation, detection, and recognition. In ICCV, 2003.

[22] P. Viola and M. J. Jones. Robust real-time face detection. IJCV, 2004.

[23] C. Wojek and B. Schiele. A dynamic conditional random field model for joint labeling of object and scene classes. In ECCV, 2008.