
Discriminatively Trained Sparse Code Gradients for Contour Detection (NIPS 2012)

Source: pdf

Authors: Xiaofeng Ren, Liefeng Bo

Abstract: Finding contours in natural images is a fundamental problem that serves as the basis of many tasks such as image segmentation and object recognition. At the core of contour detection technologies is a set of hand-designed gradient features, used by most approaches including the state-of-the-art Global Pb (gPb) operator. In this work, we show that contour detection accuracy can be significantly improved by computing Sparse Code Gradients (SCG), which measure contrast using patch representations automatically learned through sparse coding. We use K-SVD for dictionary learning and Orthogonal Matching Pursuit for computing sparse codes on oriented local neighborhoods, and apply multi-scale pooling and power transforms before classifying them with linear SVMs. By extracting rich representations from pixels and avoiding collapsing them prematurely, Sparse Code Gradients effectively learn how to measure local contrasts and find contours. We improve the F-measure on the BSDS500 benchmark to 0.74 (up from 0.71 for gPb contours). Moreover, our learning approach can easily adapt to novel sensor data such as Kinect-style RGB-D cameras: Sparse Code Gradients on depth maps and surface normals lead to promising contour detection using depth and depth+color, as verified on the NYU Depth Dataset.
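The pipeline summarized in the abstract (per-pixel sparse codes, oriented half-disc contrast, multi-scale pooling, a power transform, then a linear classifier) can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the dictionary D is taken as already learned (for example by K-SVD [1]) and is assumed to have unit-norm columns; patches are mean-subtracted before coding; each half-disc pools absolute code values by averaging; and a signed power transform is applied to the difference of the two pooled vectors. The function names dense_codes, half_disc_masks, sparse_code_gradient and scg_feature, and the defaults (alpha=0.5, radii=(3, 6, 9)), are hypothetical.

```python
import numpy as np


def omp(D, x, sparsity, tol=1e-12):
    """Orthogonal Matching Pursuit: greedily select up to `sparsity` atoms
    (columns of D, assumed unit-norm) and refit their coefficients by least
    squares on the selected support after every selection."""
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(sparsity):
        if np.linalg.norm(residual) < tol:
            break  # x is already represented exactly
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def dense_codes(image, D, patch_size, sparsity):
    """Sparse code of the mean-subtracted patch centred at every interior
    pixel (border pixels keep zero codes). D has shape (patch_size**2, K)."""
    h, w = image.shape
    half = patch_size // 2
    codes = np.zeros((h, w, D.shape[1]))
    for r in range(half, h - half):
        for c in range(half, w - half):
            p = image[r - half:r + half + 1, c - half:c + half + 1].ravel()
            codes[r, c] = omp(D, p - p.mean(), sparsity)
    return codes


def half_disc_masks(radius, theta, eps=1e-9):
    """Boolean masks for the two halves of a disc of the given radius, split by
    a line through the centre at angle theta; pixels on the line are dropped."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    inside = xs ** 2 + ys ** 2 <= radius ** 2
    side = -xs * np.sin(theta) + ys * np.cos(theta)  # signed offset from the line
    return inside & (side > eps), inside & (side < -eps)


def sparse_code_gradient(codes, row, col, radius, theta, alpha=0.5):
    """Contrast between the two half-discs at (row, col) in sparse-code space.
    codes: H x W x K array; the disc must lie fully inside the image
    (radius <= row < H - radius, and likewise for col)."""
    half_a, half_b = half_disc_masks(radius, theta)
    window = codes[row - radius:row + radius + 1, col - radius:col + radius + 1]
    pooled_a = np.abs(window[half_a]).mean(axis=0)  # average pooling, K values
    pooled_b = np.abs(window[half_b]).mean(axis=0)
    diff = pooled_a - pooled_b
    return np.sign(diff) * np.abs(diff) ** alpha  # signed power transform


def scg_feature(codes, row, col, theta, radii=(3, 6, 9), alpha=0.5):
    """Concatenate half-disc gradients over several radii for one orientation."""
    return np.concatenate(
        [sparse_code_gradient(codes, row, col, r, theta, alpha) for r in radii]
    )
```

In this sketch, the vectors returned by scg_feature for labeled contour and non-contour pixels would be used to train one linear SVM per orientation (for example with LIBLINEAR [7]), and the contour strength at a pixel would be the SVM score w·f + b. Keeping the full K-dimensional difference per scale, rather than collapsing it to a single number, is what lets the classifier learn which code dimensions signal a boundary.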

References

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.

[2] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Trans. PAMI, 33(5):898–916, 2011.

[3] L. Bo, X. Ren, and D. Fox. Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms. In Advances in Neural Information Processing Systems 24, 2011.

[4] L. Bo, X. Ren, and D. Fox. Unsupervised Feature Learning for RGB-D Based Object Recognition. In International Symposium on Experimental Robotics (ISER), 2012.

[5] P. Dollar, Z. Tu, and S. Belongie. Supervised learning of edges and object boundaries. In CVPR, volume 2, pages 1964–1971, 2006.

[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008). http://www.pascal-network.org/challenges/VOC/voc2008/.

[7] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008.

[8] V. Ferrari, T. Tuytelaars, and L. Van Gool. Object detection by contour segment networks. In ECCV, pages 14–28, 2006.

[9] C. Gu, J. Lim, P. Arbeláez, and J. Malik. Recognition using regions. In CVPR, pages 1030–1037, 2009.

[10] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In International Symposium on Experimental Robotics (ISER), 2010.

[11] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[12] I. Kokkinos. Highly accurate boundary detection and grouping. In CVPR, pages 2520–2527, 2010.

[13] K. Lai, L. Bo, X. Ren, and D. Fox. A large-scale hierarchical multi-view RGB-D object dataset. In ICRA, pages 1817–1824, 2011.

[14] H. Lee, R. Grosse, R. Ranganath, and A. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, pages 609–616, 2009.

[15] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative learned dictionaries for local image analysis. In CVPR, pages 1–8, 2008.

[16] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In ECCV, pages 43–56, 2008.

[17] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using brightness and texture. In Advances in Neural Information Processing Systems 15, 2002.

[18] Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. In The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pages 40–44, 1993.

[19] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, pages 143–156, 2010.

[20] M. Prasad, A. Zisserman, A. Fitzgibbon, M. Kumar, and P. Torr. Learning class-specific edges for object detection and segmentation. Computer Vision, Graphics and Image Processing, pages 94–105, 2006.

[21] X. Ren. Multi-scale improves boundary detection in natural images. In ECCV, pages 533–545, 2008.

[22] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: features and algorithms. In CVPR, pages 2759–2766, 2012.

[23] X. Ren, C. Fowlkes, and J. Malik. Cue integration in figure/ground labeling. In Advances in Neural Information Processing Systems 18, 2005.

[24] R. Rubinstein, M. Zibulevsky, and M. Elad. Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit. Technical report, CS Technion, 2008.

[25] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Realtime human pose recognition in parts from single depth images. In CVPR, volume 2, page 3, 2011.

[26] J. Shotton, J. Winn, C. Rother, and A. Criminisi. Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV, 2006.

[27] N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In IEEE Workshop on 3D Representation and Recognition (3dRR), 2011.

[28] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. PAMI, 31(2):210–227, 2009.

[29] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, pages 1794–1801, 2009.

[30] K. Yu, Y. Lin, and J. Lafferty. Learning image representations from the pixel level via hierarchical sparse coding. In CVPR, pages 1713–1720, 2011.

[31] Q. Zhu, G. Song, and J. Shi. Untangling cycles for contour grouping. In ICCV, 2007.