242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

Source: pdf

Author: Yan Wang, Rongrong Ji, Shih-Fu Chang

Abstract: Recent years have witnessed growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem because it is difficult to acquire enough 3D point labels to train effective classifiers. In this paper, we overcome this challenge by utilizing the massive existing 2D semantically labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, together with a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components: Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes and scales well to large-scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

reference text

[1] N. Snavely, S. M. Seitz, and R. Szeliski. Photo Tourism: Exploring Image Collections in 3D. SigGraph, 2006.

[2] S. Izadi, D. Kim, O. Hilliges, et al. KinectFusion: Realtime 3D Reconstruction and Interaction Using a Moving Depth Camera. UIST, 2011.

[3] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem. AAAI, 2002.

[4] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski. Towards Internet-scale Multi-view Stereo. CVPR, 2010.

[5] H. Koppula, A. Anand, T. Joachims, and A. Saxena. Semantic Labeling of 3D Point Clouds for Indoor Scenes. NIPS, 2011.

[6] A. Anand, H. S. Koppula, T. Joachims, and A. Saxena. Contextually Guided Semantic Labeling and Search for 3D Point Clouds. IJRR, 2012.

[7] X. Xiong, D. Munoz, J. Bagnell, and M. Hebert. 3-D Scene Analysis via Sequenced Predictions over Points and Regions. ICRA, 2011.

[8] L. Nan, K. Xie, and A. Sharf. A Search-Classify Approach for Cluttered Indoor Scene Understanding. SigGraph Asia, 2012.

[9] E. Kalogerakis, A. Hertzmann, and K. Singh. Learning 3D Mesh Segmentation and Labeling. ToG, 2010.

[10] K. Lai and D. Fox. Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation. IJRR, 2010.

[11] B. Russell, A. Torralba, K. P. Murphy, and W. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. IJCV, 2008.

[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. CVPR, 2009.

[13] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN Database: Large-scale Scene Recognition from Abbey to Zoo. CVPR, 2010.

[14] D. Kuettel, M. Guillaumin, and V. Ferrari. Segmentation Propagation in ImageNet. ECCV, 2012.

[15] NYU Indoor Depth Dataset. http://cs.nyu.edu/~silberman/datasets/.

[16] Cornell Point Cloud Dataset. http://pr.cs.cornell.edu/sceneunderstanding/data/data.php.

[17] A. Shrivastava, T. Malisiewicz, A. Gupta, and A. A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. SigGraph, 2011.

[18] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. ICCV, 2011.

[19] S. Gould, R. Fulton, and D. Koller. Decomposing a Scene into Geometric and Semantically Consistent Regions. IJCV, 2009.

[20] D. Comaniciu and P. Meer. Mean Shift: A Robust Approach Toward Feature Space Analysis. TPAMI, 2002.

[21] T. Deselaers, B. Alexe, and V. Ferrari. Weakly Supervised Localization and Learning with Generic Knowledge. IJCV, 2012.

[22] B. Alexe, T. Deselaers, and V. Ferrari. What Is an Object? CVPR, 2010.

[23] T. Joachims. Optimizing Search Engines Using Clickthrough Data. KDD, 2002.

[24] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large Margin Methods for Structured and Interdependent Output Variables. JMLR, 2005.

[25] D. Munoz, J. Bagnell, and M. Hebert. Stacked Hierarchical Labeling. ECCV, 2010.

[26] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005.

[27] G. Ben-Artzi, H. Hel-Or, and Y. Hel-Or. The Gray-Code Filter Kernels. TPAMI, 2007.

[28] K. Murphy, A. Torralba, and W. Freeman. Using the Forest to See the Trees: A Graphical Model Relating Features, Objects and Scenes. NIPS, 2003.

[29] R. Fulton and D. Koller. Decomposing a Scene into Geometric and Semantically Consistent Regions. CVPR, 2009.

[30] S. J. Pan and Q. Yang. A Survey on Transfer Learning. TKDE, 2010.

[31] C. Liu, J. Yuen, and A. Torralba. Nonparametric Scene Parsing via Label Transfer. TPAMI, 2011.

[32] A. Patterson, P. Mordohai, and K. Daniilidis. Object Detection from Large-scale 3-D Datasets Using Bottom-up and Top-down Descriptors. ECCV, 2008.