iccv iccv2013 iccv2013-433 iccv2013-433-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hongyi Zhang, Andreas Geiger, Raquel Urtasun
Abstract: In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes which is able to reason not only about the geometry and objects present in the scene, but also about the high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes and show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves the overall scene estimation as well as the vehicle-to-lane association when compared to state-of-the-art approaches [10].
Figure 6. Robustness against Parameter Variations: We depict the robustness of our method against varying three parameters: the logarithm weight γ of the heading probability, the variance of the Gaussian kernel ∆σ_h^2 in the transition probability of h, and the scaling constant ξ on the uncertainty of detections. The black dot marks the parameter setting used in all of our experiments, as reported in Table 3.
[1] W. Choi and S. Savarese. A unified framework for multitarget tracking and collective activity recognition. In ECCV, 2012. 2
[2] A. Ess, B. Leibe, K. Schindler, and L. Van Gool. Robust multi-person tracking from a mobile platform. PAMI, 31:1831–1846, 2009. 2
[3] A. Ess, T. Mueller, H. Grabner, and L. van Gool. Segmentation-based urban traffic scene understanding. In BMVC, 2009. 1, 2
[4] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 32:1627–1645, 2010. 4
[5] S. Fidler, S. Dickinson, and R. Urtasun. 3d object detection and viewpoint estimation with a deformable 3d cuboid model. In NIPS, December 2012. 1
[6] D. F. Fouhey, V. Delaitre, A. Gupta, A. A. Efros, I. Laptev, and J. Sivic. People watching: Human actions as a cue for single-view geometry. In ECCV, 2012. 2
[7] D. Gavrila and S. Munder. Multi-cue pedestrian detection and tracking from a moving vehicle. IJCV, 73:41–59, 2007. 2
[8] A. Geiger and B. Kitt. Objectflow: A descriptor for classifying traffic motion. In IEEE Intelligent Vehicles Symposium, San Diego, USA, June 2010. 2
[9] A. Geiger, M. Lauer, and R. Urtasun. A generative model for 3d urban scene understanding from movable platforms. In CVPR, 2011. 1, 2, 6
[10] A. Geiger, C. Wojek, and R. Urtasun. Joint 3d estimation of objects and scene layout. In NIPS, Granada, Spain, December 2011. 1, 2, 3, 6, 7
[11] R. Guo and D. Hoiem. Beyond the line of sight: Labeling the underlying surfaces. In ECCV, 2012. 2
[12] A. Gupta, A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV, 2010. 1, 2
[13] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009. 2
[14] V. Hedau, D. Hoiem, and D. A. Forsyth. Recovering free space of indoor scenes from a single image. In CVPR, 2012. 1, 2
[15] D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75:151–172, 2007. 1, 2
[16] C. Huang, B. Wu, and R. Nevatia. Robust object tracking by hierarchical association of detection responses. In ECCV, 2008. 2
[17] D. Kuettel, M. D. Breitenstein, L. V. Gool, and V. Ferrari. What’s going on?: Discovering spatio-temporal dependencies in dynamic scenes. In CVPR, 2010. 1, 2, 6
[18] L. Leal-Taixe, G. Pons-Moll, and B. Rosenhahn. Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker. 1st Workshop on Modeling, Simulation and Visual Analysis of Large Crowds, 2011. 2
[19] D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010. 1, 2
[20] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from rgbd images. In ECCV, 2012. 2
[21] S. Pellegrini, A. Ess, K. Schindler, and L. J. V. Gool. You’ll never walk alone: Modeling social behavior for multi-target tracking. In ICCV, 2009. 2
[22] B. Pepik, P. Gehler, M. Stark, and B. Schiele. 3d2pm - 3d deformable part models. In ECCV, Firenze, Italy, 2012. 1
[23] L. D. Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley, and K. Barnard. Bayesian geometric modeling of indoor scenes. In CVPR, 2012. 1, 2
[24] L. G. Roberts. Machine perception of three-dimensional solids. PhD thesis, Massachusetts Institute of Technology. Dept. of Electrical Engineering, 1963. 1
[25] A. Saxena, S. H. Chung, and A. Y. Ng. 3-D depth reconstruction from a single still image. IJCV, 76:53–69, 2008. 1, 2
[26] A. Schwing and R. Urtasun. Efficient Exact Inference for 3D Indoor Scene Understanding. In ECCV, 2012. 1, 2
[27] P. Sturgess, K. Alahari, L. Ladicky, and P. H. S. Torr. Combining appearance and structure from motion features for road scene understanding. In BMVC, 2009. 1
[28] H. Wang, S. Gould, and D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. In ECCV, 2010. 2
[29] C. Wojek, S. Walk, S. Roth, K. Schindler, and B. Schiele. Monocular visual scene understanding: Understanding multi-object traffic scenes. PAMI, 2012. 1, 2
[30] J. Xiao, B. C. Russell, and A. Torralba. Localizing 3d cuboids in single-view images. In Advances in Neural Information Processing Systems, December 2012. 1, 2
[31] K. Yamaguchi, A. C. Berg, L. E. Ortiz, and T. L. Berg. Who are you with and where are you going? In CVPR, 2011. 2