
70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection


Source: pdf

Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters and a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach on PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms the Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on the VOC’10 test set by 4%.
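The scoring idea in the abstract (each detection hypothesis picks one candidate segment or "void", and the box score combines an appearance term with segmentation features) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional feature in `seg_features`, the weight vector `w_seg`, and the precomputed `appearance_score` (standing in for the HOG filter response) are hypothetical choices made only for illustration.

```python
import numpy as np

def box_mask(shape, box):
    """Boolean mask for box = (x0, y0, x1, y1), with exclusive upper bounds."""
    mask = np.zeros(shape, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def seg_features(box, segment):
    """Hypothetical features: [fraction of the box covered by the segment,
    fraction of the segment that falls outside the box]."""
    inside = box_mask(segment.shape, box)
    box_area = inside.sum()
    seg_area = segment.sum()
    if box_area == 0 or seg_area == 0:
        return np.zeros(2)
    overlap = np.logical_and(inside, segment).sum()
    return np.array([overlap / box_area, (seg_area - overlap) / seg_area])

def score_box(appearance_score, box, segments, w_seg):
    """Return (total score, index of the chosen segment, or None for void).

    Selecting void contributes zero, so a segment is chosen only when its
    weighted features are positive (an assumption of this sketch).
    """
    best, choice = 0.0, None
    for i, seg in enumerate(segments):
        s = float(w_seg @ seg_features(box, seg))
        if s > best:
            best, choice = s, i
    return appearance_score + best, choice
```

Because the segment term is a maximum over a fixed pool of precomputed segments, it adds only a cheap per-box lookup on top of the appearance score, which is in the spirit of the abstract's claim that the features are efficient to compute once the segments are given.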


reference text

[1] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Finding animals: Semantic segmentation using regions and parts. In CVPR, 2012. 1

[2] H. Azizpour and I. Laptev. Object detection using strongly-supervised deformable part models. In ECCV, 2012. 1, 2

[3] L. Bertelli, T. Yu, D. Vu, and B. Gokturk. Kernelized structural SVM learning for supervised object segmentation. In CVPR, 2011. 2

[4] L. Bourdev, S. Maji, T. Brox, and J. Malik. Detecting people using mutually consistent poselet activations. In ECCV, 2010. 1, 2

[5] T. Brox, L. Bourdev, S. Maji, and J. Malik. Object segmentation by alignment of poselet activations to image contours. In CVPR, 2011. 1

[6] G. Cardinal, X. Boix, J. van de Weijer, A. D. Bagdanov, J. Serrat, and J. Gonzalez. Harmony potentials for joint classification and segmentation. In CVPR, 2010. 1

[7] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV, 2012. 1, 5, 6, 7

[8] J. Carreira, F. Li, and C. Sminchisescu. Object Recognition by Sequential Figure-Ground Ranking. IJCV, 2011. 1, 5, 6

[9] Q. Chen, Z. Song, Y. Hua, Z. Huang, and S. Yan. Hierarchical matching with side information for image classification. In CVPR, 2012. 5, 6

[10] Y. Chen, L. Zhu, and A. Yuille. Active mask hierarchies for object detection. In ECCV, 2010. 1, 2

[11] Q. Dai and D. Hoiem. Learning to localize detected objects. In CVPR, 2012. 2

[12] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages I: 886–893, 2005. 1, 5, 6

[13] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In ICCV, 2009. 2

[14] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 32(9), 2010. 1, 2, 3, 4, 5, 6

[15] R. Girshick, P. Felzenszwalb, and D. McAllester. Object detection with grammar models. In NIPS, 2009. 4

[16] C. Gu, P. Arbelaez, Y. Lin, K. Yu, and J. Malik. Multi-component models for object detection. In ECCV, 2012. 6

[17] C. Gu, J. Lim, P. Arbelaez, and J. Malik. Recognition using regions. In CVPR, 2009. 2

[18] G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded classification models: Combining models for holistic scene understanding. In NIPS, 2008. 1

[19] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011. 1, 2

[20] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr. Graph cut based inference with co-occurrence statistics. In ECCV, 2010. 1

[21] L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. H. Torr. What, where and how many? Combining object detectors and CRFs. In ECCV, 2010. 2

[22] V. Lempitsky, P. Kohli, C. Rother, and B. Sharp. Image segmentation with a bounding box prior. In ICCV, 2009. 1

[23] M. Maire, S. X. Yu, and P. Perona. Object detection and segmentation from joint embedding of parts and pixels. In ICCV, 2011. 2

[24] A. Monroy and B. Ommer. Beyond bounding-boxes: Learning object shape by model-driven grouping. In ECCV, 2012. 2

[25] R. Mottaghi. Augmenting deformable part models with irregular-shaped object patches. In CVPR, 2012. 2

[26] O. Parkhi, A. Vedaldi, C. V. Jawahar, and A. Zisserman. The truth about cats and dogs. In ICCV, 2011. 2

[27] M. Pedersoli, A. Vedaldi, and J. Gonzalez. A coarse-to-fine approach for fast deformable object detection. In CVPR, 2011. 2

[28] P. Srinivasan, Q. Zhu, and J. Shi. Many-to-one contour matching for describing and discriminating object shape. In CVPR, 2010. 2

[29] E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In ICCV, 2005. 1

Figure 3. Precision-recall curves on PASCAL VOC 2010 val. Note that our approach significantly outperforms all baselines.

Table 3. AP performance (in %) on VOC 2010 val for our detector when using more segments. Columns: plane, bike, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motor, person, plant, sheep, sofa, train, tv, Avg.

[30] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Segmentation as selective search for object recognition. In ICCV, 2011. 6

[31] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009. 6

Figure 4. Panels: (a) GT, (b) CPMC, (c) DPM, (d) segDPM. For each method, we show top k detections for each class, where k is the number of boxes for that class in GT. For example, for an image with a chair and a cat GT box, we show the top scoring box for chair and the top scoring box for cat.

[32] Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes. Layered object models for image segmentation. PAMI, 2011. 2

[33] J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012. 1, 2

[34] Y. Yu, J. Zhang, Y. Huang, S. Zheng, W. Ren, C. Wang, K. Huang, and T. Tan. Object detection by context and boosted HOG-LBP. In ECCV Workshop on PASCAL, 2010. 6

[35] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010. 1, 2, 6