cvpr cvpr2013 cvpr2013-71 cvpr2013-71-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem
Abstract: Early work in computer vision considered a host of geometric cues for both shape reconstruction [11] and recognition [14]. However, since then, the vision community has focused heavily on shading cues for reconstruction [1], and moved towards data-driven approaches for recognition [6]. In this paper, we reconsider these perhaps overlooked “boundary” cues (such as self occlusions and folds in a surface), as well as many other established constraints for shape reconstruction. In a variety of user studies and quantitative tasks, we evaluate how well these cues inform shape reconstruction (relative to each other) in terms of both shape quality and shape recognition. Our findings suggest many new directions for future research in shape reconstruction, such as automatic boundary cue detection and relaxing assumptions in shape from shading (e.g. orthographic projection, Lambertian surfaces).
[1] J. T. Barron and J. Malik. Color constancy, intrinsic images, and shape estimation. In ECCV, 2012.
[2] J. T. Barron and J. Malik. Shape, albedo, and illumination from a single image of an unknown object. In CVPR, 2012.
[3] L. Bo, X. Ren, and D. Fox. Kernel descriptors for visual recognition. NIPS, 2010.
[4] L. Bo, X. Ren, and D. Fox. Depth kernel descriptors for object recognition. In Intelligent Robots and Systems (IROS), pages 821–826. IEEE, 2011.
[5] A. Bosch, A. Zisserman, and X. Muñoz. Image classification using random forests and ferns. In ICCV. IEEE, 2007.
[6] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE TPAMI, 32(9): 1627–1645, 2010.
[7] V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid. Groups of Adjacent Contour Segments for Object Detection. IEEE TPAMI, 30(1):36–51, Jan. 2008.
[8] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman. Ground-truth dataset and baseline evaluations for intrinsic image algorithms. ICCV, 2009.
[9] D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing error in object detectors. In ECCV, 2012.
[10] J. Koenderink. What does the occluding contour tell us about solid shape. Perception, 1984.
[11] J. Malik. Interpreting line drawings of curved objects. PhD thesis, Stanford University, Stanford, CA, USA, 1986.
[12] J. Malik and D. Maydan. Recovering three-dimensional shape from a single image of curved objects. IEEE TPAMI, 11(6):555–566, June 1989.
[13] J. L. Mundy. Object recognition in the geometric era: A retrospective. In Toward Category Level Object Recognition, pages 3–29. Springer, 2006.
[14] L. G. Roberts. Machine Perception of Three-Dimensional Solids. Outstanding Dissertations in the Computer Sciences. Garland Publishing, New York, 1963.
[15] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In ICML, pages 807–814. ACM, 2007.
[16] N. Silberman and R. Fergus. Indoor scene segmentation using a structured light sensor. In ICCV - Workshop on 3D Representation and Recognition, 2011.
[17] A. Vedaldi and B. Fulkerson. Vlfeat – an open and portable library of computer vision algorithms. In Proceedings of the 18th annual ACM International Conference on Multimedia, 2010.
[18] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.
[19] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. IEEE TPAMI, 34(3):480–492, 2012.
Appendix: Fold constraint implementation
Consider the $i$th point on the contour $C$, parametrized by position $p = [p_x, p_y]$ and tangent vector $u = [u_x, u_y]$, both on the image plane. The sign of the tangent vector is arbitrary. Let us define a vector perpendicular to each tangent vector: $v = [-u_y, u_x]$. By default, this fold is convex in the direction of negative $Z$. To construct a concave fold, we flip the sign of $v$.
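The perpendicular-vector construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `perpendicular`, the `concave` flag, and the normalization of the tangent to unit length are all assumptions made here for the example.

```python
import math


def perpendicular(u, concave=False):
    """Return v = [-u_y, u_x] for an image-plane tangent u.

    Assumption (not stated in the text): the tangent is normalized to unit
    length first, so that v steps roughly one pixel away from the contour.
    """
    ux, uy = u
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    vx, vy = -uy, ux
    if concave:
        # A concave fold is obtained by flipping the sign of v.
        vx, vy = -vx, -vy
    return (vx, vy)
```

Because the sign of the tangent is arbitrary, the convex or concave label only has meaning relative to the chosen orientation of `u`.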
With this parametrization, we can find the positions of the points to the left and right of the point in question relative to the contour:

$$p^{\ell} = [\mathrm{round}(p_x + v_x),\ \mathrm{round}(p_y + v_y)] \tag{9}$$
$$p^{r} = [\mathrm{round}(p_x - v_x),\ \mathrm{round}(p_y - v_y)] \tag{10}$$

Given a normal field $N$, we compute the normal of the surface at these “left” and “right” points:

$$N^{\ell} = [N_x(p^{\ell}_x, p^{\ell}_y),\ N_y(p^{\ell}_x, p^{\ell}_y),\ N_z(p^{\ell}_x, p^{\ell}_y)] \tag{11}$$
$$N^{r} = [N_x(p^{r}_x, p^{r}_y),\ N_y(p^{r}_x, p^{r}_y),\ N_z(p^{r}_x, p^{r}_y)] \tag{12}$$

Consider $c$, the dot product of $[u_x, u_y, 0]$ with the cross product of $N^{\ell}$ and $N^{r}$:

$$c = u_x (N^{\ell}_y N^{r}_z - N^{\ell}_z N^{r}_y) + u_y (N^{\ell}_z N^{r}_x - N^{\ell}_x N^{r}_z) \tag{13}$$

If $c = 1$, then the cross product of the surface normals on both sides of the contour is exactly equal to the tangent vector, and the surface is therefore convexly folded in the direction of the contour. If $c = -1$, then the surface is folded and concave. Of course, if the sign of the contour, and therefore of the $v$ vector, is flipped, then $c = 1$ when the surface is concavely folded, etc.
Intuitively, to force the surface to satisfy the fold constraint imposed by the contour, we should force $c$ to be as close to 1 as possible. This is the insight used in the edge constraint of the shape-from-contour algorithm in [12]. But constraining $c = 1$ is not appropriate for our purposes, as it ignores the fact that $u$ and therefore $v$ lie in an image plane, while the true tangent vector of the contour may not be parallel to the image plane. To account for such contours, we will therefore penalize $c$ for being significantly smaller than 1. More concretely, we will minimize the following cost with respect to each contour pixel:

$$f_{\mathrm{fold}}(N(Z)) = \sum_{i \in C} \max\left(0,\ \epsilon - c^{(i)}\right) \tag{14}$$

where $\epsilon = 1/\sqrt{2}$. This is a sort of $\epsilon$-insensitive hinge loss which allows for fold contours to be oriented as much as 45° out of the image plane. In practice, the value of $\epsilon$ affects how sharp the contours produced by the fold constraint are: $\epsilon = 0$ is satisfied by a flat fronto-parallel plane, and $\epsilon = 1$ is only satisfied by a perfect fold whose crease is parallel with the image plane. In our experience, $\epsilon = 1/\sqrt{2}$ produces folds that are roughly 90°, and which look reasonable upon inspection.
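As a rough illustration of Eqs. (9) to (14), the fold cost could be evaluated along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the names `fold_response` and `fold_cost`, the `normals[y][x]` storage layout, the `(p, u)` pair representation of the contour, and the use of unit-length tangents are all hypothetical choices made for the example.

```python
import math

# Hinge threshold from the text: 1/sqrt(2), i.e. cos(45 degrees).
EPSILON = 1.0 / math.sqrt(2.0)


def fold_response(p, u, normals, concave=False):
    """Compute c of Eq. (13) at one contour point.

    p       -- integer pixel position (px, py)
    u       -- unit tangent (ux, uy) in the image plane (assumed unit length)
    normals -- normals[y][x] is a unit normal (Nx, Ny, Nz); assumed layout
    """
    px, py = p
    ux, uy = u
    vx, vy = -uy, ux  # perpendicular to the tangent
    if concave:
        vx, vy = -vx, -vy  # flipping v requests a concave fold
    # Eqs. (9)-(10): pixels to the "left" and "right" of the contour.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    lx, ly = round(px + vx), round(py + vy)
    rx, ry = round(px - vx), round(py - vy)
    # Eqs. (11)-(12): surface normals at those two pixels.
    nlx, nly, nlz = normals[ly][lx]
    nrx, nry, nrz = normals[ry][rx]
    # Eq. (13): [ux, uy, 0] dotted with the cross product of the two normals.
    return ux * (nly * nrz - nlz * nry) + uy * (nlz * nrx - nlx * nrz)


def fold_cost(contour, normals, epsilon=EPSILON, concave=False):
    """Eq. (14): hinge penalty summed over (position, tangent) pairs."""
    return sum(
        max(0.0, epsilon - fold_response(p, u, normals, concave))
        for p, u in contour
    )


# Toy 3x3 normal fields (illustrative data only).
s = 1.0 / math.sqrt(2.0)
flat = [[(0.0, 0.0, 1.0)] * 3 for _ in range(3)]
crease = [[(-s, 0.0, s), (0.0, 0.0, 1.0), (s, 0.0, s)] for _ in range(3)]
contour = [((1, 1), (0.0, 1.0))]  # one contour pixel, tangent along +y

flat_cost = fold_cost(contour, flat)
crease_cost = fold_cost(contour, crease)
```

In this toy setup the flat field gives c = 0 and so incurs the full penalty of 1/√2, while the crease, whose two side normals are 90° apart, gives c = 1 and incurs none. Passing `concave=True` swaps the left and right samples, which turns the crease response into c = -1.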