NIPS 2002, Paper 182 (reference entry): knowledge graph by maker-knowledge-mining
Source: pdf
Author: William T. Freeman, Antonio Torralba
Abstract: The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high-dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from image data. We describe the benefits of this representation and show two uses illustrating its properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) shape recipes implicitly contain information about lighting and materials, which we use for material segmentation.
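The abstract describes shape recipes as regression coefficients that map bandpassed image data to bandpassed shape, learned at low resolution and then applied at full resolution. Below is a minimal NumPy sketch of that idea, assuming a crude Laplacian-style bandpass in place of the steerable pyramid [13] and plain least squares for the regression; all function and variable names here (blur_downsample, fit_recipe, apply_recipe, etc.) are illustrative, not taken from the paper.

```python
import numpy as np

def blur_downsample(x):
    """Crude pyramid step: 2x2 average pooling (stand-in for blur + subsample)."""
    h, w = x.shape
    h2, w2 = (h // 2) * 2, (w // 2) * 2
    x = x[:h2, :w2]
    return x.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def upsample(x, shape):
    """Nearest-neighbour upsampling back to `shape` (good enough for a sketch)."""
    y = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    y = y[:shape[0], :shape[1]]
    pad_h, pad_w = shape[0] - y.shape[0], shape[1] - y.shape[1]
    return np.pad(y, ((0, pad_h), (0, pad_w)), mode='edge')

def bandpass(x):
    """One bandpass level: the signal minus its blurred, upsampled version."""
    return x - upsample(blur_downsample(x), x.shape)

def fit_recipe(image_band, shape_band, patch=5):
    """Fit linear coefficients mapping a local window of the bandpassed image
    to the bandpassed shape at the window centre: the 'shape recipe' for this band."""
    r = patch // 2
    X, y = [], []
    for i in range(r, image_band.shape[0] - r):
        for j in range(r, image_band.shape[1] - r):
            X.append(image_band[i - r:i + r + 1, j - r:j + r + 1].ravel())
            y.append(shape_band[i, j])
    coeffs, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return coeffs

def apply_recipe(image_band, coeffs, patch=5):
    """Slide the recipe over a (possibly finer-scale) image band to predict shape."""
    r = patch // 2
    out = np.zeros_like(image_band)
    for i in range(r, image_band.shape[0] - r):
        for j in range(r, image_band.shape[1] - r):
            out[i, j] = image_band[i - r:i + r + 1, j - r:j + r + 1].ravel() @ coeffs
    return out

# Usage sketch: learn the recipe where shape is reliable (low resolution),
# then apply it to the full-resolution image band to refine a coarse
# (e.g., stereo-derived) shape estimate. Random arrays stand in for real data.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
shape = rng.standard_normal((64, 64))           # stand-in for a coarse depth map
lo_img, lo_shape = blur_downsample(image), blur_downsample(shape)
recipe = fit_recipe(bandpass(lo_img), bandpass(lo_shape))
refined_band = apply_recipe(bandpass(image), recipe)  # predicted fine-scale shape band
```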
[1] E. H. Adelson. Lightness perception and lightness illusions. In M. Gazzaniga, editor, The New Cognitive Neurosciences, pages 339–351. MIT Press, 2000.
[2] C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.
[3] A. Gilchrist et al. An anchoring theory of lightness. Psychological Review, 106(4):795–834, 1999.
[4] W. T. Freeman. The generic viewpoint assumption in a framework for visual perception. Nature, 368(6471):542–545, April 7 1994.
[5] B. K. P. Horn and M. J. Brooks, editors. Shape from shading. The MIT Press, Cambridge, MA, 1989.
[6] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Intl. J. Comp. Vis., 43(1):29–44, 2001.
[7] A. P. Pentland. Linear shape from shading. Intl. J. Comp. Vis., 1(4):153–162, 1990.
[8] M. Pollefeys, R. Koch, and L. V. Gool. A simple and efficient rectification method for general motion. In Intl. Conf. on Computer Vision (ICCV), pages 496–501, 1999.
[9] R. A. Rensink. The dynamic representation of scenes. Vis. Cognition, 7:17–42, 2000.
[10] S. Sclaroff and A. Pentland. Generalized implicit functions for computer graphics. In Proc. SIGGRAPH 91, volume 25, pages 247–250, 1991. In Computer Graphics, Annual Conference Series.
[11] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[12] E. P. Simoncelli. Statistical models for images: Compression, restoration and synthesis. In 31st Asilomar Conf. on Sig., Sys. and Computers, Pacific Grove, CA, 1997.
[13] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: a flexible architecture for multi-scale derivative computation. In 2nd Annual Intl. Conf. on Image Processing, Washington, DC, 1995. IEEE.
[14] R. Szeliski. Bayesian modeling of uncertainty in low-level vision. Intl. J. Comp. Vis., 5(3):271–301, 1990.
[15] M. F. Tappen, W. T. Freeman, and E. H. Adelson. Recovering intrinsic images from a single image. In Adv. in Neural Info. Proc. Systems, volume 15. MIT Press, 2003.
[16] A. Torralba and W. T. Freeman. Properties and applications of shape recipes. Technical Report AIM-2002-019, MIT AI lab, 2002.
[17] Y. Weiss. Bayesian motion estimation and segmentation. PhD thesis, M.I.T., 1998.
[18] Z. Zhang. Determining the epipolar geometry and its uncertainty: A review. Technical Report 2927, INRIA, Sophia-Antipolis Cedex, France, 1996. See http://wwwsop.inria.fr/robotvis/demo/f-http/html/.
[19] C. L. Zitnick and T. Kanade. A cooperative algorithm for stereo matching and occlusion detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(7), July 2000.