Source: pdf
Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Abstract: Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner with viewpoint-specific cells projecting to downstream viewpoint-invariant identity-specific cells [1]. A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same image transformation (2D) that would produce the images evoked by an object of another class undergoing the same 3D rotation. However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network.
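The class-specific invariance mechanism the abstract alludes to can be illustrated with a short sketch. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes an HMAX-style tuning-and-pooling scheme (cf. [13, 14]) in which each previously viewed template face contributes a set of stored views under 3D rotation, a Gaussian tuning stage responds to every stored view, and a max over viewpoints produces a signature that is selective for identity but tolerant to rotation. The names viewpoint_invariant_signature, template_books, and sigma are hypothetical.

import numpy as np

def viewpoint_invariant_signature(novel_image, template_books, sigma=1.0):
    # novel_image: flattened image of a new face, shape (n_pixels,).
    # template_books: list of arrays, one per previously viewed template face;
    #   each array has shape (n_viewpoints, n_pixels) and holds that face's
    #   image under every stored 3D rotation.
    signature = []
    for views in template_books:
        # tuning stage: Gaussian response to each stored view of this template
        dists = np.linalg.norm(views - novel_image[None, :], axis=1)
        responses = np.exp(-dists**2 / (2.0 * sigma**2))
        # pooling stage: max over viewpoints discards the view, keeps identity
        signature.append(responses.max())
    return np.array(signature)

Under this sketch, two images of the same novel face seen from different viewpoints map to nearby signatures only when the novel face transforms the way the stored template faces do; this is why the scheme supports identification from a single example view within the face class but breaks down when the template books come from a different object class.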
[1] W. Freiwald and D. Tsao, “Functional Compartmentalization and Viewpoint Generalization Within the Macaque Face-Processing System,” Science, vol. 330, no. 6005, p. 845, 2010.
[2] N. Kanwisher, J. McDermott, and M. Chun, “The fusiform face area: a module in human extrastriate cortex specialized for face perception,” The Journal of Neuroscience, vol. 17, no. 11, p. 4302, 1997.
[3] K. Grill-Spector, N. Knouf, and N. Kanwisher, “The fusiform face area subserves face perception, not generic within-category identification,” Nature Neuroscience, vol. 7, no. 5, pp. 555–562, 2004.
[4] D. Tsao, W. Freiwald, R. Tootell, and M. Livingstone, “A cortical region consisting entirely of face-selective cells,” Science, vol. 311, no. 5761, p. 670, 2006.
[5] D. Tsao, W. Freiwald, T. Knutsen, J. Mandeville, and R. Tootell, “Faces and objects in macaque cerebral cortex,” Nature Neuroscience, vol. 6, no. 9, pp. 989–995, 2003.
[6] R. Rajimehr, J. Young, and R. Tootell, “An anterior temporal face patch in human cortex, predicted by macaque maps,” Proceedings of the National Academy of Sciences, vol. 106, no. 6, p. 1995, 2009.
[7] S. Ku, A. Tolias, N. Logothetis, and J. Goense, “fMRI of the Face-Processing Network in the Ventral Temporal Lobe of Awake and Anesthetized Macaques,” Neuron, vol. 70, no. 2, pp. 352–362, 2011.
[8] M. Tarr and I. Gauthier, “FFA: a flexible fusiform area for subordinate-level visual processing automatized by expertise,” Nature Neuroscience, vol. 3, pp. 764–770, 2000.
[9] J. Z. Leibo, J. Mutch, L. Rosasco, S. Ullman, and T. Poggio, “Learning Generic Invariances in Object Recognition: Translation and Scale,” MIT-CSAIL-TR-2010-061, CBCL-294, 2010.
[10] S. Moeller, W. Freiwald, and D. Tsao, “Patches with links: a unified system for processing faces in the macaque temporal lobe,” Science, vol. 320, no. 5881, p. 1355, 2008.
[11] T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, and T. Poggio, “A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex,” CBCL Paper #259/AI Memo #2005-036, 2005.
[12] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, pp. 193–202, Apr. 1980.
[13] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience, vol. 2, pp. 1019–1025, Nov. 1999.
[14] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, “Robust Object Recognition with Cortex-Like Mechanisms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 411–426, 2007.
[15] B. W. Mel, “SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition,” Neural Computation, vol. 9, pp. 777–804, May 1997.
[16] D. Hubel and T. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” The Journal of Physiology, vol. 160, no. 1, p. 106, 1962.
[17] J. Mutch and D. Lowe, “Object class recognition and localization using sparse features with limited receptive fields,” International Journal of Computer Vision, vol. 80, no. 1, pp. 45–57, 2008.
[18] P. Földiák, “Learning invariance from transformation sequences,” Neural Computation, vol. 3, no. 2, pp. 194–200, 1991.
[19] S. Stringer and E. Rolls, “Invariant object recognition in the visual system with novel views of 3D objects,” Neural Computation, vol. 14, no. 11, pp. 2585–2596, 2002.
[20] L. Wiskott and T. Sejnowski, “Slow feature analysis: Unsupervised learning of invariances,” Neural Computation, vol. 14, no. 4, pp. 715–770, 2002.
[21] T. Masquelier, T. Serre, S. Thorpe, and T. Poggio, “Learning complex cell invariance from natural videos: A plausibility proof,” AI Technical Report #2007-060 CBCL Paper #269, 2007.
[22] M. Spratling, “Learning viewpoint invariant perceptual representations from cluttered images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 753–761, 2005.
[23] D. Cox, P. Meier, N. Oertelt, and J. J. DiCarlo, “‘Breaking’ position-invariant object recognition,” Nature Neuroscience, vol. 8, no. 9, pp. 1145–1147, 2005.
[24] N. Li and J. J. DiCarlo, “Unsupervised natural experience rapidly alters invariant object representation in visual cortex,” Science, vol. 321, pp. 1502–7, Sept. 2008.
[25] N. Li and J. J. DiCarlo, “Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex,” Neuron, vol. 67, no. 6, pp. 1062–1075, 2010.
[26] G. Wallis and H. H. Bülthoff, “Effects of temporal association on recognition memory,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, pp. 4800–4, Apr. 2001.
[27] G. Wallis, B. Backus, M. Langer, G. Huebner, and H. Bülthoff, “Learning illumination- and orientation-invariant representations of objects through temporal association,” Journal of Vision, vol. 9, no. 7, 2009.
[28] T. Vetter, A. Hurlbert, and T. Poggio, “View-based models of 3D object recognition: invariance to imaging transformations,” Cerebral Cortex, vol. 5, no. 3, p. 261, 1995.
[29] E. Bart and S. Ullman, “Class-based feature matching across unrestricted transformations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1618–1631, 2008.
[30] H. Bülthoff and S. Edelman, “Psychophysical support for a two-dimensional view interpolation theory of object recognition,” Proceedings of the National Academy of Sciences, vol. 89, no. 1, p. 60, 1992.
[31] N. Logothetis, J. Pauls, H. Bülthoff, and T. Poggio, “View-dependent object recognition by monkeys,” Current Biology, vol. 4, no. 5, pp. 401–414, 1994.
[32] P. Downing and Y. Jiang, “A cortical area selective for visual processing of the human body,” Science, vol. 293, no. 5539, p. 2470, 2001.
[33] L. Cohen, S. Dehaene, and L. Naccache, “The visual word form area,” Brain, vol. 123, no. 2, p. 291, 2000.
[34] C. Baker, J. Liu, L. Wald, K. Kwong, T. Benner, and N. Kanwisher, “Visual word processing and experiential origins of functional selectivity in human extrastriate cortex,” Proceedings of the National Academy of Sciences, vol. 104, no. 21, p. 9087, 2007.