nips nips2001 nips2001-84 nips2001-84-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sam T. Roweis, Lawrence K. Saul, Geoffrey E. Hinton
Abstract: High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.

1 Manifold Learning

Consider an ensemble of images, each of which contains a face against a neutral background. Each image can be represented by a point in the high dimensional vector space of pixel intensities. This representation, however, does not exploit the strong correlations between pixels of the same image, nor does it support many useful operations for reasoning about faces. If, for example, we select two images with faces in widely different locations and then average their pixel intensities, we do not obtain an image of a face at their average location. Images of faces lie on or near a low-dimensional, curved manifold, and we can represent them more usefully by the coordinates on this manifold than by pixel intensities. Using these “intrinsic coordinates”, the average of two faces is another face with the average of their locations, poses and expressions.

To analyze and manipulate faces, it is helpful to imagine a “magic black box” with levers or dials corresponding to the intrinsic coordinates on this manifold. Given a setting of the levers and dials, the box generates an image of a face. Given an image of a face, the box deduces the appropriate setting of the levers and dials. In this paper, we describe a fairly general way to construct such a box automatically from an ensemble of high-dimensional vectors. We assume only that there exists an underlying manifold of low dimensionality and that the relationship between the raw data and the manifold coordinates is locally linear and smoothly varying. Thus our method applies not only to images of faces, but also to many other forms of highly distributed perceptual and scientific data (e.g., spectrograms of speech, robotic sensors, gene expression arrays, document collections).

2 Local Linear Models

The global structure of perceptual manifolds (such as images of faces) tends to be highly nonlinear. Fortunately, despite their complicated global structure, we can usually characterize these manifolds as locally linear. Thus, to a good approximation, they can be represented by collections of simpler models, each of which describes a locally linear neighborhood [3, 6, 8]. For unsupervised learning tasks, a probabilistic model that nicely captures this intuition is a mixture of factor analyzers (MFA) [5].
The model is used to describe high dimensional data that lies on or near a lower dimensional manifold. MFAs parameterize a joint distribution over observed and hidden variables:

    P(x, s, z_s) = P(x | s, z_s) P(z_s | s) P(s),                                (1)

where the observed variable, x ∈ R^D, represents the high dimensional data; the discrete hidden variable, s ∈ {1, 2, ..., S}, indexes different neighborhoods on the manifold; and the continuous hidden variables, z_s ∈ R^d, represent low dimensional local coordinates. The model assumes that data is sampled from different neighborhoods on the manifold with prior probabilities P(s) = p_s, and that within each neighborhood, the data’s local coordinates are normally distributed as:

    P(z_s | s) = (2π)^{-d/2} exp{-½ z_s^T z_s}.                                  (2)
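The excerpt cuts off before spelling out how the high dimensional data are generated from the local coordinates. As a rough illustration only, the sketch below samples from an MFA generative process under the standard factor-analyzer observation model of [5], x = μ_s + Λ_s z_s + noise with diagonal noise covariance Ψ_s; the function and parameter names (sample_mfa, p, mus, Lambdas, Psis) are illustrative assumptions, not notation from this paper.

    # Minimal sketch of the MFA generative process of Eqs. (1)-(2), assuming the
    # standard factor-analyzer observation model of [5]:
    #   x = mu_s + Lambda_s z_s + noise,  noise ~ N(0, diag(Psi_s)).
    # All names here are illustrative, not taken from the paper.
    import numpy as np

    def sample_mfa(p, mus, Lambdas, Psis, rng=None):
        """Draw one sample (x, s, z) from a mixture of factor analyzers.

        p       : (S,)      prior probabilities P(s) = p_s over neighborhoods
        mus     : (S, D)    mean of each local linear model
        Lambdas : (S, D, d) factor loadings mapping local coordinates to data space
        Psis    : (S, D)    diagonal sensor-noise variances for each analyzer
        """
        rng = np.random.default_rng() if rng is None else rng
        s = rng.choice(len(p), p=p)                # discrete neighborhood, Eq. (1)
        d = Lambdas[s].shape[1]
        z = rng.standard_normal(d)                 # local coordinates ~ N(0, I), Eq. (2)
        noise = rng.normal(0.0, np.sqrt(Psis[s]))  # independent per-dimension noise
        x = mus[s] + Lambdas[s] @ z + noise        # locally linear mapping to data space
        return x, s, z

    # Toy usage: two analyzers, 5-dimensional data, 2-dimensional local coordinates.
    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.5])
    mus = rng.standard_normal((2, 5))
    Lambdas = rng.standard_normal((2, 5, 2))
    Psis = np.full((2, 5), 0.1)
    x, s, z = sample_mfa(p, mus, Lambdas, Psis, rng)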
References

[1] D. Beymer & T. Poggio. Image representations for visual learning. Science 272 (1996).
[2] C. Bishop, M. Svensen, and C. Williams. GTM: The generative topographic mapping. Neural Computation 10 (1998).
[3] C. Bregler & S. Omohundro. Nonlinear image interpolation using manifold learning. Advances in Neural Information Processing Systems 7 (1995).
[4] D. DeMers & G.W. Cottrell. Nonlinear dimensionality reduction. Advances in Neural Information Processing Systems 5 (1993).
[5] Z. Ghahramani & G. Hinton. The EM algorithm for mixtures of factor analyzers. University of Toronto Technical Report CRG-TR-96-1 (1996).
[6] G. Hinton, P. Dayan, and M. Revow. Modeling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks 8 (1997).
[7] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Machine Learning 37(2) (1999).
[8] N. Kambhatla & T. K. Leen. Dimension reduction by local principal component analysis. Neural Computation 9 (1997).
[9] S. T. Roweis & L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science 290 (2000).
[10] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science 290 (2000).