nips nips2000 nips2000-22 nips2000-22-reference knowledge-graph by maker-knowledge-mining

22 nips-2000-Algorithms for Non-negative Matrix Factorization

Source: pdf

Author: Daniel D. Lee, H. Sebastian Seung

Abstract: Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the ExpectationMaximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.

reference text

[1] Jolliffe, IT (1986). Principal Component Analysis. New York: Springer-Verlag.

[2] Turk, M & Pentland, A (1991). Eigenfaces for recognition. J. Cogn. Neurosci. 3, 71- 86.

[3] Gersho, A & Gray, RM (1992). Vector Quantization and Signal Compression. Kluwer Acad. Press.

[4] Lee, DD & Seung, HS . Unsupervised learning by convex and conic coding (1997). Proceedings of the Conference on Neural Information Processing Systems 9, 515- 521.

[5] Lee, DD & Seung, HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401, 788- 791.

[6] Field, DJ (1994). What is the goal of sensory coding? Neural Comput. 6, 559-601.

[7] Foldiak, P & Young, M (1995). Sparse coding in the primate cortex. The Handbook of Brain Theory and Neural Networks, 895- 898. (MIT Press, Cambridge, MA).

[8] Press, WH, Teukolsky, SA, Vetterling, WT & Flannery, BP (1993). Numerical recipes: the art of scientific computing. (Cambridge University Press, Cambridge, England).

[9] Shepp, LA & Vardi, Y (1982) . Maximum likelihood reconstruction for emission tomography. IEEE Trans . MI-2, 113- 122.

[10] Richardson, WH (1972) . Bayesian-based iterative method of image restoration. 1. Opt. Soc. Am. 62, 55- 59.

[11] Lucy, LB (1974). An iterative technique for the rectification of observed distributions. Astron. J. 74, 745- 754.

[12] Bouman, CA & Sauer, K (1996). A unified approach to statistical tomography using coordinate descent optimization. IEEE Trans. Image Proc. 5, 480--492.

[13] Paatero, P & Tapper, U (1997). Least squares formulation of robust non-negative factor analysis. Chemometr. Intell. Lab. 37, 23- 35.

[14] Kivinen, J & Warmuth, M (1997). Additive versus exponentiated gradient updates for linear prediction. Journal of Tnformation and Computation 132, 1-64.

[15] Dempster, AP, Laird, NM & Rubin, DB (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. 39, 1-38.

[16] Saul, L & Pereira, F (1997). Aggregate and mixed-order Markov models for statistical language processing. In C. Cardie and R. Weischedel (eds). Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, 81- 89. ACL Press.