Source: pdf
Authors: Asela Gunawardana, William Byrne
Abstract: The EM algorithm is widely used to develop iterative parameter estimation procedures for statistical models. In cases where these procedures strictly follow the EM formulation, the convergence properties of the estimation procedures are well understood. In some instances there are practical reasons to develop procedures that do not strictly fall within the EM framework. We study EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized. Since these variants are not EM procedures, the standard (G)EM convergence results do not apply to them. We present an information geometric framework for describing such algorithms and analyzing their convergence properties. We apply this framework to analyze the convergence properties of incremental EM and variational EM. For incremental EM, we discuss conditions under which these algorithms converge in likelihood. For variational EM, we show how the E-step approximation prevents convergence to local maxima in likelihood. Keywords: EM, variational EM, incremental EM, convergence, information geometry
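To make the E-step variants discussed in the abstract concrete, the following is a minimal sketch (not the paper's formulation) of incremental EM in the style of Neal and Hinton (1998) for a two-component 1-D Gaussian mixture. The model choice, function names, and initialization are illustrative assumptions; the point is that the E-step is performed for one data point at a time, its cached sufficient statistics are swapped into the running totals, and an M-step follows each partial E-step.

    import numpy as np

    def e_step_point(x, w, mu, var):
        """Responsibilities (posterior component probabilities) for one point."""
        p = w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        return p / p.sum()

    def incremental_em(data, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        w, mu, var = np.array([0.5, 0.5]), rng.normal(size=2), np.ones(2)
        n = len(data)
        # Cached per-point responsibilities; the "memory" of past E-steps.
        r = np.full((n, 2), 0.5)
        # Aggregate sufficient statistics maintained incrementally.
        s0 = r.sum(axis=0)                        # sum_i r_ik
        s1 = (r * data[:, None]).sum(axis=0)      # sum_i r_ik x_i
        s2 = (r * data[:, None] ** 2).sum(axis=0) # sum_i r_ik x_i^2
        for _ in range(n_iter):
            for i in range(n):
                # Partial E-step: refresh the statistics for point i only.
                r_new = e_step_point(data[i], w, mu, var)
                s0 += r_new - r[i]
                s1 += (r_new - r[i]) * data[i]
                s2 += (r_new - r[i]) * data[i] ** 2
                r[i] = r_new
                # M-step after each partial E-step.
                w = s0 / n
                mu = s1 / s0
                var = s2 / s0 - mu ** 2
        return w, mu, var

    data = np.concatenate([np.random.default_rng(1).normal(-2, 1, 100),
                           np.random.default_rng(2).normal(3, 1, 100)])
    print(incremental_em(data))

Because each partial E-step updates only one point's statistics, the intermediate parameter estimates are not those of an exact E-step over the full data, which is precisely why the standard (G)EM convergence results do not apply and a separate analysis, such as the one developed in this paper, is needed.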
S.-I. Amari. Information geometry of the EM and em algorithms for neural networks. Neural Networks, 8(9):1379–1408, 1995.
R. A. Boyles. On the convergence of the EM algorithm. Journal of the Royal Statistical Society, Series B, 45(1):47–50, 1983.
W. Byrne. Alternating minimization and Boltzmann machine learning. IEEE Transactions on Neural Networks, 3(4):612–620, 1992.
W. Byrne and A. Gunawardana. Comments on 'Efficient training algorithms for HMM's using incremental estimation'. IEEE Transactions on Speech and Audio Processing, 8(6):751–754, November 2000.
I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3(1):146–158, 1975.
I. Csiszár and G. Tusnády. Information geometry and alternating minimization procedures. Statistics and Decisions, Supplemental Issue Number 1, pages 205–237, 1984.
I. Csiszár. Information theory and statistics. ENEE 728F Lecture Notes, Spring 1990. University of Maryland.
J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43(5):1470–1480, 1972.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
V. Digalakis. On-line adaptation of Hidden Markov Models using incremental estimation algorithms. In European Conference on Speech Communication and Technology, pages 1859–1862, 1997.
B. Efron. Defining the curvature of a statistical problem (with applications to second order efficiency). Annals of Statistics, 3(6):1189–1242, 1975.
R. A. Fisher. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222:309–368, 1922.
A. Gunawardana. The Information Geometry of EM Variants for Speech and Image Processing. PhD thesis, The Johns Hopkins University, 2001.
I.-T. Hsiao, A. Rangarajan, P. Khurd, and G. Gindi. Fast, globally convergent reconstruction in emission tomography using COSEM, an incremental EM algorithm. IEEE Transactions on Medical Imaging, 2004. Submitted.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In Michael I. Jordan, editor, Learning in Graphical Models, pages 105–162. MIT Press, 1999.
E. L. Lehmann. Efficient likelihood estimators. The American Statistician, 34(4):233–235, November 1980.
F. Liese and I. Vajda. Convex Statistical Distances. Teubner Verlagsgesellschaft, Leipzig, 1987.
G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, 1997.
I. Meilijson. A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society, Series B, 51(1):127–138, 1989.
R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental and other variants. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer Academic Publishers, 1998.
R. Salakhutdinov, S. T. Roweis, and Z. Ghahramani. On the convergence of bound optimization algorithms. In Conference on Uncertainty in Artificial Intelligence, volume 19, 2003.
B. Thiesson, C. Meek, and D. Heckerman. Accelerating EM for large databases. Machine Learning, pages 279–299, 2001.
A. Wald. Note on the consistency of the maximum likelihood estimate. Annals of Mathematical Statistics, 20, 1949.
C. F. J. Wu. On the convergence properties of the EM algorithm. Annals of Statistics, 11(1):95–103, 1983.
W. I. Zangwill. Nonlinear Programming: A Unified Approach. Prentice-Hall, 1969.