
109 jmlr-2010-Stochastic Composite Likelihood


Source: pdf

Author: Joshua V. Dillon, Guy Lebanon

Abstract: Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolves the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators and provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness, thereby increasing statistical accuracy.

Keywords: Markov random fields, composite likelihood, maximum likelihood estimation
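As a rough illustration of what a "stochastic variation of the composite likelihood" might look like, the sketch below evaluates a randomly subsampled pseudo-likelihood for a small Boltzmann machine. It is a minimal sketch under stated assumptions, not the authors' implementation: the components are assumed to be the single-site conditionals log p(x_j | x_-j), each one is kept independently with probability lam[j] and weighted by beta[j], and the names conditional_log_prob, stochastic_pseudo_likelihood, lam, and beta are hypothetical.

```python
import numpy as np

def conditional_log_prob(x, j, W, b):
    """log p(x_j | x_-j) for a Boltzmann machine with states in {0, 1}.

    Assumes W is symmetric with a zero diagonal, so W[j] @ x ignores x_j.
    """
    a = b[j] + W[j] @ x  # field acting on unit j
    # p(x_j = 1 | rest) = sigmoid(a), hence log p = x_j * a - log(1 + e^a)
    return x[j] * a - np.logaddexp(0.0, a)

def stochastic_pseudo_likelihood(X, W, b, lam, beta, rng):
    """Average over samples of randomly selected, weighted conditionals.

    lam[j]  : assumed probability of keeping component j for a given sample
    beta[j] : assumed weight of component j
    """
    n, m = X.shape
    keep = rng.random((n, m)) < lam  # Bernoulli(lam[j]) selection mask
    total = 0.0
    for i in range(n):
        for j in range(m):
            if keep[i, j]:
                total += beta[j] * conditional_log_prob(X[i], j, W, b)
    return total / n

# Usage: a 4-unit model with random parameters and random binary data.
rng = np.random.default_rng(0)
m = 4
A = rng.normal(size=(m, m))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(size=m)
X = rng.integers(0, 2, size=(5, m)).astype(float)
value = stochastic_pseudo_likelihood(X, W, b, np.full(m, 0.5), np.ones(m), rng)
```

In this sketch, lam close to 1 evaluates nearly every conditional (more computation), while smaller lam skips most of them, which is one way to picture the computation-accuracy tradeoff the abstract describes.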


reference text

B. Arnold and D. Strauss. Pseudolikelihood estimation: some examples. Sankhya B, 53:233–243, 1991.

J. Besag. Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society B, 36(2):192–236, 1974.

Y. Bishop, S. Fienberg, and P. Holland. Discrete multivariate analysis: theory and practice. MIT Press, 1975.

Léon Bottou and Olivier Bousquet. Learning using large datasets. In Mining Massive DataSets for Security. IOS Press, 2008.

R. Casella and C. Robert. Monte Carlo Statistical Methods. Springer Verlag, second edition, 2004.

T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, second edition, 2005.

D. R. Cox and E. J. Snell. A general definition of residuals (with discussion). Journal of the Royal Statistical Society B, 1968.

J. V. Dillon, K. Balasubramanian, and G. Lebanon. Asymptotic analysis of generative semisupervised learning. In Proc. of the International Conference on Machine Learning, 2010.

T. S. Ferguson. A Course in Large Sample Theory. Chapman & Hall, 1996.

G. Hinton and T. Sejnowski. Optimal perceptual inference. In Proc. Computer Vision and Pattern Recognition, 1983.

N. Hjort and C. Varin. ML, PL, and QL in Markov chain models. Scandinavian Journal of Statistics, 35(1):64–82, 2008.

R. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990.

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

G. Liang and B. Yu. Maximum pseudo likelihood estimation in network tomography. IEEE Trans. Signal Process, 51(8):2043–2053, 2003.

P. Liang and M. I. Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In Proc. of the International Conference on Machine Learning, 2008.

B. G. Lindsay. Composite likelihood methods. Contemporary Mathematics, 80:221–239, 1988.

D. J. C. MacKay. Equivalence of linear Boltzmann chains and hidden Markov models. Neural Computation, 8(1):178–181, 1996.

Y. Mao and G. Lebanon. Isotonic conditional random fields and local sentiment flow. In Advances in Neural Information Processing Systems 19, pages 961–968, 2007.

R. J. Serfling. Approximation Theorems of Mathematical Statistics. John Wiley, 1980.

F. Sha and F. Pereira. Shallow parsing with conditional random fields. Proceedings of HLT-NAACL, pages 213–220, 2003.

C. Sutton and A. McCallum. Piecewise pseudolikelihood for efficient training of conditional random fields. In Proc. of the International Conference on Machine Learning, 2007.

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998.

C. Varin and P. Vidoni. A note on composite likelihood inference and model selection. Biometrika, 92:519–528, 2005.

D. Vickrey, C. Lin, and D. Koller. Non-local contrastive objectives. In Proc. of the International Conference on Machine Learning, 2010.

E. P. Xing, M. I. Jordan, and S. Russell. A generalized mean field algorithm for variational inference in exponential families. In Proc. of Uncertainty in Artificial Intelligence, 2003.

S.-C. Zhu and X. Liu. Learning in Gibbsian fields: How accurate and how fast can it be? IEEE Trans. Pattern Analysis, 24(7):1001–1006, 2002.