emnlp emnlp2012 emnlp2012-91 emnlp2012-91-reference knowledge-graph by maker-knowledge-mining

91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

Source: pdf

Author: Sameer Singh ; Michael Wick ; Andrew McCallum

Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sam- pling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.

reference text

[Bagga and Baldwin1998] Amit Bagga and Breck Baldwin. 1998. Algorithms for scoring coreference chains. In International Conference on Language Resources and Evaluation (LREC) Workshop on Linguistics Coreference, pages 563–566. [Bertsimas and Tsitsiklis1993] D. Bertsimas and J. Tsitsiklis. 1993. Simulated annealing. Statistical Science, pages 10–15. [Carreras2007] Xavier Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 957–961. [Christen and Fox2005] J. Andr e´s Christen and Colin Fox. 2005. Markov chain monte carlo proximation. Journal of Computational cal Statistics, 14(4):pp. 795–810. [Coughlan and Shen2007] James Coughlan Shen. 2007. Dynamic quantization for using an apand Graphiand Huiying belief propa- 1111 gation in sparse spaces. Computer Vision and Image Understanding, 106:47–58, April. [Culotta et al.2007] Aron Culotta, Michael Wick, and Andrew McCallum. 2007. First-order probabilistic models for coreference resolution. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT). [Gonzalez et al.2009] Joseph Gonzalez, Yucheng Low, and Carlos Guestrin. 2009. Residual splash for optimally parallelizing belief propagation. In Artificial Intelligence and Statistics (AISTATS). [Gonzalez et al.201 1] Joseph Gonzalez, Yucheng Low, Arthur Gretton, and Carlos Guestrin. 2011. Parallel gibbs sampling: From colored fields to thin junction trees. In Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, May. [Hoffmann et al.201 1] Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 541–550, Portland, Oregon, USA, June. Association for Computational Linguistics. [Hulten and Domingos2002] Geoff Hulten and Pedro Domingos. 2002. Mining complex models from arbitrarily large databases in constant time. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 525–531, New York, NY, USA. ACM. [Kok and Domingos2005] Stanley Kok and Pedro Domingos. 2005. Learning the structure of markov logic networks. In International Conference on Machine Learning (ICML), pages 441–448, New York, NY, USA. ACM. [Kschischang et al.2001] Frank R. Kschischang, Brendan J. Frey, and Hans Andrea Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions of Information Theory, 47(2):498–519, Feb. [Lafferty et al.2001] John D. Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML). [Leskovec and Faloutsos2006] Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 63 1–636, New York, NY, USA. ACM. [McCallum and Wellner2004] Andrew McCallum and Ben Wellner. 2004. Conditional models of identity uncertainty with application to noun coreference. In Neural Information Processing Systems (NIPS). [McCallum et al. 1999] Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. 1999. A machine learning approach to building domainspecific search engines. In International Joint Conference on Artificial Intelligence (IJCAI). [McCallum et al.2009] Andrew McCallum, Karl Schultz, and Sameer Singh. 2009. FACTORIE: Probabilistic programming via imperatively defined factor graphs. In Neural Information Processing Systems (NIPS). [Murray and Ghahramani2004] Iain Murray and Zoubin Ghahramani. 2004. Bayesian learning in undirected graphical models: Approximate MCMC algorithms. In Uncertainty in Artificial Intelligence (UAI). [Pasula et al.2003] H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. 2003. Identity uncertainty and citation matching. In Neural Information Processing Systems (NIPS). [Poon and Domingos2006] Hoifung Poon and Pedro Domingos. 2006. Sound and efficient inference with probabilistic and deterministic dependencies. In AAAI Conference on Artificial Intelligence. [Poon and Domingos2007] Hoifung Poon and Pedro Domingos. 2007. Joint inference in informa- tion extraction. In AAAI Conference on Artificial Intelligence, pages 913–918. [Poon et al.2008] Hoifung Poon, Pedro Domingos, and Marc Sumner. 2008. A general method for reducing the complexity of relational inference and its application to MCMC. In AAAI Conference on Artificial Intelligence. [Richardson and Domingos2006] Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning, 62(1-2): 107–136. [Riedel and Smith2010] Sebastian Riedel and David A. Smith. 2010. Relaxed marginal inference and its application to dependency parsing. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT), pages 760–768. [Singh et al.2009] Sameer Singh, Karl Schultz, and Andrew McCallum. 2009. Bi-directional joint inference for entity resolution and segmentation using imperatively-defined factor graphs. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science) and European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 414–429. [Singh et al.2010] Sameer Singh, Michael L. Wick, and Andrew McCallum. 2010. Distantly labeling data for large scale cross-document coreference. Computing Research Repository (CoRR), abs/1005.4298. [Singh et al.201 1] Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. 2011. 1112 Large-scale cross-document coreference using distributed inference and hierarchical models. In Association for Computational Linguistics: Human Language Technologies (ACL HLT). [Sutton and McCallum2004] Charles Sutton and Andrew McCallum. 2004. Collective segmentation and labeling of distant entities in information extraction. Technical Report TR#04-49, University of Massachusetts, July. [Sutton and McCallum2007] Charles Sutton and Andrew McCallum. 2007. Improved dynamic schedules for belief propagation. In Uncertainty in Artificial Intelligence (UAI). [Wick et al.2009] Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. 2009. An entity-based model for coreference resolution. In SIAM International Conference on Data Mining (SDM). [Wick et al.2010] Michael Wick, Andrew McCallum, and Gerome Miklau. 2010. Scalable probabilistic databases with factor graphs and mcmc. International Conference on Very Large Databases (VLDB), 3:794– 804, September. [Wick et al.201 1] Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, and Andrew McCallum. 2011. Samplerank: Training factor graphs with atomic gradients. In International Conference on Machine Learning (ICML). [Yao et al.2010] Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Empirical Methods in Natural Language Processing (EMNLP). 1113